[2602.13889] Parameter-Efficient Fine-Tuning of DINOv2 for Large-Scale Font Classification
Summary
The paper fine-tunes a DINOv2 Vision Transformer with Low-Rank Adaptation (LoRA) to classify 394 font families, reaching approximately 86% top-1 accuracy while training fewer than 1% of the model's parameters. It also introduces a synthetic dataset generation pipeline.
Why It Matters
Font classification is a fine-grained computer vision problem, and this work shows that it can be handled by training fewer than 1% of the parameters of an 87.2M-parameter model, which reduces the cost of fine-tuning. The open-source release of the model, dataset, and training pipeline makes the work easier to reuse for further research and for applications in typography and design.
Key Takeaways
- Achieved approximately 86% top-1 accuracy across 394 font families with DINOv2.
- Used Low-Rank Adaptation (LoRA) to fine-tune fewer than 1% of the model's 87.2M parameters.
- Introduced a synthetic dataset generation pipeline that renders Google Fonts with randomized colors, alignment, line wrapping, and Gaussian noise.
- Built preprocessing into the model to keep training and inference consistent.
- Released model, dataset, and training pipeline as open-source resources.
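To make the "fewer than 1%" figure concrete, here is a rough parameter count for LoRA adapters. It is a back-of-the-envelope sketch, not the authors' configuration: the hidden size (768), layer count (12), rank (8), and the choice to adapt two projection matrices per layer are all assumptions for illustration. Only the 87.2M total comes from the abstract, and the classification head is not counted.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter.

    LoRA learns a low-rank update B @ A for a frozen weight matrix,
    where A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * d_in + d_out * rank


# Assumed values (not stated in the source): a ViT-B-sized backbone.
HIDDEN = 768            # assumption: transformer width
LAYERS = 12             # assumption: number of transformer blocks
RANK = 8                # assumption: LoRA rank
ADAPTED_PER_LAYER = 2   # assumption: e.g. query and value projections

# Stated in the abstract.
TOTAL_PARAMS = 87_200_000

trainable = LAYERS * ADAPTED_PER_LAYER * lora_param_count(HIDDEN, HIDDEN, RANK)
fraction = trainable / TOTAL_PARAMS

print(f"{trainable:,} trainable LoRA parameters ({fraction:.2%} of the model)")
```

Under these assumptions the adapters account for well under 1% of the model. A higher rank or more adapted matrices would raise the count proportionally.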
arXiv:2602.13889 (cs)
[Submitted on 14 Feb 2026]
Title: Parameter-Efficient Fine-Tuning of DINOv2 for Large-Scale Font Classification
Authors: Daniel Chen, Zaria Zinn, Marcus Lowe
Abstract: We present a font classification system capable of identifying 394 font families from rendered text images. Our approach fine-tunes a DINOv2 Vision Transformer using Low-Rank Adaptation (LoRA), achieving approximately 86% top-1 accuracy while training fewer than 1% of the model's 87.2M parameters. We introduce a synthetic dataset generation pipeline that renders Google Fonts at scale with diverse augmentations including randomized colors, alignment, line wrapping, and Gaussian noise, producing training images that generalize to real-world typographic samples. The model incorporates built-in preprocessing to ensure consistency between training and inference, and is deployed as a HuggingFace Inference Endpoint. We release the model, dataset, and full training pipeline as open-source resources.
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as: arXiv:2602.13889 [cs.CV] (or arXiv:2602.13889v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2602.13889 arXiv-issued DOI via DataCite (pending re...
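The augmentations named in the abstract (randomized colors, alignment, line wrapping, and Gaussian noise) can be illustrated with a small standard-library sketch. This is not the authors' pipeline: `RenderSpec`, `sample_render_spec`, `add_gaussian_noise`, and all value ranges are hypothetical names and choices made for illustration, and the actual font rendering step is left out.

```python
import random
import textwrap
from dataclasses import dataclass


@dataclass
class RenderSpec:
    """Hypothetical description of one synthetic training image."""
    lines: list             # text after line wrapping
    text_color: tuple       # RGB, 0-255 per channel
    background_color: tuple
    alignment: str          # "left", "center", or "right"
    noise_sigma: float      # standard deviation of pixel noise


def sample_render_spec(text: str, rng: random.Random) -> RenderSpec:
    """Draw random augmentation parameters for one sample."""
    def rand_color():
        return tuple(rng.randrange(256) for _ in range(3))

    wrap_width = rng.randint(12, 40)  # assumed range of characters per line
    return RenderSpec(
        lines=textwrap.wrap(text, width=wrap_width),
        text_color=rand_color(),
        background_color=rand_color(),
        alignment=rng.choice(["left", "center", "right"]),
        noise_sigma=rng.uniform(0.0, 10.0),  # assumed noise range
    )


def add_gaussian_noise(pixels, sigma: float, rng: random.Random):
    """Add zero-mean Gaussian noise to 8-bit pixel values, clamped to 0-255."""
    return [min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in pixels]


rng = random.Random(0)  # fixed seed so the sample is reproducible
text = "The quick brown fox jumps over the lazy dog"
spec = sample_render_spec(text, rng)
noisy = add_gaussian_noise([0, 128, 255], spec.noise_sigma, rng)

print(spec)
print(noisy)
```

In a real pipeline the sampled specification would drive an image renderer that draws each line in the chosen font, and the noise would be applied to the full pixel array rather than to a short list.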