[2603.29450] Few-shot Writer Adaptation via Multimodal In-Context Learning
Computer Science > Computer Vision and Pattern Recognition
arXiv:2603.29450 (cs)
[Submitted on 31 Mar 2026]

Title: Few-shot Writer Adaptation via Multimodal In-Context Learning
Authors: Tom Simon, Stephane Nicolas, Pierrick Tranouez, Clement Chatelain, Thierry Paquet

Abstract: While state-of-the-art Handwritten Text Recognition (HTR) models perform well on standard benchmarks, they frequently struggle with writers exhibiting highly specific styles that are underrepresented in the training data. To handle unseen and atypical writers, writer adaptation techniques personalize HTR models to individual handwriting styles. Leading writer adaptation methods require either offline fine-tuning or parameter updates at inference time, both involving gradient computation and backpropagation, which increase computational costs and demand careful hyperparameter tuning. In this work, we propose a novel context-driven HTR framework inspired by multimodal in-context learning, enabling inference-time writer adaptation using only a few examples from the target writer without any parameter updates. We further demonstrate the impact of context length, design a compact 8M-parameter CNN-Transformer that enables few-shot in-context adaptation, and show that combining context-driven and standard OCR training strategies leads to complementary improvements.
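The abstract's core idea, adapting at inference time by conditioning on a few (line image, transcription) pairs from the target writer rather than updating parameters, can be illustrated with a toy sketch. The encoders and the concatenation scheme below are illustrative assumptions, not the paper's architecture: `encode_image` and `encode_text` stand in for the visual and text encoders, and `build_context` shows how k exemplar pairs might be concatenated in front of the query line so a decoder could attend to the writer's style.

```python
import numpy as np

def encode_image(img):
    # Stand-in visual encoder: mean-pool pixel rows into one feature row
    # per column (a real model would use a CNN backbone).
    return img.mean(axis=0, keepdims=True)

def encode_text(tokens, vocab_size=128):
    # Stand-in text embedding: one normalized column per token id.
    return np.array(tokens, dtype=float).reshape(1, -1) / vocab_size

def build_context(exemplars, query_img):
    """Concatenate k exemplar (image, transcription) pairs followed by
    the query image, forming the in-context input sequence."""
    parts = []
    for img, tokens in exemplars:
        parts.append(encode_image(img))   # exemplar line image
        parts.append(encode_text(tokens)) # its transcription
    parts.append(encode_image(query_img)) # query line to recognize
    return np.concatenate(parts, axis=1)

rng = np.random.default_rng(0)
# Three 8x16 exemplar line images with 3-token transcriptions (toy data).
exemplars = [(rng.random((8, 16)), [5, 9, 2]) for _ in range(3)]
query = rng.random((8, 16))
ctx = build_context(exemplars, query)
# Each exemplar contributes 16 + 3 columns; the query adds 16 more.
print(ctx.shape)  # (1, 73)
```

No gradients or parameter updates are involved: adaptation happens purely through what the model sees in its context window, which is what makes the approach cheap at inference time.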