[2602.18089] DohaScript: A Large-Scale Multi-Writer Dataset for Continuous Handwritten Hindi Text


Summary

DohaScript introduces a large-scale dataset for continuous handwritten Hindi text, addressing the lack of diverse and high-quality resources for handwriting analysis in Devanagari script.

Why It Matters

The dataset supports research in handwriting recognition and analysis for Devanagari, a script that remains severely underrepresented in public benchmarks even though Hindi has hundreds of millions of speakers. Because every writer transcribes the same fixed text, it can serve as a controlled benchmark for machine learning models in computer vision and document analysis.

Key Takeaways

  • DohaScript is a large-scale dataset featuring continuous handwritten Hindi text from 531 contributors.
  • The dataset allows for systematic analysis of writer-specific variations in handwriting.
  • It supports various applications, including handwriting recognition and style analysis.
  • Rigorous quality curation ensures high reliability and practical value for researchers.
  • DohaScript aims to fill the gap in resources for Devanagari handwriting analysis.

Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.18089 (cs)
[Submitted on 20 Feb 2026]

Title: DohaScript: A Large-Scale Multi-Writer Dataset for Continuous Handwritten Hindi Text
Authors: Kunwar Arpit Singh, Ankush Prakash, Haroon R Lone

Abstract: Despite having hundreds of millions of speakers, handwritten Devanagari text remains severely underrepresented in publicly available benchmark datasets. Existing resources are limited in scale, focus primarily on isolated characters or short words, and lack controlled lexical content and writer-level diversity, which restricts their utility for modern data-driven handwriting analysis. As a result, they fail to capture the continuous, fused, and structurally complex nature of Devanagari handwriting, where characters are connected through a shared shirorekha (horizontal headline) and exhibit rich ligature formations. We introduce DohaScript, a large-scale, multi-writer dataset of handwritten Hindi text collected from 531 unique contributors. The dataset is designed as a parallel stylistic corpus, in which all writers transcribe the same fixed set of six traditional Hindi dohas (couplets). This controlled design enables systematic analysis of writer-specific variation independent of linguistic content, and supports tasks such as h...
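The parallel design described in the abstract (531 writers, each transcribing the same six dohas) can be pictured as a writer-by-doha index. The Python sketch below is illustrative only. The integer IDs, the function names, the 20% test fraction, and the assumption that every writer contributes exactly one sample per doha are hypothetical and are not taken from the paper or the released dataset.

```python
import random
from collections import defaultdict

NUM_WRITERS = 531  # stated in the abstract
NUM_DOHAS = 6      # stated in the abstract


def build_index(num_writers=NUM_WRITERS, num_dohas=NUM_DOHAS):
    """Return one (writer_id, doha_id) record per handwritten sample.

    Assumption: each writer has exactly one sample per doha. The real
    dataset's identifiers and file layout are not described in the source.
    """
    return [(w, d) for w in range(num_writers) for d in range(num_dohas)]


def writer_disjoint_split(index, test_fraction=0.2, seed=0):
    """Split records so that no writer appears in both train and test."""
    writers = sorted({w for w, _ in index})
    rng = random.Random(seed)
    rng.shuffle(writers)
    n_test = int(len(writers) * test_fraction)
    test_writers = set(writers[:n_test])
    train = [r for r in index if r[0] not in test_writers]
    test = [r for r in index if r[0] in test_writers]
    return train, test


def group_by_doha(index):
    """Map each doha to the writers who transcribed it.

    With identical text per group, differences inside a group reflect
    writer style rather than linguistic content.
    """
    groups = defaultdict(list)
    for writer_id, doha_id in index:
        groups[doha_id].append(writer_id)
    return dict(groups)


index = build_index()
train, test = writer_disjoint_split(index)
by_doha = group_by_doha(index)
```

Splitting by writer rather than by sample is one plausible choice for a corpus like this: since all writers copy the same text, a per-sample split would let a model see every test writer's handwriting during training.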
