[2603.28108] Quid est VERITAS? A Modular Framework for Archival Document Analysis
Computer Science > Digital Libraries
arXiv:2603.28108 (cs)
[Submitted on 30 Mar 2026]

Title: Quid est VERITAS? A Modular Framework for Archival Document Analysis
Authors: Leonardo Bassanini, Ludovico Biancardi, Alfio Ferrara, Andrea Gamberini, Sergio Picascia, Folco Vaglienti

Abstract: The digitisation of historical documents has traditionally been conceived as a process limited to character-level transcription, producing flat text that lacks the structural and semantic information necessary for substantive computational analysis. We present VERITAS (Vision-Enhanced Reading, Interpretation, and Transcription of Archival Sources), a modular, model-agnostic framework that reconceptualises digitisation as an integrated workflow encompassing transcription, layout analysis, and semantic enrichment. The pipeline is organised into four stages (Preprocessing, Extraction, Refinement, and Enrichment) and employs a schema-driven architecture that allows researchers to declaratively specify their extraction objectives. We evaluate VERITAS on the critical edition of Bernardino Corio's Storia di Milano, a Renaissance chronicle of over 1,600 pages. Results demonstrate that the pipeline achieves a 67.6% relative reduction in word error rate compared to a commercial OCR baseline, with a threefold reduction in end-to-end p...
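
For readers unfamiliar with the metric, the "relative reduction in word error rate" quoted in the abstract can be illustrated with a minimal sketch. It assumes the standard definition of WER (word-level edit distance divided by the number of reference words) and whitespace tokenisation; the abstract does not state the paper's exact evaluation procedure. The function names and the toy strings below are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: plain whitespace tokenisation with no normalisation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, system: float) -> float:
    """Fraction of the baseline error rate removed by the system."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - system) / baseline


# Toy strings invented for illustration only (not data from the paper).
reference = "la storia di milano"
baseline_output = "la storla di milan0"   # two wrong words out of four
system_output = "la storia di milan0"     # one wrong word out of four

baseline_wer = word_error_rate(reference, baseline_output)
system_wer = word_error_rate(reference, system_output)
print(baseline_wer, system_wer, relative_reduction(baseline_wer, system_wer))
```

A relative reduction is measured against the baseline's own error rate, so a 67.6% relative reduction means the system's WER is roughly one third of the baseline's, whatever the absolute values are.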