[2603.24326] Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing
Abstract page for arXiv paper 2603.24326: Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing
Text understanding and language tasks
Abstract page for arXiv paper 2601.13508: Autonomous Computational Catalysis Research via Agentic Systems
Abstract page for arXiv paper 2510.20847: Integrated representational signatures strengthen specificity in brains and models
The paper introduces VariViT, a Vision Transformer designed to effectively handle variable image sizes, improving feature representation ...
The paper presents LongAudio-RAG, a framework for event-grounded question answering over lengthy audio recordings, enhancing accuracy thr...
This article evaluates the mathematical reasoning capabilities of large language models (LLMs) in Sinhala and Tamil, revealing significan...
This paper presents a novel uncertainty-aware multimodal segmentation framework that integrates radiological images and clinical text to ...
TruthStance introduces a comprehensive dataset of conversations from Truth Social, focusing on argument mining and stance detection, with...
The paper presents DP-KSA, a novel algorithm that integrates differential privacy into retrieval-augmented generation (RAG) systems, addr...
The paper introduces InnoEval, a framework for evaluating research ideas using knowledge-grounded, multi-perspective reasoning, addressin...
The paper presents AXE, an innovative framework for validating zero-day vulnerabilities using minimal metadata, achieving a significant i...
This paper discusses a novel approach to audience expansion in a two-sided marketplace, focusing on high precision retrieval methods for ...
The paper presents STATe-of-Thoughts, a new method for improving output diversity and interpretability in inference-time compute methods,...
The paper presents AbracADDbra, a framework that enhances object addition in computer vision by decoupling placement and editing tasks th...
This article explores the effectiveness of reasoning language models (RLMs) in assessing parental cooperation during child protection int...
This article explores Relative Voice Impression Estimation (RIE), focusing on how different speech modeling approaches affect listener pe...
The paper introduces the Gaussian Thought Sampler (GTS), a novel approach to inference-time scaling in latent reasoning models, enhancing...
The paper discusses an abstention-aware framework for scientific reasoning, emphasizing the importance of knowing when to abstain from an...
The paper presents Spherical Barycentric Aggregation (SBA), a new method for aggregating outputs in Mixture-of-Experts (MoE) embedding mo...
This paper evaluates the performance of GPT-5 and other LLMs on long short-context tasks, revealing significant gaps between theoretical ...
MC²Mark introduces a novel watermarking framework that ensures reliable embedding of long messages in generated text while maintaining...
This article presents a multi-agent framework for medical AI that enhances clinical query processing by leveraging fine-tuned language mo...
The paper introduces DenseMLLM, a multimodal large language model designed to perform dense predictions without the need for complex, tas...