[2603.24326] Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing
Abstract page for arXiv paper 2603.24326: Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing
Abstract page for arXiv paper 2601.13508: Autonomous Computational Catalysis Research via Agentic Systems
Abstract page for arXiv paper 2510.20847: Integrated representational signatures strengthen specificity in brains and models
This article explores how Vision Language Models (VLMs) enhance performance on text-only tasks by correcting binding shortcuts through vi...
The paper presents a Decoupled Representation Refinement (DRR) paradigm for Implicit Neural Representations (INRs), enhancing speed and f...
This study presents a hybrid approach for equipment anomaly prediction by combining time series embeddings with statistical features, ach...
This article explores the effectiveness of Semantic Compression Vectors (SCVs) in large language models (LLMs), comparing the 5.1 and 4o ...
This article presents findings from testing an INT8 model across five Snapdragon chipsets, revealing significant variations in accuracy, ...
This article discusses a live demo showcasing AI shopping experiences using structured data via the Universal Commerce Protocol (UCP), hi...
NVIDIA has launched the Nemotron-Nano-9B-v2-Japanese, a lightweight language model designed to enhance Japanese language understanding an...
Cohere has launched Tiny Aya, a family of open multilingual models that support over 70 languages and can run on everyday devices, enhanc...
This article evaluates disentangled representations in music generation, focusing on their effectiveness for controllable synthesis and i...
The paper presents C^2ROPE, an advanced positional encoding method for 3D Large Multimodal Models, addressing limitations of existing Rot...
The paper presents a novel framework for enhancing privacy protection in mobile GUI agents by anonymizing sensitive data while maintainin...
The paper explores the role of paraphrase generation and detection in language modeling, emphasizing the need for fine-grained semantic u...
The paper explores Code World Models (CWMs), which simulate program execution and identify error sources, focusing on local semantic exec...
This study examines the relationship between cryptocurrency whitepaper claims and actual market behavior, revealing weak predictive power...
This paper presents a two-stage retrieval system designed for the TREC Tip-of-the-Tongue task, integrating multiple retrieval methods wit...
This paper presents a unified framework for Query Auto-Completion (QAC) that integrates Retrieval-Augmented Generation (RAG) and multi-ob...
ShotFinder introduces a novel benchmark for open-domain video shot retrieval, utilizing LLMs to enhance video search capabilities through...
RosettaSpeech introduces a zero-shot framework for speech-to-speech translation, overcoming the need for parallel speech data by using mo...
This article explores the detection of 19 human values in sentences using transformer models, demonstrating the learnability of moral pre...
The paper presents CogniGent, a novel AI technique for bug localization that enhances traditional methods by leveraging causal reasoning ...