[2602.09678] Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap
Abstract page for arXiv paper 2602.09678: Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap
Image recognition, detection, and visual AI
Abstract page for arXiv paper 2601.13622: CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language...
Abstract page for arXiv paper 2603.26551: Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones
The paper introduces the Better Audio Transformer (BAT), which utilizes a novel Convex Gated Probing method to enhance audio self-supervi...
This article discusses the development of production-scale Optical Character Recognition (OCR) systems tailored for India's multilingual ...
This article presents a novel framework for generating histopathology reports using a combination of a foundation model and a Transformer...
This paper presents MoMa-SG, a framework for creating semantic-kinematic 3D scene graphs to enhance mobile manipulation of articulated ob...
This article presents a study on Spatial Audio Question Answering (Spatial AQA) focusing on dynamic sound source movements, introducing i...
This paper presents a self-supervised learning approach to enhance feature representations in object detection tasks, reducing the need f...
The paper presents CHAI, a novel approach to enhance text-to-video generation by utilizing Cache Attention for efficient inference, achie...
The paper presents LGQ, a novel image tokenizer that learns discretization geometry to enhance scalability and stability in visual genera...
This article investigates the limitations of vision-language models (VLMs) in spatial reasoning, particularly their struggle to localize ...
The paper presents OmniCT, a unified slice-volume large vision-language model (LVLM) designed for comprehensive CT analysis, addressing l...
This article explores real-time object detection using deep learning, detailing various algorithms, applications, and future research dir...
The paper presents ScenicRules, a benchmark for evaluating autonomous driving systems that balances multiple objectives like safety and e...
The paper presents MedProbCLIP, a probabilistic framework for enhancing the reliability of radiograph-report retrieval using vision-langu...
The paper presents MARVL, a novel approach for robotic manipulation that utilizes Vision-Language Models (VLMs) to enhance task performan...
This paper presents GPEReg-Net, a novel framework for improving image registration in bidirectional photoacoustic microscopy by disentang...
The paper introduces DocSplit, a benchmark dataset and evaluation framework for document packet recognition and splitting, addressing cha...
The paper presents EarthSpatialBench, a benchmark designed to evaluate spatial reasoning capabilities of multimodal large language models...
The paper presents MaS-VQA, a novel framework for Knowledge-Based Visual Question Answering that enhances answer accuracy by integrating ...
This article reviews the current landscape of foundation models (FMs) in medical imaging, discussing their design principles, application...
The paper introduces FlipSet, a benchmark for assessing visual perspective taking in vision-language models, revealing significant egocen...