[2602.09678] Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap
Abstract page for arXiv paper 2602.09678: Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap
Image recognition, detection, and visual AI
Abstract page for arXiv paper 2601.13622: CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language...
Abstract page for arXiv paper 2603.26551: Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones
VideoMind introduces a novel approach for temporal-grounded video reasoning using a Chain-of-LoRA agent, enhancing multi-modal reasoning ...
The paper presents PASS, a novel algorithmic framework that utilizes visual prompts to enhance structural sparsity in neural networks, im...
The paper presents an innovative approach using an adaptive Runge-Kutta method for spatiotemporal prediction, enhancing model accuracy in...
This article explores self-supervised context reasoning in humans and AI, presenting a model called SeCo that learns contextual relations...
CleverCatch introduces a knowledge-guided weak supervision model for detecting healthcare fraud, enhancing accuracy and interpretability ...
MemOCR introduces a multimodal memory agent that enhances long-horizon reasoning by using layout-aware visual memory, optimizing context ...
This paper discusses the impact of uncertainty in ground truth evaluations on AI performance assessments, proposing a probabilistic frame...
The paper presents BEAT, a novel framework for executing visual backdoor attacks on Vision-Language Model (VLM)-based embodied agents, hi...
The paper presents VIRTUE, a novel Visual-Interactive Text-Image Universal Embedder that enhances multimodal representation learning by i...
This study presents an AI-based framework to analyze tourist perceptions in historic urban quarters of Shanghai, utilizing multimodal soc...
The paper presents GRILL, a method to enhance adversarial attacks on autoencoders by restoring gradient signals in ill-conditioned layers...
NovaPlan introduces a framework for zero-shot long-horizon manipulation in robotics, integrating video language planning with geometrical...
This article presents a benchmarking study on unlearning algorithms for Vision Transformers (VTs), highlighting their performance compare...
The paper presents StructXLIP, a novel approach that enhances vision-language models by integrating multimodal structural cues, improving...
The paper presents HeatPrompt, a zero-shot vision-language framework for estimating urban heat demand from satellite images, enhancing en...
This article examines the performance of multilingual large language models (LLMs) across various languages, revealing that comprehension...
This paper presents a novel constraint-based planning framework for mobile robots, enabling zero-shot generalization in interactive navig...
The paper introduces the Very Big Video Reasoning (VBVR) Dataset, a large-scale resource for studying video reasoning capabilities, featu...
The SEAL-pose framework enhances 3D human pose estimation by utilizing a learned loss function that captures structural consistency among...
The article presents a curated dataset of parasitoid wasps and associated Hymenoptera, aimed at enhancing automated identification system...