[2602.09678] Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap
Abstract page for arXiv paper 2602.09678: Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap
Image recognition, detection, and visual AI
Abstract page for arXiv paper 2601.13622: CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language...
Abstract page for arXiv paper 2603.26551: Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones
The paper presents Quant VideoGen, a framework for autoregressive long video generation that addresses the limitations of KV cache memory...
The paper introduces Group-Equivariant Posterior Consistency (GEPC), a method for detecting out-of-distribution data in diffusion models ...
COGITAO introduces a novel framework for studying compositionality and generalization in visual reasoning, offering extensive task genera...
The paper introduces MedReasoner, a framework that utilizes reinforcement learning for precise medical reasoning and pixel-level groundin...
The paper presents CARL, a camera-agnostic model for spectral image analysis that enhances AI methodologies across various imaging modali...
This paper presents MedVLSynther, a framework for synthesizing high-quality visual question answering (VQA) from medical documents, enhan...
The paper presents FindAnything, a framework for open-vocabulary and object-centric mapping that enhances robot exploration in unknown en...
This survey reviews advancements in spatiotemporal consistency in video generation, addressing challenges and methodologies in creating c...
The paper introduces FOCUS, a deep learning framework for mapping PFAS contamination by integrating sparse data with environmental contex...
PromptGuard introduces a novel method for moderating unsafe content in text-to-image models, enhancing safety without sacrificing image q...
The paper presents RoboSpatial, a dataset aimed at enhancing spatial understanding in robotics by providing 2D and 3D vision-language mod...
The paper presents MC-LLaVA, a multi-concept personalized vision-language model that enhances user experience by integrating multiple con...
This paper introduces a novel Positional Recovery Training (Port) framework for improving temporal grounding in animal behavior analysis,...
PLAICraft introduces a large-scale dataset capturing time-aligned vision, speech, and action data from multiplayer Minecraft, aimed at ad...
The article presents SurgRAW, a multi-agent workflow utilizing Chain of Thought reasoning for enhanced robotic surgical video analysis, a...
This paper investigates the effectiveness of object-centric representations in enhancing compositional generalization in machine learning...
The paper introduces a zero-shot editing method for video classifiers, allowing for the refinement of coarse categories into finer subcat...
This paper presents CLIP-MHAdapter, a novel contrastive learning framework that enhances street-view image classification by using attent...
The paper presents the Subtractive Modulative Network (SMN), a new architecture for implicit neural representations that enhances paramet...
RefineFormer3D presents a lightweight transformer architecture for 3D medical image segmentation, achieving high accuracy with significan...