Computer Vision

Image recognition, detection, and visual AI

Top This Week

[2602.09678] Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap
Computer Vision

arXiv - AI · 4 min

[2601.13622] CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models
LLMs

arXiv - AI · 3 min

[2603.26551] Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones
Computer Vision

arXiv - AI · 4 min

All Content

[2602.02958] Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization
Machine Learning

The paper presents Quant VideoGen, a framework for autoregressive long video generation that addresses the limitations of KV cache memory...

arXiv - Machine Learning · 4 min

[2602.00191] GEPC: Group-Equivariant Posterior Consistency for Out-of-Distribution Detection in Diffusion Models
Machine Learning

The paper introduces Group-Equivariant Posterior Consistency (GEPC), a method for detecting out-of-distribution data in diffusion models ...

arXiv - Machine Learning · 4 min

[2509.05249] COGITAO: A Visual Reasoning Framework To Study Compositionality & Generalization
Machine Learning

COGITAO introduces a novel framework for studying compositionality and generalization in visual reasoning, offering extensive task genera...

arXiv - AI · 4 min

[2508.08177] MedReasoner: Reinforcement Learning Drives Reasoning Grounding from Clinical Thought to Pixel-Level Precision
LLMs

The paper introduces MedReasoner, a framework that utilizes reinforcement learning for precise medical reasoning and pixel-level groundin...

arXiv - AI · 4 min

[2504.19223] CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis
Machine Learning

The paper presents CARL, a camera-agnostic model for spectral image analysis that enhances AI methodologies across various imaging modali...

arXiv - Machine Learning · 4 min

[2510.25867] Synthesizing High-Quality Visual Question Answering from Medical Documents with Generator-Verifier LMMs
Machine Learning

This paper presents MedVLSynther, a framework for synthesizing high-quality visual question answering (VQA) from medical documents, enhan...

arXiv - Machine Learning · 4 min

[2504.08603] FindAnything: Open-Vocabulary and Object-Centric Mapping for Robot Exploration in Any Environment
Robotics

The paper presents FindAnything, a framework for open-vocabulary and object-centric mapping that enhances robot exploration in unknown en...

arXiv - AI · 4 min

[2502.17863] A Survey: Spatiotemporal Consistency in Video Generation
Generative AI

This survey reviews advancements in spatiotemporal consistency in video generation, addressing challenges and methodologies in creating c...

arXiv - AI · 4 min

[2502.14894] FOCUS on Contamination: Hydrology-Informed Noise-Aware Learning for Geospatial PFAS Mapping
Machine Learning

The paper introduces FOCUS, a deep learning framework for mapping PFAS contamination by integrating sparse data with environmental contex...

arXiv - Machine Learning · 4 min

[2501.03544] PromptGuard: Soft Prompt-Guided Unsafe Content Moderation for Text-to-Image Models
Machine Learning

PromptGuard introduces a novel method for moderating unsafe content in text-to-image models, enhancing safety without sacrificing image q...

arXiv - AI · 4 min

[2411.16537] RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
LLMs

The paper presents RoboSpatial, a dataset aimed at enhancing spatial understanding in robotics by providing 2D and 3D vision-language mod...

arXiv - AI · 4 min

[2411.11706] MC-LLaVA: Multi-Concept Personalized Vision-Language Model
LLMs

The paper presents MC-LLaVA, a multi-concept personalized vision-language model that enhances user experience by integrating multiple con...

arXiv - AI · 4 min

[2405.05523] Prompt When the Animal is: Temporal Animal Behavior Grounding with Positional Recovery Training
Machine Learning

This paper introduces a novel Positional Recovery Training (Port) framework for improving temporal grounding in animal behavior analysis,...

arXiv - AI · 3 min

[2505.12707] PLAICraft: Large-Scale Time-Aligned Vision-Speech-Action Dataset for Embodied AI
Machine Learning

PLAICraft introduces a large-scale dataset capturing time-aligned vision, speech, and action data from multiplayer Minecraft, aimed at ad...

arXiv - Machine Learning · 4 min

[2503.10265] SurgRAW: Multi-Agent Workflow with Chain of Thought Reasoning for Robotic Surgical Video Analysis
LLMs

The article presents SurgRAW, a multi-agent workflow utilizing Chain of Thought reasoning for enhanced robotic surgical video analysis, a...

arXiv - AI · 4 min

[2602.16689] Are Object-Centric Representations Better At Compositional Generalization?
Machine Learning

This paper investigates the effectiveness of object-centric representations in enhancing compositional generalization in machine learning...

arXiv - Machine Learning · 4 min

[2602.16545] Let's Split Up: Zero-Shot Classifier Edits for Fine-Grained Video Understanding
Machine Learning

The paper introduces a zero-shot editing method for video classifiers, allowing for the refinement of coarse categories into finer subcat...

arXiv - Machine Learning · 3 min

[2602.16590] A Contrastive Learning Framework Empowered by Attention-based Feature Adaptation for Street-View Image Classification
LLMs

This paper presents CLIP-MHAdapter, a novel contrastive learning framework that enhances street-view image classification by using attent...

arXiv - Machine Learning · 3 min

[2602.16337] Subtractive Modulative Network with Learnable Periodic Activations
AI Startups

The paper presents the Subtractive Modulative Network (SMN), a new architecture for implicit neural representations that enhances paramet...

arXiv - Machine Learning · 3 min

[2602.16320] RefineFormer3D: Efficient 3D Medical Image Segmentation via Adaptive Multi-Scale Transformer with Cross Attention Fusion
Machine Learning

RefineFormer3D presents a lightweight transformer architecture for 3D medical image segmentation, achieving high accuracy with significan...

arXiv - Machine Learning · 4 min