Computer Vision

Image recognition, detection, and visual AI

Top This Week

[2602.09678] Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap
Computer Vision

arXiv - AI · 4 min ·
[2601.13622] CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models
LLMs

arXiv - AI · 3 min ·
[2603.26551] Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones
Computer Vision

arXiv - AI · 4 min ·

All Content

[2511.16175] Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight
Machine Learning

The paper introduces Mantis, a Vision-Language-Action model that enhances visual foresight through a novel framework, achieving superior ...

arXiv - AI · 4 min ·
[2511.02860] AI-driven Large-scale Electron Microscopy enables Whole-tissue Subcellular Digitization
Machine Learning

The article presents DeepOrganelle, a deep learning tool that enhances large-scale electron microscopy for mapping organelle distribution...

arXiv - AI · 3 min ·
[2510.06820] Efficient Discriminative Joint Encoders for Large Scale Vision-Language Reranking
Machine Learning

The paper presents EDJE, an Efficient Discriminative Joint Encoder designed to enhance vision-language reranking by precomputing visual t...

arXiv - Machine Learning · 3 min ·
[2509.26287] Flower: A Flow-Matching Solver for Inverse Problems
Machine Learning

The paper introduces Flower, a novel solver for linear inverse problems that utilizes a pre-trained flow model to enhance reconstruction ...

arXiv - Machine Learning · 3 min ·
[2510.14979] From Pixels to Words -- Towards Native Vision-Language Primitives at Scale
LLMs

The paper discusses the development of native Vision-Language Models (VLMs) that integrate vision and language capabilities more effectiv...

arXiv - AI · 4 min ·
[2510.02240] RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning
LLMs

The paper presents RewardMap, a multi-stage reinforcement learning framework aimed at improving fine-grained visual reasoning in multimod...

arXiv - AI · 4 min ·
[2505.17779] U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding
LLMs

The paper introduces U2-BENCH, a benchmark for evaluating large vision-language models (LVLMs) on ultrasound understanding, addressing ch...

arXiv - Machine Learning · 4 min ·
[2509.24526] CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow Map Models
Machine Learning

The paper introduces Consistency Mid-Training (CMT), a novel method for enhancing the efficiency of training flow map models, achieving s...

arXiv - Machine Learning · 4 min ·
[2503.07853] Hier-COS: Making Deep Features Hierarchy-aware via Composition of Orthogonal Subspaces
Machine Learning

The paper presents Hier-COS, a new framework for improving hierarchical classification in deep learning by addressing limitations in exis...

arXiv - Machine Learning · 4 min ·
[2501.06336] MEt3R: Measuring Multi-View Consistency in Generated Images
Machine Learning

The paper presents MEt3R, a novel metric for assessing multi-view consistency in generated images, addressing limitations of traditional ...

arXiv - Machine Learning · 4 min ·
[2509.21628] Comparing and Integrating Different Notions of Representational Correspondence in Neural Systems
Machine Learning

This article explores the integration of various representational similarity metrics in neural systems, assessing their effectiveness in ...

arXiv - AI · 4 min ·
[2301.00201] Exploring Singularities in point clouds with the graph Laplacian: An explicit approach
Data Science

This paper presents a novel approach using the graph Laplacian to analyze singularities in point clouds, offering theoretical guarantees ...

arXiv - Machine Learning · 3 min ·
[2507.20174] LRR-Bench: Left, Right or Rotate? Vision-Language models Still Struggle With Spatial Understanding Tasks
LLMs

The paper introduces LRR-Bench, a benchmark for evaluating Vision-Language Models (VLMs) on spatial understanding tasks, revealing signif...

arXiv - AI · 4 min ·
[2507.10846] Winsor-CAM: Human-Tunable Visual Explanations from Deep Networks via Layer-Wise Winsorization
Machine Learning

Winsor-CAM introduces a novel method for visual explanations in deep networks, enhancing interpretability through human-tunable parameter...

arXiv - Machine Learning · 4 min ·
[2507.05992] Exploring Partial Multi-Label Learning via Integrating Semantic Co-occurrence Knowledge
Machine Learning

This paper presents SCINet, a novel framework for partial multi-label learning that integrates semantic co-occurrence knowledge to improv...

arXiv - AI · 4 min ·
[2506.17337] Can Generalist Vision Language Models (VLMs) Rival Specialist Medical VLMs? Benchmarking and Strategic Insights
LLMs

This study evaluates the performance of generalist Vision Language Models (VLMs) compared to specialist medical VLMs, revealing that gene...

arXiv - AI · 3 min ·
[2602.01289] Gradient-Aligned Calibration for Post-Training Quantization of Diffusion Models
Machine Learning

The paper presents a novel method for post-training quantization (PTQ) of diffusion models, addressing inefficiencies in existing calibra...

arXiv - Machine Learning · 4 min ·
[2505.06595] Feature Representation Transferring to Lightweight Models via Perception Coherence
Machine Learning

This paper introduces a novel method for transferring feature representations from larger teacher models to lightweight student models us...

arXiv - Machine Learning · 4 min ·
[2503.23377] JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization
Machine Learning

The paper presents JavisDiT, a novel Joint Audio-Video Diffusion Transformer that enhances synchronized audio-video generation through a ...

arXiv - AI · 4 min ·
[2503.21258] Learn by Reasoning: Analogical Weight Generation for Few-Shot Class-Incremental Learning
Machine Learning

This paper presents a novel approach to Few-Shot Class-Incremental Learning (FSCIL) using an analogical generative method, enhancing mode...

arXiv - AI · 4 min ·