Computer Vision

Image recognition, detection, and visual AI

Top This Week

Computer Vision

[2602.09678] Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap

arXiv - AI · 4 min · about 1 hour ago

LLMs

[2601.13622] CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models

arXiv - AI · 3 min · about 1 hour ago

Computer Vision

[2603.26551] Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones

arXiv - AI · 4 min · about 1 hour ago

All Content

Machine Learning

[2511.16175] Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight

The paper introduces Mantis, a Vision-Language-Action model that enhances visual foresight through a novel framework, achieving superior ...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2511.02860] AI-driven Large-scale Electron Microscopy enables Whole-tissue Subcellular Digitization

The article presents DeepOrganelle, a deep learning tool that enhances large-scale electron microscopy for mapping organelle distribution...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2510.06820] Efficient Discriminative Joint Encoders for Large Scale Vision-Language Reranking

The paper presents EDJE, an Efficient Discriminative Joint Encoder designed to enhance vision-language reranking by precomputing visual t...

arXiv - Machine Learning · 3 min · about 1 month ago

Machine Learning

[2509.26287] Flower: A Flow-Matching Solver for Inverse Problems

The paper introduces Flower, a novel solver for linear inverse problems that utilizes a pre-trained flow model to enhance reconstruction ...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2510.14979] From Pixels to Words -- Towards Native Vision-Language Primitives at Scale

The paper discusses the development of native Vision-Language Models (VLMs) that integrate vision and language capabilities more effectiv...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2510.02240] RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning

The paper presents RewardMap, a multi-stage reinforcement learning framework aimed at improving fine-grained visual reasoning in multimod...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2505.17779] U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding

The paper introduces U2-BENCH, a benchmark for evaluating large vision-language models (LVLMs) on ultrasound understanding, addressing ch...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2509.24526] CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow Map Models

The paper introduces Consistency Mid-Training (CMT), a novel method for enhancing the efficiency of training flow map models, achieving s...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2503.07853] Hier-COS: Making Deep Features Hierarchy-aware via Composition of Orthogonal Subspaces

The paper presents Hier-COS, a new framework for improving hierarchical classification in deep learning by addressing limitations in exis...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2501.06336] MEt3R: Measuring Multi-View Consistency in Generated Images

The paper presents MEt3R, a novel metric for assessing multi-view consistency in generated images, addressing limitations of traditional ...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2509.21628] Comparing and Integrating Different Notions of Representational Correspondence in Neural Systems

This article explores the integration of various representational similarity metrics in neural systems, assessing their effectiveness in ...

arXiv - AI · 4 min · about 1 month ago

Data Science

[2301.00201] Exploring Singularities in point clouds with the graph Laplacian: An explicit approach

This paper presents a novel approach using the graph Laplacian to analyze singularities in point clouds, offering theoretical guarantees ...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2507.20174] LRR-Bench: Left, Right or Rotate? Vision-Language models Still Struggle With Spatial Understanding Tasks

The paper introduces LRR-Bench, a benchmark for evaluating Vision-Language Models (VLMs) on spatial understanding tasks, revealing signif...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2507.10846] Winsor-CAM: Human-Tunable Visual Explanations from Deep Networks via Layer-Wise Winsorization

Winsor-CAM introduces a novel method for visual explanations in deep networks, enhancing interpretability through human-tunable parameter...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2507.05992] Exploring Partial Multi-Label Learning via Integrating Semantic Co-occurrence Knowledge

This paper presents SCINet, a novel framework for partial multi-label learning that integrates semantic co-occurrence knowledge to improv...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2506.17337] Can Generalist Vision Language Models (VLMs) Rival Specialist Medical VLMs? Benchmarking and Strategic Insights

This study evaluates the performance of generalist Vision Language Models (VLMs) compared to specialist medical VLMs, revealing that gene...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.01289] Gradient-Aligned Calibration for Post-Training Quantization of Diffusion Models

The paper presents a novel method for post-training quantization (PTQ) of diffusion models, addressing inefficiencies in existing calibra...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2505.06595] Feature Representation Transferring to Lightweight Models via Perception Coherence

This paper introduces a novel method for transferring feature representations from larger teacher models to lightweight student models us...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2503.23377] JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization

The paper presents JavisDiT, a novel Joint Audio-Video Diffusion Transformer that enhances synchronized audio-video generation through a ...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2503.21258] Learn by Reasoning: Analogical Weight Generation for Few-Shot Class-Incremental Learning

This paper presents a novel approach to Few-Shot Class-Incremental Learning (FSCIL) using an analogical generative method, enhancing mode...

arXiv - AI · 4 min · about 1 month ago

