Computer Vision

Image recognition, detection, and visual AI


Top This Week

Computer Vision

[2602.09678] Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap

arXiv - AI · 4 min · about 10 hours ago

LLMs

[2601.13622] CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models

arXiv - AI · 3 min · about 10 hours ago

Computer Vision

[2603.26551] Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones

arXiv - AI · 4 min · about 10 hours ago

All Content

NLP

[2602.18728] Phase-Consistent Magnetic Spectral Learning for Multi-View Clustering

This article presents a novel approach to unsupervised multi-view clustering through Phase-Consistent Magnetic Spectral Learning, address...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.18639] Learning Invariant Visual Representations for Planning with Joint-Embedding Predictive World Models

This paper presents a novel approach to improving the robustness of latent predictive world models in machine learning by addressing the ...

arXiv - Machine Learning · 4 min · about 1 month ago

Computer Vision

[2602.18981] How Far Can We Go with Pixels Alone? A Pilot Study on Screen-Only Navigation in Commercial 3D ARPGs

This study explores the effectiveness of screen-only navigation in 3D ARPGs, demonstrating how visual affordances can guide gameplay, whi...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.18528] Audio-Visual Continual Test-Time Adaptation without Forgetting

The paper presents a novel method, AV-CTTA, for audio-visual continual test-time adaptation that minimizes catastrophic forgetting while ...

arXiv - Machine Learning · 4 min · about 1 month ago

AI Safety

[2602.18519] Wide Open Gazes: Quantifying Visual Exploratory Behavior in Soccer with Pose Enhanced Positional Data

This paper presents a novel approach to quantifying visual exploratory behavior in soccer using pose-enhanced positional data, addressing...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

Deploying Open Source Vision Language Models (VLM) on Jetson

This article provides a comprehensive guide on deploying Open Source Vision Language Models (VLMs) on NVIDIA Jetson devices, detailing th...

Hugging Face Blog · 8 min · about 1 month ago

AI Safety

If Big Tech cared about fighting AI slop, we wouldn’t be drowning in it | The Verge

The Verge critiques Big Tech's inadequate efforts in combating AI-generated misinformation, highlighting the shortcomings of the C2PA sys...

The Verge - AI · 13 min · about 1 month ago

Machine Learning

[D] How to convert ONNX into xmodel/tmodel for deploying on PL?

The article discusses the challenges of converting ONNX models into xmodel/tmodel formats for deployment, specifically highlighting issue...

Reddit - Machine Learning · 1 min · about 1 month ago

Machine Learning

[D] CVPR results shock due to impressive score drop since reviews

The CVPR results reveal a significant score drop for a submission, highlighting the impact of reviewer feedback and the importance of adh...

Reddit - Machine Learning · 1 min · about 1 month ago

Machine Learning

[2502.17160] A Pragmatic Note on Evaluating Generative Models with Fréchet Inception Distance for Retinal Image Synthesis

This article discusses the limitations of using Fréchet Inception Distance (FID) as an evaluation metric for generative models in retinal...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.04587] VILLAIN at AVerImaTeC: Verifying Image-Text Claims via Multi-Agent Collaboration

The paper presents VILLAIN, a multimodal fact-checking system that verifies image-text claims through collaborative agents, achieving top...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.00288] TimeBlind: A Spatio-Temporal Compositionality Benchmark for Video LLMs

The paper presents TimeBlind, a benchmark designed to evaluate the spatio-temporal understanding of video Large Language Models (LLMs), h...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.02437] UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing

UniReason 1.0 presents a unified framework for image generation and editing, integrating textual reasoning and visual refinement to enhan...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.01844] CloDS: Visual-Only Unsupervised Cloth Dynamics Learning in Unknown Conditions

The paper presents CloDS, an unsupervised learning framework for cloth dynamics using visual data, addressing limitations of existing met...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.04908] Temporal Pair Consistency for Variance-Reduced Flow Matching

The paper introduces Temporal Pair Consistency (TPC), a novel approach to reduce variance in flow matching for continuous-time generative...

arXiv - Machine Learning · 3 min · about 1 month ago

Computer Vision

[2510.06170] Smartphone-based iris recognition through high-quality visible-spectrum iris image capture.V2

This paper presents a smartphone-based iris recognition system using visible-spectrum imaging, demonstrating high accuracy through a cust...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2509.00479] A Novel Method to Determine Total Oxidant Concentration Produced by Non-Thermal Plasma Based on Image Processing and Machine Learning

This article presents a novel method for accurately determining total oxidant concentration in non-thermal plasma systems using image pro...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2507.18031] ViGText: Deepfake Image Detection with Vision-Language Model Explanations and Graph Neural Networks

ViGText introduces a novel approach to deepfake detection by integrating Vision-Language Model explanations with Graph Neural Networks, e...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2507.11551] Landmark Detection for Medical Images using a General-purpose Segmentation Model

The paper presents a novel approach to anatomical landmark detection in medical images by combining YOLO and SAM models, enhancing segmen...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2506.15316] J3DAI: A tiny DNN-Based Edge AI Accelerator for 3D-Stacked CMOS Image Sensor

The paper presents J3DAI, a compact DNN-based hardware accelerator designed for 3D-stacked CMOS image sensors, emphasizing its efficiency...

arXiv - AI · 4 min · about 1 month ago
