Computer Vision

Image recognition, detection, and visual AI

Top This Week

[2602.09678] Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap
Computer Vision

arXiv - AI · 4 min ·
[2601.13622] CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models
LLMs

Abstract page for arXiv paper 2601.13622: CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language...

arXiv - AI · 3 min ·
[2603.26551] Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones
Computer Vision

arXiv - AI · 4 min ·

All Content

[2602.18728] Phase-Consistent Magnetic Spectral Learning for Multi-View Clustering
NLP

This article presents a novel approach to unsupervised multi-view clustering through Phase-Consistent Magnetic Spectral Learning, address...

arXiv - Machine Learning · 4 min ·
[2602.18639] Learning Invariant Visual Representations for Planning with Joint-Embedding Predictive World Models
Machine Learning

This paper presents a novel approach to improving the robustness of latent predictive world models in machine learning by addressing the ...

arXiv - Machine Learning · 4 min ·
[2602.18981] How Far Can We Go with Pixels Alone? A Pilot Study on Screen-Only Navigation in Commercial 3D ARPGs
Computer Vision

This study explores the effectiveness of screen-only navigation in 3D ARPGs, demonstrating how visual affordances can guide gameplay, whi...

arXiv - AI · 4 min ·
[2602.18528] Audio-Visual Continual Test-Time Adaptation without Forgetting
Machine Learning

The paper presents a novel method, AV-CTTA, for audio-visual continual test-time adaptation that minimizes catastrophic forgetting while ...

arXiv - Machine Learning · 4 min ·
[2602.18519] Wide Open Gazes: Quantifying Visual Exploratory Behavior in Soccer with Pose Enhanced Positional Data
AI Safety

This paper presents a novel approach to quantifying visual exploratory behavior in soccer using pose-enhanced positional data, addressing...

arXiv - Machine Learning · 4 min ·
Deploying Open Source Vision Language Models (VLM) on Jetson
LLMs

This article provides a comprehensive guide on deploying Open Source Vision Language Models (VLMs) on NVIDIA Jetson devices, detailing th...

Hugging Face Blog · 8 min ·
If Big Tech cared about fighting AI slop, we wouldn’t be drowning in it | The Verge
AI Safety

The Verge critiques Big Tech's inadequate efforts in combating AI-generated misinformation, highlighting the shortcomings of the C2PA sys...

The Verge - AI · 13 min ·
Machine Learning

[D] How to convert ONNX into xmodel/tmodel for deploying on PL?

The article discusses the challenges of converting ONNX models into xmodel/tmodel formats for deployment, specifically highlighting issue...

Reddit - Machine Learning · 1 min ·
Machine Learning

[D] CVPR results shock due to impressive score drop since reviews

The CVPR results reveal a significant score drop for a submission, highlighting the impact of reviewer feedback and the importance of adh...

Reddit - Machine Learning · 1 min ·
[2502.17160] A Pragmatic Note on Evaluating Generative Models with Fréchet Inception Distance for Retinal Image Synthesis
Machine Learning

This article discusses the limitations of using Fréchet Inception Distance (FID) as an evaluation metric for generative models in retinal...

arXiv - Machine Learning · 4 min ·
[2602.04587] VILLAIN at AVerImaTeC: Verifying Image-Text Claims via Multi-Agent Collaboration
LLMs

The paper presents VILLAIN, a multimodal fact-checking system that verifies image-text claims through collaborative agents, achieving top...

arXiv - AI · 3 min ·
[2602.00288] TimeBlind: A Spatio-Temporal Compositionality Benchmark for Video LLMs
LLMs

The paper presents TimeBlind, a benchmark designed to evaluate the spatio-temporal understanding of video Large Language Models (LLMs), h...

arXiv - AI · 4 min ·
[2602.02437] UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing
Machine Learning

UniReason 1.0 presents a unified framework for image generation and editing, integrating textual reasoning and visual refinement to enhan...

arXiv - AI · 4 min ·
[2602.01844] CloDS: Visual-Only Unsupervised Cloth Dynamics Learning in Unknown Conditions
Machine Learning

The paper presents CloDS, an unsupervised learning framework for cloth dynamics using visual data, addressing limitations of existing met...

arXiv - AI · 4 min ·
[2602.04908] Temporal Pair Consistency for Variance-Reduced Flow Matching
Machine Learning

The paper introduces Temporal Pair Consistency (TPC), a novel approach to reduce variance in flow matching for continuous-time generative...

arXiv - Machine Learning · 3 min ·
[2510.06170] Smartphone-based iris recognition through high-quality visible-spectrum iris image capture.V2
Computer Vision

This paper presents a smartphone-based iris recognition system using visible-spectrum imaging, demonstrating high accuracy through a cust...

arXiv - AI · 4 min ·
[2509.00479] A Novel Method to Determine Total Oxidant Concentration Produced by Non-Thermal Plasma Based on Image Processing and Machine Learning
Machine Learning

This article presents a novel method for accurately determining total oxidant concentration in non-thermal plasma systems using image pro...

arXiv - Machine Learning · 4 min ·
[2507.18031] ViGText: Deepfake Image Detection with Vision-Language Model Explanations and Graph Neural Networks
LLMs

ViGText introduces a novel approach to deepfake detection by integrating Vision-Language Model explanations with Graph Neural Networks, e...

arXiv - Machine Learning · 4 min ·
[2507.11551] Landmark Detection for Medical Images using a General-purpose Segmentation Model
Machine Learning

The paper presents a novel approach to anatomical landmark detection in medical images by combining YOLO and SAM models, enhancing segmen...

arXiv - AI · 4 min ·
[2506.15316] J3DAI: A tiny DNN-Based Edge AI Accelerator for 3D-Stacked CMOS Image Sensor
Machine Learning

The paper presents J3DAI, a compact DNN-based hardware accelerator designed for 3D-stacked CMOS image sensors, emphasizing its efficiency...

arXiv - AI · 4 min ·