Computer Vision

Image recognition, detection, and visual AI

Top This Week

[2506.22504] Patch2Loc: Learning to Localize Patches for Unsupervised Brain Lesion Detection
Machine Learning

arXiv - Machine Learning · 4 min

[2508.00307] Acoustic Imaging for Low-SNR UAV Detection: Dense Beamformed Energy Maps and U-Net SELD
Machine Learning

arXiv - AI · 4 min

[2603.25524] CHIRP dataset: towards long-term, individual-level, behavioral monitoring of bird populations in the wild
Computer Vision

arXiv - AI · 4 min

All Content

[2512.09069] KD-OCT: Efficient Knowledge Distillation for Clinical-Grade Retinal OCT Classification
Machine Learning

The paper presents KD-OCT, a novel knowledge distillation framework that enhances the efficiency of deep learning models for classifying ...

arXiv - Machine Learning · 4 min

[2511.00062] World Simulation with Video Foundation Models for Physical AI
LLMs

The paper presents Cosmos-Predict2.5, an advanced model for world simulation in Physical AI, integrating various generation methods and i...

arXiv - Machine Learning · 5 min

[2509.24072] Uncovering Grounding IDs: How External Cues Shape Multimodal Binding
LLMs

This article explores the concept of Grounding IDs, which are latent identifiers that enhance multimodal binding in large vision-language...

arXiv - AI · 4 min

[2506.01085] Learning What Matters: Prioritized Concept Learning via Relative Error-driven Sample Selection
LLMs

The paper presents PROGRESS, a framework for prioritized concept learning in vision-language models, enabling efficient sample selection ...

arXiv - AI · 4 min

[2411.06657] Renaissance: Investigating the Pretraining of Vision-Language Encoders
Machine Learning

The paper 'Renaissance' explores the pretraining of vision-language encoders, addressing best practices and introducing a flexible evalua...

arXiv - Machine Learning · 4 min

[2406.17115] Measuring the Measurers: Quality Evaluation of Hallucination Benchmarks for Large Vision-Language Models
LLMs

This article evaluates the quality of hallucination benchmarks for Large Vision-Language Models (LVLMs) and introduces a new framework fo...

arXiv - AI · 4 min

[2310.17167] Improving Denoising Diffusion Models via Simultaneous Estimation of Image and Noise
Machine Learning

This paper presents advancements in denoising diffusion models, focusing on simultaneous estimation of image and noise to enhance image g...

arXiv - Machine Learning · 4 min

[2509.01350] Error Notebook-Guided, Training-Free Part Retrieval in 3D CAD Assemblies via Vision-Language Models
LLMs

The paper presents a novel framework for part retrieval in 3D CAD assemblies using vision-language models, emphasizing training-free meth...

arXiv - AI · 4 min

[2507.14899] InsightX Agent: An LMM-based Agentic Framework with Integrated Tools for Reliable X-ray NDT Analysis
AI Agents

The paper presents InsightX Agent, an LMM-based framework that enhances X-ray non-destructive testing (NDT) by improving reliability, int...

arXiv - AI · 4 min

[2602.22197] Off-The-Shelf Image-to-Image Models Are All You Need To Defeat Image Protection Schemes
Machine Learning

This paper demonstrates that off-the-shelf image-to-image models can effectively defeat various image protection schemes, highlighting a ...

arXiv - AI · 4 min

[2602.22144] NoLan: Mitigating Object Hallucinations in Large Vision-Language Models via Dynamic Suppression of Language Priors
LLMs

The paper presents NoLan, a framework aimed at reducing object hallucinations in Large Vision-Language Models (LVLMs) by dynamically supp...

arXiv - AI · 4 min

[2602.22026] RGB-Event HyperGraph Prompt for Kilometer Marker Recognition based on Pre-trained Foundation Models
LLMs

This article presents a novel approach to Kilometer Marker Recognition (KMR) using RGB-event cameras, enhancing visual perception for aut...

arXiv - AI · 3 min

[2602.21987] PatchDenoiser: Parameter-efficient multi-scale patch learning and fusion denoiser for medical images
Machine Learning

PatchDenoiser introduces a lightweight, multi-scale denoising framework for medical images, effectively reducing noise while preserving a...

arXiv - AI · 4 min

[2602.21935] A Framework for Cross-Domain Generalization in Coronary Artery Calcium Scoring Across Gated and Non-Gated Computed Tomography
Machine Learning

This article presents a framework for coronary artery calcium (CAC) scoring that generalizes across gated and non-gated CT scans, enhanci...

arXiv - AI · 4 min

[2602.21855] Understanding Annotation Error Propagation and Learning an Adaptive Policy for Expert Intervention in Barrett's Video Segmentation
Machine Learning

This article explores the challenges of annotation error propagation in endoscopic video segmentation, proposing a framework for optimizi...

arXiv - AI · 3 min

[2602.21829] StoryMovie: A Dataset for Semantic Alignment of Visual Stories with Movie Scripts and Subtitles
Machine Learning

The paper introduces StoryMovie, a dataset designed for aligning visual stories with movie scripts and subtitles, enhancing dialogue attr...

arXiv - AI · 3 min

[2602.21819] SemVideo: Reconstructs What You Watch from Brain Activity via Hierarchical Semantic Guidance
Machine Learning

The paper presents SemVideo, a novel framework that reconstructs videos from brain activity using hierarchical semantic guidance, address...

arXiv - AI · 4 min

[2602.21779] Beyond Static Artifacts: A Forensic Benchmark for Video Deepfake Reasoning in Vision Language Models
LLMs

This paper introduces a forensic benchmark for evaluating video deepfake reasoning in vision-language models, focusing on temporal incons...

arXiv - AI · 4 min

[2602.21706] SurGo-R1: Benchmarking and Modeling Contextual Reasoning for Operative Zone in Surgical Video
Machine Learning

The paper presents SurGo-R1, a model designed to enhance contextual reasoning in surgical video analysis, addressing challenges in identi...

arXiv - AI · 4 min

[2602.21657] Following the Diagnostic Trace: Visual Cognition-guided Cooperative Network for Chest X-Ray Diagnosis
Machine Learning

The paper presents VCC-Net, a visual cognition-guided cooperative network aimed at enhancing chest X-ray diagnosis through improved human...

arXiv - AI · 4 min