Computer Vision

Image recognition, detection, and visual AI


Top This Week

Machine Learning

[2506.22504] Patch2Loc: Learning to Localize Patches for Unsupervised Brain Lesion Detection

Abstract page for arXiv paper 2506.22504: Patch2Loc: Learning to Localize Patches for Unsupervised Brain Lesion Detection

arXiv - Machine Learning · 4 min · 3 days ago

Machine Learning

[2508.00307] Acoustic Imaging for Low-SNR UAV Detection: Dense Beamformed Energy Maps and U-Net SELD

Abstract page for arXiv paper 2508.00307: Acoustic Imaging for Low-SNR UAV Detection: Dense Beamformed Energy Maps and U-Net SELD

arXiv - AI · 4 min · 3 days ago

Computer Vision

[2603.25524] CHIRP dataset: towards long-term, individual-level, behavioral monitoring of bird populations in the wild

Abstract page for arXiv paper 2603.25524: CHIRP dataset: towards long-term, individual-level, behavioral monitoring of bird populations i...

arXiv - AI · 4 min · 3 days ago

All Content

Machine Learning

[2512.09069] KD-OCT: Efficient Knowledge Distillation for Clinical-Grade Retinal OCT Classification

The paper presents KD-OCT, a novel knowledge distillation framework that enhances the efficiency of deep learning models for classifying ...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2511.00062] World Simulation with Video Foundation Models for Physical AI

The paper presents Cosmos-Predict2.5, an advanced model for world simulation in Physical AI, integrating various generation methods and i...

arXiv - Machine Learning · 5 min · about 1 month ago

LLMs

[2509.24072] Uncovering Grounding IDs: How External Cues Shape Multimodal Binding

This article explores the concept of Grounding IDs, which are latent identifiers that enhance multimodal binding in large vision-language...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2506.01085] Learning What Matters: Prioritized Concept Learning via Relative Error-driven Sample Selection

The paper presents PROGRESS, a framework for prioritized concept learning in vision-language models, enabling efficient sample selection ...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2411.06657] Renaissance: Investigating the Pretraining of Vision-Language Encoders

The paper 'Renaissance' explores the pretraining of vision-language encoders, addressing best practices and introducing a flexible evalua...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2406.17115] Measuring the Measurers: Quality Evaluation of Hallucination Benchmarks for Large Vision-Language Models

This article evaluates the quality of hallucination benchmarks for Large Vision-Language Models (LVLMs) and introduces a new framework fo...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2310.17167] Improving Denoising Diffusion Models via Simultaneous Estimation of Image and Noise

This paper presents advancements in denoising diffusion models, focusing on simultaneous estimation of image and noise to enhance image g...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2509.01350] Error Notebook-Guided, Training-Free Part Retrieval in 3D CAD Assemblies via Vision-Language Models

The paper presents a novel framework for part retrieval in 3D CAD assemblies using vision-language models, emphasizing training-free meth...

arXiv - AI · 4 min · about 1 month ago

AI Agents

[2507.14899] InsightX Agent: An LMM-based Agentic Framework with Integrated Tools for Reliable X-ray NDT Analysis

The paper presents InsightX Agent, an LMM-based framework that enhances X-ray non-destructive testing (NDT) by improving reliability, int...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.22197] Off-The-Shelf Image-to-Image Models Are All You Need To Defeat Image Protection Schemes

This paper demonstrates that off-the-shelf image-to-image models can effectively defeat various image protection schemes, highlighting a ...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.22144] NoLan: Mitigating Object Hallucinations in Large Vision-Language Models via Dynamic Suppression of Language Priors

The paper presents NoLan, a framework aimed at reducing object hallucinations in Large Vision-Language Models (LVLMs) by dynamically supp...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.22026] RGB-Event HyperGraph Prompt for Kilometer Marker Recognition based on Pre-trained Foundation Models

This article presents a novel approach to Kilometer Marker Recognition (KMR) using RGB-event cameras, enhancing visual perception for aut...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.21987] PatchDenoiser: Parameter-efficient multi-scale patch learning and fusion denoiser for medical images

PatchDenoiser introduces a lightweight, multi-scale denoising framework for medical images, effectively reducing noise while preserving a...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.21935] A Framework for Cross-Domain Generalization in Coronary Artery Calcium Scoring Across Gated and Non-Gated Computed Tomography

This article presents a framework for coronary artery calcium (CAC) scoring that generalizes across gated and non-gated CT scans, enhanci...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.21855] Understanding Annotation Error Propagation and Learning an Adaptive Policy for Expert Intervention in Barrett's Video Segmentation

This article explores the challenges of annotation error propagation in endoscopic video segmentation, proposing a framework for optimizi...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.21829] StoryMovie: A Dataset for Semantic Alignment of Visual Stories with Movie Scripts and Subtitles

The paper introduces StoryMovie, a dataset designed for aligning visual stories with movie scripts and subtitles, enhancing dialogue attr...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.21819] SemVideo: Reconstructs What You Watch from Brain Activity via Hierarchical Semantic Guidance

The paper presents SemVideo, a novel framework that reconstructs videos from brain activity using hierarchical semantic guidance, address...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.21779] Beyond Static Artifacts: A Forensic Benchmark for Video Deepfake Reasoning in Vision Language Models

This paper introduces a forensic benchmark for evaluating video deepfake reasoning in vision-language models, focusing on temporal incons...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.21706] SurGo-R1: Benchmarking and Modeling Contextual Reasoning for Operative Zone in Surgical Video

The paper presents SurGo-R1, a model designed to enhance contextual reasoning in surgical video analysis, addressing challenges in identi...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.21657] Following the Diagnostic Trace: Visual Cognition-guided Cooperative Network for Chest X-Ray Diagnosis

The paper presents VCC-Net, a visual cognition-guided cooperative network aimed at enhancing chest X-ray diagnosis through improved human...

arXiv - AI · 4 min · about 1 month ago


