[2506.22504] Patch2Loc: Learning to Localize Patches for Unsupervised Brain Lesion Detection
Abstract page for arXiv paper 2506.22504: Patch2Loc: Learning to Localize Patches for Unsupervised Brain Lesion Detection
Image recognition, detection, and visual AI
Abstract page for arXiv paper 2508.00307: Acoustic Imaging for Low-SNR UAV Detection: Dense Beamformed Energy Maps and U-Net SELD
Abstract page for arXiv paper 2603.25524: CHIRP dataset: towards long-term, individual-level, behavioral monitoring of bird populations i...
The paper presents KD-OCT, a novel knowledge distillation framework that enhances the efficiency of deep learning models for classifying ...
The paper presents Cosmos-Predict2.5, an advanced model for world simulation in Physical AI, integrating various generation methods and i...
This article explores the concept of Grounding IDs, which are latent identifiers that enhance multimodal binding in large vision-language...
The paper presents PROGRESS, a framework for prioritized concept learning in vision-language models, enabling efficient sample selection ...
The paper 'Renaissance' explores the pretraining of vision-language encoders, addressing best practices and introducing a flexible evalua...
This article evaluates the quality of hallucination benchmarks for Large Vision-Language Models (LVLMs) and introduces a new framework fo...
This paper presents advancements in denoising diffusion models, focusing on simultaneous estimation of image and noise to enhance image g...
The paper presents a novel framework for part retrieval in 3D CAD assemblies using vision-language models, emphasizing training-free meth...
The paper presents InsightX Agent, an LMM-based framework that enhances X-ray non-destructive testing (NDT) by improving reliability, int...
This paper demonstrates that off-the-shelf image-to-image models can effectively defeat various image protection schemes, highlighting a ...
The paper presents NoLan, a framework aimed at reducing object hallucinations in Large Vision-Language Models (LVLMs) by dynamically supp...
This article presents a novel approach to Kilometer Marker Recognition (KMR) using RGB-event cameras, enhancing visual perception for aut...
PatchDenoiser introduces a lightweight, multi-scale denoising framework for medical images, effectively reducing noise while preserving a...
This article presents a framework for coronary artery calcium (CAC) scoring that generalizes across gated and non-gated CT scans, enhanci...
This article explores the challenges of annotation error propagation in endoscopic video segmentation, proposing a framework for optimizi...
The paper introduces StoryMovie, a dataset designed for aligning visual stories with movie scripts and subtitles, enhancing dialogue attr...
The paper presents SemVideo, a novel framework that reconstructs videos from brain activity using hierarchical semantic guidance, address...
This paper introduces a forensic benchmark for evaluating video deepfake reasoning in vision-language models, focusing on temporal incons...
The paper presents SurGo-R1, a model designed to enhance contextual reasoning in surgical video analysis, addressing challenges in identi...
The paper presents VCC-Net, a visual cognition-guided cooperative network aimed at enhancing chest X-ray diagnosis through improved human...