Computer Vision

Image recognition, detection, and visual AI

Top This Week

[2506.22504] Patch2Loc: Learning to Localize Patches for Unsupervised Brain Lesion Detection
Machine Learning

arXiv - Machine Learning · 4 min

[2508.00307] Acoustic Imaging for Low-SNR UAV Detection: Dense Beamformed Energy Maps and U-Net SELD
Machine Learning

arXiv - AI · 4 min

[2603.25524] CHIRP dataset: towards long-term, individual-level, behavioral monitoring of bird populations in the wild
Computer Vision

arXiv - AI · 4 min

All Content

[2602.22376] AeroDGS: Physically Consistent Dynamic Gaussian Splatting for Single-Sequence Aerial 4D Reconstruction
Machine Learning

AeroDGS presents a novel framework for 4D reconstruction from monocular UAV videos, addressing challenges in depth ambiguity and motion e...

arXiv - AI · 4 min

[2602.22275] Deep Accurate Solver for the Geodesic Problem
NLP

This article presents a novel deep learning approach for accurately solving the geodesic problem on continuous surfaces, achieving third-...

arXiv - Machine Learning · 4 min

[2602.22347] Enabling clinical use of foundation models in histopathology
LLMs

This article discusses the application of foundation models in histopathology, highlighting a novel approach that improves robustness and...

arXiv - AI · 4 min

[2602.22279] Learning to reconstruct from saturated data: audio declipping and high-dynamic range imaging
Machine Learning

This paper presents a novel approach to reconstruct audio and images from clipped measurements using self-supervised learning, addressing...

arXiv - AI · 3 min

[2602.23353] SOTAlign: Semi-Supervised Alignment of Unimodal Vision and Language Models via Optimal Transport
LLMs

The paper introduces SOTAlign, a semi-supervised framework for aligning unimodal vision and language models using minimal paired data and...

arXiv - AI · 4 min

[2602.22263] CryoNet.Refine: A One-step Diffusion Model for Rapid Refinement of Structural Models with Cryo-EM Density Map Restraints
Machine Learning

CryoNet.Refine introduces a one-step diffusion model for efficiently refining structural models using cryo-EM density maps, offering a si...

arXiv - AI · 4 min

[2602.22235] Unsupervised Denoising of Diffusion-Weighted Images with Bias and Variance Corrected Noise Modeling
Machine Learning

This article presents a novel approach for unsupervised denoising of diffusion-weighted images (dMRI) by addressing noise bias and varian...

arXiv - AI · 4 min

[2602.23060] RhythmBERT: A Self-Supervised Language Model Based on Latent Representations of ECG Waveforms for Heart Disease Detection
LLMs

RhythmBERT is a novel self-supervised language model designed for ECG waveform analysis, enhancing heart disease detection by treating EC...

arXiv - Machine Learning · 4 min

[2602.23276] CXReasonAgent: Evidence-Grounded Diagnostic Reasoning Agent for Chest X-rays
LLMs

The CXReasonAgent integrates large language models with diagnostic tools for improved reasoning in chest X-ray interpretations, addressin...

arXiv - AI · 3 min

[2602.22794] Doubly Adaptive Channel and Spatial Attention for Semantic Image Communication by IoT Devices
Machine Learning

This paper presents a novel approach to semantic image communication in IoT networks using a doubly adaptive channel and spatial attentio...

arXiv - Machine Learning · 4 min

[2602.22703] Enhancing Geometric Perception in VLMs via Translator-Guided Reinforcement Learning
LLMs

The paper presents GeoPerceive, a benchmark for evaluating geometric perception in vision-language models (VLMs), and introduces GeoDPO, ...

arXiv - Machine Learning · 4 min

[2602.22968] Certified Circuits: Stability Guarantees for Mechanistic Circuits
Machine Learning

The paper introduces Certified Circuits, a framework that enhances the stability and accuracy of circuit discovery in neural networks, ad...

arXiv - AI · 3 min

[2602.22963] FactGuard: Agentic Video Misinformation Detection via Reinforcement Learning
LLMs

FactGuard introduces an innovative framework for detecting video misinformation using reinforcement learning, enhancing the capabilities ...

arXiv - AI · 3 min

[2602.22592] pQuant: Towards Effective Low-Bit Language Models via Decoupled Linear Quantization-Aware Training
LLMs

The paper presents pQuant, a novel approach for low-bit language models that utilizes decoupled linear quantization-aware training to enh...

arXiv - Machine Learning · 3 min

[2602.22537] LUMOS: Democratizing SciML Workflows with L0-Regularized Learning for Unified Feature and Parameter Adaptation
Machine Learning

LUMOS introduces an innovative framework for scientific machine learning (SciML) that simplifies model design by integrating feature sele...

arXiv - Machine Learning · 3 min

[2602.22507] Space Syntax-guided Post-training for Residential Floor Plan Generation
Machine Learning

This paper introduces Space Syntax-guided Post-training (SSPT) for enhancing residential floor plan generation by integrating architectur...

arXiv - Machine Learning · 4 min

[2602.22284] BrepCoder: A Unified Multimodal Large Language Model for Multi-task B-rep Reasoning
LLMs

BrepCoder is a unified multimodal large language model designed for multi-task reasoning in Computer-Aided Design (CAD), specifically uti...

arXiv - Machine Learning · 3 min

[2602.22265] Entropy-Controlled Flow Matching
Generative AI

The paper introduces Entropy-Controlled Flow Matching (ECFM), a method that optimizes flow matching in machine learning by controlling in...

arXiv - Machine Learning · 3 min

Google launches Nano Banana 2 model with faster image generation | TechCrunch
LLMs

Google has launched the Nano Banana 2 model, enhancing image generation capabilities with faster processing and improved realism, now def...

TechCrunch - AI · 5 min

Google’s Nano Banana 2 brings advanced AI image tools to free users | The Verge
LLMs

Google's Nano Banana 2 introduces advanced AI image generation tools to free users, enhancing capabilities previously exclusive to paid s...

The Verge - AI · 5 min