Computer Vision

Image recognition, detection, and visual AI

Top This Week

Machine Learning

[2506.22504] Patch2Loc: Learning to Localize Patches for Unsupervised Brain Lesion Detection

arXiv - Machine Learning · 4 min · 3 days ago

Machine Learning

[2508.00307] Acoustic Imaging for Low-SNR UAV Detection: Dense Beamformed Energy Maps and U-Net SELD

arXiv - AI · 4 min · 3 days ago

Computer Vision

[2603.25524] CHIRP dataset: towards long-term, individual-level, behavioral monitoring of bird populations in the wild

arXiv - AI · 4 min · 3 days ago

All Content

Machine Learning

[2602.22376] AeroDGS: Physically Consistent Dynamic Gaussian Splatting for Single-Sequence Aerial 4D Reconstruction

AeroDGS presents a novel framework for 4D reconstruction from monocular UAV videos, addressing challenges in depth ambiguity and motion e...

arXiv - AI · 4 min · about 1 month ago

NLP

[2602.22275] Deep Accurate Solver for the Geodesic Problem

This article presents a novel deep learning approach for accurately solving the geodesic problem on continuous surfaces, achieving third-...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.22347] Enabling clinical use of foundation models in histopathology

This article discusses the application of foundation models in histopathology, highlighting a novel approach that improves robustness and...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.22279] Learning to reconstruct from saturated data: audio declipping and high-dynamic range imaging

This paper presents a novel approach to reconstruct audio and images from clipped measurements using self-supervised learning, addressing...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.23353] SOTAlign: Semi-Supervised Alignment of Unimodal Vision and Language Models via Optimal Transport

The paper introduces SOTAlign, a semi-supervised framework for aligning unimodal vision and language models using minimal paired data and...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.22263] CryoNet.Refine: A One-step Diffusion Model for Rapid Refinement of Structural Models with Cryo-EM Density Map Restraints

CryoNet.Refine introduces a one-step diffusion model for efficiently refining structural models using cryo-EM density maps, offering a si...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.22235] Unsupervised Denoising of Diffusion-Weighted Images with Bias and Variance Corrected Noise Modeling

This article presents a novel approach for unsupervised denoising of diffusion-weighted images (dMRI) by addressing noise bias and varian...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.23060] RhythmBERT: A Self-Supervised Language Model Based on Latent Representations of ECG Waveforms for Heart Disease Detection

RhythmBERT is a novel self-supervised language model designed for ECG waveform analysis, enhancing heart disease detection by treating EC...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.23276] CXReasonAgent: Evidence-Grounded Diagnostic Reasoning Agent for Chest X-rays

The CXReasonAgent integrates large language models with diagnostic tools for improved reasoning in chest X-ray interpretations, addressin...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.22794] Doubly Adaptive Channel and Spatial Attention for Semantic Image Communication by IoT Devices

This paper presents a novel approach to semantic image communication in IoT networks using a doubly adaptive channel and spatial attentio...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.22703] Enhancing Geometric Perception in VLMs via Translator-Guided Reinforcement Learning

The paper presents GeoPerceive, a benchmark for evaluating geometric perception in vision-language models (VLMs), and introduces GeoDPO, ...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.22968] Certified Circuits: Stability Guarantees for Mechanistic Circuits

The paper introduces Certified Circuits, a framework that enhances the stability and accuracy of circuit discovery in neural networks, ad...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.22963] FactGuard: Agentic Video Misinformation Detection via Reinforcement Learning

FactGuard introduces an innovative framework for detecting video misinformation using reinforcement learning, enhancing the capabilities ...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.22592] pQuant: Towards Effective Low-Bit Language Models via Decoupled Linear Quantization-Aware Training

The paper presents pQuant, a novel approach for low-bit language models that utilizes decoupled linear quantization-aware training to enh...

arXiv - Machine Learning · 3 min · about 1 month ago

Machine Learning

[2602.22537] LUMOS: Democratizing SciML Workflows with L0-Regularized Learning for Unified Feature and Parameter Adaptation

LUMOS introduces an innovative framework for scientific machine learning (SciML) that simplifies model design by integrating feature sele...

arXiv - Machine Learning · 3 min · about 1 month ago

Machine Learning

[2602.22507] Space Syntax-guided Post-training for Residential Floor Plan Generation

This paper introduces Space Syntax-guided Post-training (SSPT) for enhancing residential floor plan generation by integrating architectur...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.22284] BrepCoder: A Unified Multimodal Large Language Model for Multi-task B-rep Reasoning

BrepCoder is a unified multimodal large language model designed for multi-task reasoning in Computer-Aided Design (CAD), specifically uti...

arXiv - Machine Learning · 3 min · about 1 month ago

Generative AI

[2602.22265] Entropy-Controlled Flow Matching

The paper introduces Entropy-Controlled Flow Matching (ECFM), a method that optimizes flow matching in machine learning by controlling in...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

Google launches Nano Banana 2 model with faster image generation | TechCrunch

Google has launched the Nano Banana 2 model, enhancing image generation capabilities with faster processing and improved realism, now def...

TechCrunch - AI · 5 min · about 1 month ago

LLMs

Google’s Nano Banana 2 brings advanced AI image tools to free users | The Verge

Google's Nano Banana 2 introduces advanced AI image generation tools to free users, enhancing capabilities previously exclusive to paid s...

The Verge - AI · 5 min · about 1 month ago

