Computer Vision

Image recognition, detection, and visual AI

Top This Week

[2506.22504] Patch2Loc: Learning to Localize Patches for Unsupervised Brain Lesion Detection
Machine Learning

Abstract page for arXiv paper 2506.22504: Patch2Loc: Learning to Localize Patches for Unsupervised Brain Lesion Detection

arXiv - Machine Learning · 4 min
[2508.00307] Acoustic Imaging for Low-SNR UAV Detection: Dense Beamformed Energy Maps and U-Net SELD
Machine Learning

Abstract page for arXiv paper 2508.00307: Acoustic Imaging for Low-SNR UAV Detection: Dense Beamformed Energy Maps and U-Net SELD

arXiv - AI · 4 min
[2603.25524] CHIRP dataset: towards long-term, individual-level, behavioral monitoring of bird populations in the wild
Computer Vision

Abstract page for arXiv paper 2603.25524: CHIRP dataset: towards long-term, individual-level, behavioral monitoring of bird populations i...

arXiv - AI · 4 min

All Content

[2510.09736] Chlorophyll-a Mapping and Prediction in the Mar Menor Lagoon Using C2RCC-Processed Sentinel 2 Imagery
Data Science

This study presents a methodology for mapping and predicting chlorophyll-a levels in the Mar Menor Lagoon using C2RCC-processed Sentinel ...

arXiv - AI · 4 min
[2510.06868] Multi-hop Deep Joint Source-Channel Coding with Deep Hash Distillation for Semantically Aligned Image Recovery
Machine Learning

This paper presents a novel approach to image transmission using multi-hop deep joint source-channel coding (DeepJSCC) combined with deep...

arXiv - Machine Learning · 3 min
[2510.00037] On Robustness of Vision-Language-Action Model against Multi-Modal Perturbations
Machine Learning

This paper evaluates the robustness of Vision-Language-Action (VLA) models against various multi-modal perturbations, proposing a new met...

arXiv - AI · 4 min
[2509.25774] PCPO: Proportionate Credit Policy Optimization for Aligning Image Generation Models
Machine Learning

The paper introduces Proportionate Credit Policy Optimization (PCPO), a novel framework aimed at improving the stability and quality of t...

arXiv - Machine Learning · 3 min
[2508.06878] Seeing Through the Noise: Improving Infrared Small Target Detection and Segmentation from Noise Suppression Perspective
Computer Vision

This paper presents a novel approach to infrared small target detection and segmentation (IRSTDS) by introducing a noise-suppression feat...

arXiv - AI · 4 min
[2506.14856] Peering into the Unknown: Active View Selection with Neural Uncertainty Maps for 3D Reconstruction
Computer Vision

This article presents a novel approach to active view selection (AVS) for 3D reconstruction using neural uncertainty maps, significantly ...

arXiv - AI · 4 min
[2505.17645] HoloLLM: Multisensory Foundation Model for Language-Grounded Human Sensing and Reasoning
LLMs

HoloLLM introduces a Multimodal Large Language Model that enhances human sensing and reasoning by integrating diverse sensory inputs, out...

arXiv - Machine Learning · 4 min
[2504.13647] An Efficient LiDAR-Camera Fusion Network for Multi-Class 3D Dynamic Object Detection and Trajectory Prediction
Computer Vision

The paper presents a novel LiDAR-camera fusion framework for real-time 3D dynamic object detection and trajectory prediction, enhancing s...

arXiv - AI · 4 min
[2502.17457] MoEMba: A Mamba-based Mixture of Experts for High-Density EMG-based Hand Gesture Recognition
Machine Learning

The paper presents MoEMba, a novel framework utilizing Mamba-based Mixture of Experts for enhancing high-density EMG-based hand gesture r...

arXiv - Machine Learning · 4 min
[2502.17028] Distributional Vision-Language Alignment by Cauchy-Schwarz Divergence
NLP

The paper presents CS-Aligner, a novel framework for vision-language alignment that integrates Cauchy-Schwarz divergence with mutual info...

arXiv - Machine Learning · 4 min
[2602.21178] XMorph: Explainable Brain Tumor Analysis Via LLM-Assisted Hybrid Deep Intelligence
LLMs

XMorph presents a novel framework for explainable brain tumor analysis, achieving 96% accuracy while addressing interpretability and comp...

arXiv - AI · 3 min
[2602.21054] VAUQ: Vision-Aware Uncertainty Quantification for LVLM Self-Evaluation
LLMs

The paper introduces VAUQ, a framework for vision-aware uncertainty quantification in large vision-language models (LVLMs), enhancing sel...

arXiv - AI · 3 min
[2602.21033] MIP Candy: A Modular PyTorch Framework for Medical Image Processing
Machine Learning

MIP Candy is a modular framework built on PyTorch for medical image processing, offering a flexible pipeline for data handling, training,...

arXiv - Machine Learning · 4 min
[2602.20980] CrystaL: Spontaneous Emergence of Visual Latents in MLLMs
LLMs

The paper presents CrystaL, a novel framework for Multimodal Large Language Models (MLLMs) that enhances visual understanding by crystall...

arXiv - AI · 3 min
[2602.20981] Echoes Over Time: Unlocking Length Generalization in Video-to-Audio Generation Models
Machine Learning

This paper presents MMHNet, a novel multimodal hierarchical network that enhances video-to-audio generation by enabling models to general...

arXiv - AI · 4 min
[2602.20994] Multimodal MRI Report Findings Supervised Brain Lesion Segmentation with Substructures
Computer Vision

This paper presents a novel approach to brain lesion segmentation in MRI scans using report-supervised learning, enhancing accuracy by in...

arXiv - Machine Learning · 4 min
[2602.20958] EKF-Based Depth Camera and Deep Learning Fusion for UAV-Person Distance Estimation and Following in SAR Operations
Machine Learning

This paper presents a novel system that integrates depth camera measurements and deep learning for accurate distance estimation in UAV-as...

arXiv - AI · 4 min
[2602.20951] See and Fix the Flaws: Enabling VLMs and Diffusion Models to Comprehend Visual Artifacts via Agentic Data Synthesis
Machine Learning

This paper presents ArtiAgent, a novel approach to automate the creation of artifact-annotated datasets for training visual language mode...

arXiv - AI · 4 min
[2602.20924] Airavat: An Agentic Framework for Internet Measurement
Computer Vision

Airavat introduces an innovative framework for automating Internet measurement workflows, ensuring both generation and verification again...

arXiv - AI · 3 min
[2602.20752] OrthoDiffusion: A Generalizable Multi-Task Diffusion Foundation Model for Musculoskeletal MRI Interpretation
LLMs

OrthoDiffusion is a novel diffusion-based model designed for multi-task interpretation of musculoskeletal MRI scans, improving diagnostic...

arXiv - AI · 4 min