Computer Vision

Image recognition, detection, and visual AI

Top This Week

[2506.22504] Patch2Loc: Learning to Localize Patches for Unsupervised Brain Lesion Detection
Machine Learning

arXiv - Machine Learning · 4 min
[2508.00307] Acoustic Imaging for Low-SNR UAV Detection: Dense Beamformed Energy Maps and U-Net SELD
Machine Learning

arXiv - AI · 4 min
[2603.25524] CHIRP dataset: towards long-term, individual-level, behavioral monitoring of bird populations in the wild
Computer Vision

arXiv - AI · 4 min

All Content

[2602.23192] FairQuant: Fairness-Aware Mixed-Precision Quantization for Medical Image Classification
Machine Learning

The paper presents FairQuant, a framework for fairness-aware mixed-precision quantization in medical image classification, optimizing bot...

arXiv - Machine Learning · 3 min
[2602.22740] AMLRIS: Alignment-aware Masked Learning for Referring Image Segmentation
Machine Learning

The paper presents AMLRIS, a novel training strategy for Referring Image Segmentation (RIS) that enhances object segmentation through ali...

arXiv - AI · 3 min
[2602.23013] SubspaceAD: Training-Free Few-Shot Anomaly Detection via Subspace Modeling
LLMs

The paper introduces SubspaceAD, a training-free method for few-shot anomaly detection that utilizes subspace modeling to achieve state-o...

arXiv - Machine Learning · 4 min
[2602.22716] SoPE: Spherical Coordinate-Based Positional Embedding for Enhancing Spatial Perception of 3D LVLMs
LLMs

The paper presents SoPE, a novel Spherical Coordinate-Based Positional Embedding method aimed at improving the spatial perception capabil...

arXiv - AI · 4 min
[2602.22938] pMoE: Prompting Diverse Experts Together Wins More in Visual Adaptation
Machine Learning

The paper presents pMoE, a novel Mixture-of-Experts prompt tuning method that enhances visual adaptation by integrating diverse domain kn...

arXiv - Machine Learning · 4 min
[2602.22683] SUPERGLASSES: Benchmarking Vision Language Models as Intelligent Agents for AI Smart Glasses
LLMs

The paper introduces SUPERGLASSES, a benchmark for evaluating Vision Language Models (VLMs) in AI smart glasses, addressing the limitatio...

arXiv - AI · 4 min
[2602.22678] ViCLIP-OT: The First Foundation Vision-Language Model for Vietnamese Image-Text Retrieval with Optimal Transport
LLMs

ViCLIP-OT introduces a novel vision-language model tailored for Vietnamese image-text retrieval, outperforming existing models in low-res...

arXiv - AI · 4 min
[2602.22624] Instruction-based Image Editing with Planning, Reasoning, and Generation
LLMs

This paper presents a novel approach to instruction-based image editing by integrating planning, reasoning, and generation through a mult...

arXiv - AI · 4 min
[2602.22621] CGSA: Class-Guided Slot-Aware Adaptation for Source-Free Object Detection
Computer Vision

The paper presents CGSA, a novel framework for Source-Free Domain Adaptive Object Detection that integrates object-centric learning to en...

arXiv - AI · 3 min
[2602.22596] BetterScene: 3D Scene Synthesis with Representation-Aligned Generative Model
Machine Learning

BetterScene introduces an innovative approach to 3D scene synthesis, enhancing novel view synthesis quality using sparse photos and a rep...

arXiv - AI · 4 min
[2602.22570] Guidance Matters: Rethinking the Evaluation Pitfall for Text-to-Image Generation
Machine Learning

The paper discusses the evaluation challenges in text-to-image generation, focusing on classifier-free guidance (CFG) and proposing a new...

arXiv - AI · 4 min
[2602.22568] Quality-Aware Robust Multi-View Clustering for Heterogeneous Observation Noise
Machine Learning

The paper presents Quality-Aware Robust Multi-View Clustering (QARMVC), a novel framework addressing the challenges of heterogeneous obse...

arXiv - AI · 4 min
[2602.22549] DrivePTS: A Progressive Learning Framework with Textual and Structural Enhancement for Driving Scene Generation
Machine Learning

DrivePTS introduces a progressive learning framework for generating diverse driving scenes, enhancing fidelity and controllability in aut...

arXiv - AI · 4 min
[2602.22545] DisQ-HNet: A Disentangled Quantized Half-UNet for Interpretable Multimodal Image Synthesis Applications to Tau-PET Synthesis from T1 and FLAIR MRI
Computer Vision

DisQ-HNet introduces a novel framework for synthesizing tau-PET images from MRI scans, enhancing interpretability and preserving anatomic...

arXiv - AI · 3 min
[2602.22544] HARU-Net: Hybrid Attention Residual U-Net for Edge-Preserving Denoising in Cone-Beam Computed Tomography
Machine Learning

HARU-Net introduces a novel deep learning architecture for denoising cone-beam computed tomography (CBCT) images, enhancing edge preserva...

arXiv - Machine Learning · 4 min
[2602.22514] SignVLA: A Gloss-Free Vision-Language-Action Framework for Real-Time Sign Language-Guided Robotic Manipulation
Robotics

The paper presents SignVLA, a novel gloss-free Vision-Language-Action framework for real-time robotic manipulation guided by sign languag...

arXiv - AI · 4 min
[2602.22469] Beyond Dominant Patches: Spatial Credit Redistribution For Grounded Vision-Language Models
LLMs

This paper introduces Spatial Credit Redistribution (SCR) to address hallucinations in vision-language models by redistributing activatio...

arXiv - AI · 4 min
[2602.22430] TopoEdit: Fast Post-Optimization Editing of Topology Optimized Structures
Machine Learning

TopoEdit presents a novel approach for fast post-optimization editing of topology optimized structures, enhancing mechanical performance ...

arXiv - Machine Learning · 4 min
[2602.22426] SimpleOCR: Rendering Visualized Questions to Teach MLLMs to Read
LLMs

The paper introduces SimpleOCR, a method to enhance Multimodal Large Language Models (MLLMs) by rendering visualized questions, addressin...

arXiv - Machine Learning · 4 min
[2602.22381] Enhancing Renal Tumor Malignancy Prediction: Deep Learning with Automatic 3D CT Organ Focused Attention
Machine Learning

This article presents a novel deep learning framework for predicting malignancy in renal tumors using 3D CT images, eliminating the need ...

arXiv - AI · 4 min