Computer Vision

Image recognition, detection, and visual AI

Top This Week

[2602.09678] Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap
Computer Vision

arXiv - AI · 4 min
[2601.13622] CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models
LLMs

arXiv - AI · 3 min
[2603.26551] Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones
Computer Vision

arXiv - AI · 4 min

All Content

[2602.19385] Adaptive Data Augmentation with Multi-armed Bandit: Sample-Efficient Embedding Calibration for Implicit Pattern Recognition
LLMs

The paper presents ADAMAB, a novel framework for efficient embedding calibration in few-shot pattern recognition, leveraging adaptive dat...

arXiv - Machine Learning · 4 min
[2602.19536] Fore-Mamba3D: Mamba-based Foreground-Enhanced Encoding for 3D Object Detection
Machine Learning

The paper presents Fore-Mamba3D, a novel approach for 3D object detection that enhances foreground encoding while addressing limitations ...

arXiv - AI · 4 min
[2602.19359] Vid2Sid: Videos Can Help Close the Sim2Real Gap
Robotics

The paper presents Vid2Sid, a novel video-driven system identification pipeline that enhances the calibration of robot simulators by anal...

arXiv - Machine Learning · 4 min
[2602.19357] MentalBlackboard: Evaluating Spatial Visualization via Mathematical Transformations
LLMs

The paper 'MentalBlackboard' evaluates spatial visualization capabilities of Vision-Language Models (VLMs) through mathematical transform...

arXiv - Machine Learning · 3 min
[2602.19219] Controlled Face Manipulation and Synthesis for Data Augmentation
Machine Learning

The paper presents a novel method for controlled face manipulation to augment data for facial expression analysis, addressing label scarc...

arXiv - Machine Learning · 4 min
[2602.19412] Redefining the Down-Sampling Scheme of U-Net for Precision Biomedical Image Segmentation
Computer Vision

This paper presents a novel down-sampling strategy called Stair Pooling for U-Net architectures, aimed at enhancing precision in biomedic...

arXiv - AI · 4 min
[2602.19437] FinSight-Net: A Physics-Aware Decoupled Network with Frequency-Domain Compensation for Underwater Fish Detection in Smart Aquaculture
Computer Vision

FinSight-Net introduces a physics-aware framework for underwater fish detection, improving accuracy while reducing computational overhead...

arXiv - AI · 4 min
[2602.19140] CaReFlow: Cyclic Adaptive Rectified Flow for Multimodal Fusion
Machine Learning

The paper presents CaReFlow, a novel approach for multimodal fusion that addresses modality gaps using cyclic adaptive rectified flow, en...

arXiv - Machine Learning · 4 min
[2602.19089] Ani3DHuman: Photorealistic 3D Human Animation with Self-guided Stochastic Sampling
Generative AI

Ani3DHuman presents a novel framework for photorealistic 3D human animation, combining kinematics-based methods with video diffusion prio...

arXiv - Machine Learning · 4 min
[2602.19349] UP-Fuse: Uncertainty-guided LiDAR-Camera Fusion for 3D Panoptic Segmentation
NLP

The paper presents UP-Fuse, an innovative framework for LiDAR-camera fusion that enhances 3D panoptic segmentation by addressing sensor d...

arXiv - AI · 4 min
[2602.19348] MultiDiffSense: Diffusion-Based Multi-Modal Visuo-Tactile Image Generation Conditioned on Object Shape and Contact Pose
Machine Learning

The paper presents MultiDiffSense, a diffusion-based model for generating visuo-tactile images conditioned on object shape and contact po...

arXiv - AI · 3 min
[2602.19005] GUIDE-US: Grade-Informed Unpaired Distillation of Encoder Knowledge from Histopathology to Micro-UltraSound
Machine Learning

The paper presents a novel method for non-invasive grading of prostate cancer using micro-ultrasound, leveraging knowledge distillation f...

arXiv - Machine Learning · 3 min
[2602.19324] RetinaVision: XAI-Driven Augmented Regulation for Precise Retinal Disease Classification using deep learning framework
Machine Learning

The article presents RetinaVision, a deep learning framework for accurate classification of retinal diseases using optical coherence tomo...

arXiv - AI · 3 min
[2602.19322] US-JEPA: A Joint Embedding Predictive Architecture for Medical Ultrasound
NLP

The paper presents US-JEPA, a novel self-supervised framework for medical ultrasound imaging that enhances representation learning by pre...

arXiv - Machine Learning · 4 min
[2602.19314] IPv2: An Improved Image Purification Strategy for Real-World Ultra-Low-Dose Lung CT Denoising
Machine Learning

The paper presents IPv2, an enhanced image purification strategy for improving lung CT denoising at ultra-low doses, addressing limitatio...

arXiv - AI · 4 min
[2602.18863] TIACam: Text-Anchored Invariant Feature Learning with Auto-Augmentation for Camera-Robust Zero-Watermarking
Computer Vision

The paper presents TIACam, a novel framework for camera-robust zero-watermarking that utilizes text-anchored invariant feature learning w...

arXiv - Machine Learning · 3 min
[2602.19248] No Need For Real Anomaly: MLLM Empowered Zero-Shot Video Anomaly Detection
LLMs

The paper presents LAVIDA, a novel zero-shot video anomaly detection framework that utilizes a Multimodal Large Language Model to enhance...

arXiv - AI · 3 min
[2602.18726] WiCompass: Oracle-driven Data Scaling for mmWave Human Pose Estimation
NLP

The paper presents WiCompass, a framework for improving mmWave human pose estimation by focusing on data coverage rather than brute-force...

arXiv - Machine Learning · 3 min
[2602.19193] Visual Prompt Guided Unified Pushing Policy
Robotics

The paper presents a novel unified pushing policy that utilizes visual prompts to enhance the efficiency and versatility of robotic pushi...

arXiv - AI · 3 min
[2602.19190] FUSAR-GPT: A Spatiotemporal Feature-Embedded and Two-Stage Decoupled Visual Language Model for SAR Imagery
LLMs

FUSAR-GPT is a novel visual language model designed for interpreting SAR imagery, enhancing performance through spatiotemporal feature em...

arXiv - AI · 4 min