Computer Vision

Image recognition, detection, and visual AI

Top This Week

[2511.21428] From Observation to Action: Latent Action-based Primitive Segmentation for VLA Pre-training in Industrial Settings
Machine Learning

arXiv - AI · 4 min ·
[2511.16719] SAM 3: Segment Anything with Concepts
Machine Learning

arXiv - AI · 4 min ·
[2603.28594] Detection of Adversarial Attacks in Robotic Perception
Machine Learning

arXiv - AI · 3 min ·

All Content

Generative AI

best ai headshot generator – which one really works?

This article explores the effectiveness of various AI headshot generators, focusing on the author's experience with Headshot Kiwi, highli...

Reddit - Artificial Intelligence · 1 min ·
[2602.11858] Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception
LLMs

The paper presents Region-to-Image Distillation, a novel approach to enhance fine-grained multimodal perception in MLLMs by internalizing...

arXiv - AI · 4 min ·
[2602.11554] HyperDet: 3D Object Detection with Hyper 4D Radar Point Clouds
Computer Vision

The paper presents HyperDet, a novel framework for 3D object detection using hyper 4D radar point clouds, addressing limitations of tradi...

arXiv - Machine Learning · 4 min ·
[2602.11575] ReaDy-Go: Real-to-Sim Dynamic 3D Gaussian Splatting Simulation for Environment-Specific Visual Navigation with Moving Obstacles
Machine Learning

The paper presents ReaDy-Go, a novel simulation pipeline that enhances visual navigation in dynamic environments by integrating 3D Gaussi...

arXiv - AI · 4 min ·
[2602.10551] C^2ROPE: Causal Continuous Rotary Positional Encoding for 3D Large Multimodal-Models Reasoning
LLMs

The paper presents C^2ROPE, an advanced positional encoding method for 3D Large Multimodal Models, addressing limitations of existing Rot...

arXiv - AI · 4 min ·
[2602.07047] ShapBPT: Image Feature Attributions Using Data-Aware Binary Partition Trees
Machine Learning

The paper introduces ShapBPT, a novel method for image feature attributions using data-aware binary partition trees, enhancing interpreta...

arXiv - Machine Learning · 4 min ·
[2601.20336] Do Whitepaper Claims Predict Market Behavior? Evidence from Cryptocurrency Factor Analysis
NLP

This study examines the relationship between cryptocurrency whitepaper claims and actual market behavior, revealing weak predictive power...

arXiv - Machine Learning · 3 min ·
[2601.06793] CliffordNet: All You Need is Geometric Algebra
Machine Learning

CliffordNet proposes a novel approach to computer vision using Geometric Algebra, challenging traditional architectures by achieving high...

arXiv - Machine Learning · 4 min ·
[2602.01696] Cross-Modal Purification and Fusion for Small-Object RGB-D Transmission-Line Defect Detection
AI Safety

This paper presents CMAFNet, a novel network for detecting small defects in transmission lines using RGB-D data, achieving significant pe...

arXiv - AI · 4 min ·
[2601.23232] ShotFinder: Imagination-Driven Open-Domain Video Shot Retrieval via Web Search
LLMs

ShotFinder introduces a novel benchmark for open-domain video shot retrieval, utilizing LLMs to enhance video search capabilities through...

arXiv - AI · 4 min ·
[2512.15774] Two-Step Data Augmentation for Masked Face Detection and Recognition: Turning Fake Masks to Real
Computer Vision

This article presents a two-step data augmentation framework for improving masked face detection and recognition, addressing challenges o...

arXiv - Machine Learning · 3 min ·
[2601.15235] Tracing 3D Anatomy in 2D Strokes: A Multi-Stage Projection Driven Approach to Cervical Spine Fracture Identification
Computer Vision

This article presents a novel approach for identifying cervical spine fractures using a multi-stage projection method that combines 2D im...

arXiv - AI · 4 min ·
[2601.02085] Vision-Based Early Fault Diagnosis and Self-Recovery for Strawberry Harvesting Robots
Robotics

This article presents a framework for early fault diagnosis and self-recovery in strawberry harvesting robots, leveraging vision-based te...

arXiv - AI · 4 min ·
[2510.12764] AnyUp: Universal Feature Upsampling
Machine Learning

The paper presents AnyUp, a novel method for universal feature upsampling applicable to various vision features at any resolution, enhanc...

arXiv - Machine Learning · 3 min ·
[2510.08431] Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency
Machine Learning

This paper presents a novel approach to large-scale diffusion distillation using a score-regularized continuous-time consistency model, a...

arXiv - Machine Learning · 4 min ·
[2512.12206] ALERT Open Dataset and Input-Size-Agnostic Vision Transformer for Driver Activity Recognition using IR-UWB
Machine Learning

The paper presents the ALERT dataset and an input-size-agnostic Vision Transformer (ISA-ViT) for driver activity recognition using IR-UWB...

arXiv - Machine Learning · 4 min ·
[2512.09185] Learning Patient-Specific Disease Dynamics with Latent Flow Matching for Longitudinal Imaging Generation
Machine Learning

The paper presents a novel framework, Δ-LFM, for modeling patient-specific disease dynamics using latent flow matching, enhancing...

arXiv - AI · 4 min ·
[2509.19665] Deep Learning for Clouds and Cloud Shadow Segmentation in Methane Satellite and Airborne Imaging Spectroscopy
Machine Learning

This article presents a study on deep learning techniques for detecting clouds and cloud shadows in methane satellite and airborne imagin...

arXiv - Machine Learning · 4 min ·
[2511.11030] Algorithms Trained on Normal Chest X-rays Can Predict Health Insurance Types
Machine Learning

This study explores how deep learning algorithms trained on normal chest X-rays can predict patients' health insurance types, revealing h...

arXiv - AI · 4 min ·
[2510.22391] Top-Down Semantic Refinement for Image Captioning
LLMs

This paper introduces Top-Down Semantic Refinement (TDSR) for image captioning, addressing the limitations of Vision-Language Models (VLM...

arXiv - AI · 4 min ·