Computer Vision

Image recognition, detection, and visual AI

Top This Week

[2511.21428] From Observation to Action: Latent Action-based Primitive Segmentation for VLA Pre-training in Industrial Settings
Machine Learning

arXiv - AI · 4 min
[2511.16719] SAM 3: Segment Anything with Concepts
Machine Learning

arXiv - AI · 4 min
[2603.28594] Detection of Adversarial Attacks in Robotic Perception
Machine Learning

arXiv - AI · 3 min

All Content

[2508.14746] MissionHD: Hyperdimensional Refinement of Distribution-Deficient Reasoning Graphs for Video Anomaly Detection
LLMs

The paper presents MissionHD, a novel approach for video anomaly detection using hyperdimensional refinement of reasoning graphs, address...

arXiv - Machine Learning · 3 min
[2507.03168] Adopting a human developmental visual diet yields robust, shape-based AI vision
AI Safety

This article presents a novel approach to AI vision by adopting a human developmental visual diet, enhancing shape recognition and resili...

arXiv - Machine Learning · 4 min
[2602.13197] Imitating What Works: Simulation-Filtered Modular Policy Learning from Human Videos
Robotics

This article presents a framework called Perceive-Simulate-Imitate (PSI) for training robots to learn manipulation skills from human vide...

arXiv - Machine Learning · 4 min
[2602.13168] Realistic Face Reconstruction from Facial Embeddings via Diffusion Models
Machine Learning

This paper presents a novel framework for reconstructing realistic high-resolution face images from facial embeddings using diffusion mod...

arXiv - Machine Learning · 3 min
[2602.13024] FedHENet: A Frugal Federated Learning Framework for Heterogeneous Environments
Machine Learning

FedHENet introduces a frugal federated learning framework that enhances energy efficiency and stability in heterogeneous environments whi...

arXiv - Machine Learning · 3 min
[2602.13003] MASAR: Motion-Appearance Synergy Refinement for Joint Detection and Trajectory Forecasting
Machine Learning

The paper presents MASAR, a novel framework for joint 3D detection and trajectory forecasting that enhances performance by integrating mo...

arXiv - Machine Learning · 3 min
[2602.12916] Reliable Thinking with Images
LLMs

The paper discusses 'Reliable Thinking with Images,' a method to enhance reasoning in Multi-modal Large Language Models (MLLMs) by addres...

arXiv - Machine Learning · 4 min
[2602.12742] Synthetic Craquelure Generation for Unsupervised Painting Restoration
Computer Vision

This article presents a novel framework for unsupervised painting restoration by generating synthetic craquelure patterns, enhancing the ...

arXiv - Machine Learning · 3 min
[2602.12696] Channel-Aware Probing for Multi-Channel Imaging
Machine Learning

The paper presents Channel-Aware Probing (CAP), a method for improving multi-channel imaging (MCI) performance by leveraging inter-channe...

arXiv - Machine Learning · 3 min
[2602.12510] Visual RAG Toolkit: Scaling Multi-Vector Visual Retrieval with Training-Free Pooling and Multi-Stage Search
Machine Learning

The Visual RAG Toolkit enhances multi-vector visual retrieval by introducing a training-free pooling method and a multi-stage search proc...

arXiv - Machine Learning · 4 min
[2602.12407] MiDAS: A Multimodal Data Acquisition System and Dataset for Robot-Assisted Minimally Invasive Surgery
Robotics

The paper presents MiDAS, an open-source multimodal data acquisition system for robot-assisted minimally invasive surgery, enabling synch...

arXiv - Machine Learning · 3 min
[2602.12349] Variational Green's Functions for Volumetric PDEs
Computer Vision

This article presents a novel method called Variational Green's Function (VGF) for efficiently computing Green's functions for volumetric...

arXiv - Machine Learning · 3 min
[2602.13030] Resource-Efficient Gesture Recognition through Convexified Attention
Machine Learning

This paper presents a novel convexified attention mechanism for resource-efficient gesture recognition in wearable e-textile interfaces, ...

arXiv - Machine Learning · 4 min
[2602.12982] Multi-Dimensional Visual Data Recovery: Scale-Aware Tensor Modeling and Accelerated Randomized Computation
Machine Learning

The paper presents a novel approach to multi-dimensional visual data recovery using Scale-Aware Tensor Modeling and accelerated randomize...

arXiv - Machine Learning · 4 min
[2602.12744] Adaptive Structured Pruning of Convolutional Neural Networks for Time Series Classification
Machine Learning

This article presents Dynamic Structured Pruning (DSP), an innovative method for optimizing convolutional neural networks in time series ...

arXiv - Machine Learning · 4 min
[2602.12624] Formalizing the Sampling Design Space of Diffusion-Based Generative Models via Adaptive Solvers and Wasserstein-Bounded Timesteps
Machine Learning

This paper presents a framework for optimizing sampling in diffusion-based generative models, addressing high sampling costs through adap...

arXiv - Machine Learning · 4 min
[2602.12205] DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing
Machine Learning

DeepGen 1.0 is a lightweight unified multimodal model designed for image generation and editing, achieving competitive performance with o...

arXiv - AI · 4 min
[2602.11638] Variation-aware Flexible 3D Gaussian Editing
Computer Vision

The paper presents VF-Editor, a novel approach for flexible 3D Gaussian editing that addresses limitations of indirect editing methods by...

arXiv - AI · 3 min
[2601.12357] SimpleMatch: A Simple and Strong Baseline for Semantic Correspondence
Machine Learning

The paper presents SimpleMatch, a novel framework for semantic correspondence that enhances performance at lower resolutions while reduci...

arXiv - AI · 4 min
[2601.09605] Sim2real Image Translation Enables Viewpoint-Robust Policies from Fixed-Camera Datasets
NLP

The paper presents MANGO, a novel image translation method that enhances viewpoint robustness in robot manipulation policies using fixed-...

arXiv - AI · 4 min