Computer Vision

Image recognition, detection, and visual AI


Top This Week

Machine Learning

[2511.21428] From Observation to Action: Latent Action-based Primitive Segmentation for VLA Pre-training in Industrial Settings

Abstract page for arXiv paper 2511.21428: From Observation to Action: Latent Action-based Primitive Segmentation for VLA Pre-training in ...

arXiv - AI · 4 min · about 13 hours ago

Machine Learning

[2511.16719] SAM 3: Segment Anything with Concepts

Abstract page for arXiv paper 2511.16719: SAM 3: Segment Anything with Concepts

arXiv - AI · 4 min · about 13 hours ago

Machine Learning

[2603.28594] Detection of Adversarial Attacks in Robotic Perception

Abstract page for arXiv paper 2603.28594: Detection of Adversarial Attacks in Robotic Perception

arXiv - AI · 3 min · about 13 hours ago

All Content

LLMs

[2508.14746] MissionHD: Hyperdimensional Refinement of Distribution-Deficient Reasoning Graphs for Video Anomaly Detection

The paper presents MissionHD, a novel approach for video anomaly detection using hyperdimensional refinement of reasoning graphs, address...

arXiv - Machine Learning · 3 min · about 1 month ago

AI Safety

[2507.03168] Adopting a human developmental visual diet yields robust, shape-based AI vision

This article presents a novel approach to AI vision by adopting a human developmental visual diet, enhancing shape recognition and resili...

arXiv - Machine Learning · 4 min · about 1 month ago

Robotics

[2602.13197] Imitating What Works: Simulation-Filtered Modular Policy Learning from Human Videos

This article presents a framework called Perceive-Simulate-Imitate (PSI) for training robots to learn manipulation skills from human vide...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.13168] Realistic Face Reconstruction from Facial Embeddings via Diffusion Models

This paper presents a novel framework for reconstructing realistic high-resolution face images from facial embeddings using diffusion mod...

arXiv - Machine Learning · 3 min · about 1 month ago

Machine Learning

[2602.13024] FedHENet: A Frugal Federated Learning Framework for Heterogeneous Environments

FedHENet introduces a frugal federated learning framework that enhances energy efficiency and stability in heterogeneous environments whi...

arXiv - Machine Learning · 3 min · about 1 month ago

Machine Learning

[2602.13003] MASAR: Motion-Appearance Synergy Refinement for Joint Detection and Trajectory Forecasting

The paper presents MASAR, a novel framework for joint 3D detection and trajectory forecasting that enhances performance by integrating mo...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2602.12916] Reliable Thinking with Images

The paper discusses 'Reliable Thinking with Images,' a method to enhance reasoning in Multi-modal Large Language Models (MLLMs) by addres...

arXiv - Machine Learning · 4 min · about 1 month ago

Computer Vision

[2602.12742] Synthetic Craquelure Generation for Unsupervised Painting Restoration

This article presents a novel framework for unsupervised painting restoration by generating synthetic craquelure patterns, enhancing the ...

arXiv - Machine Learning · 3 min · about 1 month ago

Machine Learning

[2602.12696] Channel-Aware Probing for Multi-Channel Imaging

The paper presents Channel-Aware Probing (CAP), a method for improving multi-channel imaging (MCI) performance by leveraging inter-channe...

arXiv - Machine Learning · 3 min · about 1 month ago

Machine Learning

[2602.12510] Visual RAG Toolkit: Scaling Multi-Vector Visual Retrieval with Training-Free Pooling and Multi-Stage Search

The Visual RAG Toolkit enhances multi-vector visual retrieval by introducing a training-free pooling method and a multi-stage search proc...

arXiv - Machine Learning · 4 min · about 1 month ago

Robotics

[2602.12407] MiDAS: A Multimodal Data Acquisition System and Dataset for Robot-Assisted Minimally Invasive Surgery

The paper presents MiDAS, an open-source multimodal data acquisition system for robot-assisted minimally invasive surgery, enabling synch...

arXiv - Machine Learning · 3 min · about 1 month ago

Computer Vision

[2602.12349] Variational Green's Functions for Volumetric PDEs

This article presents a novel method called Variational Green's Function (VGF) for efficiently computing Green's functions for volumetric...

arXiv - Machine Learning · 3 min · about 1 month ago

Machine Learning

[2602.13030] Resource-Efficient Gesture Recognition through Convexified Attention

This paper presents a novel convexified attention mechanism for resource-efficient gesture recognition in wearable e-textile interfaces, ...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.12982] Multi-Dimensional Visual Data Recovery: Scale-Aware Tensor Modeling and Accelerated Randomized Computation

The paper presents a novel approach to multi-dimensional visual data recovery using Scale-Aware Tensor Modeling and accelerated randomize...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.12744] Adaptive Structured Pruning of Convolutional Neural Networks for Time Series Classification

This article presents Dynamic Structured Pruning (DSP), an innovative method for optimizing convolutional neural networks in time series ...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.12624] Formalizing the Sampling Design Space of Diffusion-Based Generative Models via Adaptive Solvers and Wasserstein-Bounded Timesteps

This paper presents a framework for optimizing sampling in diffusion-based generative models, addressing high sampling costs through adap...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.12205] DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing

DeepGen 1.0 is a lightweight unified multimodal model designed for image generation and editing, achieving competitive performance with o...

arXiv - AI · 4 min · about 1 month ago

Computer Vision

[2602.11638] Variation-aware Flexible 3D Gaussian Editing

The paper presents VF-Editor, a novel approach for flexible 3D Gaussian editing that addresses limitations of indirect editing methods by...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2601.12357] SimpleMatch: A Simple and Strong Baseline for Semantic Correspondence

The paper presents SimpleMatch, a novel framework for semantic correspondence that enhances performance at lower resolutions while reduci...

arXiv - AI · 4 min · about 1 month ago

NLP

[2601.09605] Sim2real Image Translation Enables Viewpoint-Robust Policies from Fixed-Camera Datasets

The paper presents MANGO, a novel image translation method that enhances viewpoint robustness in robot manipulation policies using fixed-...

arXiv - AI · 4 min · about 1 month ago

