Computer Vision

Image recognition, detection, and visual AI

Top This Week

Machine Learning

[2511.21428] From Observation to Action: Latent Action-based Primitive Segmentation for VLA Pre-training in Industrial Settings

arXiv - AI · 4 min · about 16 hours ago

Machine Learning

[2511.16719] SAM 3: Segment Anything with Concepts

arXiv - AI · 4 min · about 16 hours ago

Machine Learning

[2603.28594] Detection of Adversarial Attacks in Robotic Perception

arXiv - AI · 3 min · about 16 hours ago

All Content

LLMs

[2602.12618] Vision Token Reduction via Attention-Driven Self-Compression for Efficient Multimodal Large Language Models

This paper presents Attention-Driven Self-Compression (ADSC), a novel method for reducing vision tokens in Multimodal Large Language Mode...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.12508] Monocular Reconstruction of Neural Tactile Fields

This paper presents a novel approach to robotic navigation using neural tactile fields, enabling robots to predict tactile responses from...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.12486] Human-Like Coarse Object Representations in Vision Models

This paper explores how vision models can develop human-like coarse object representations, emphasizing the balance between detail and ph...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.12484] A Lightweight and Explainable DenseNet-121 Framework for Grape Leaf Disease Classification

This article presents a novel DenseNet-121 framework for classifying grape leaf diseases, achieving high accuracy and interpretability wh...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.12395] What does RL improve for Visual Reasoning? A Frankenstein-Style Analysis

This paper explores the impact of reinforcement learning (RL) on visual reasoning capabilities in vision-language models, proposing a nov...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.12393] Reproducing DragDiffusion: Interactive Point-Based Editing with Diffusion Models

This article presents a reproducibility study of DragDiffusion, a method for interactive point-based image editing using diffusion models...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.12322] ForeAct: Steering Your VLA with Efficient Visual Foresight Planning

The paper presents ForeAct, a novel Visual Foresight Planning framework that enhances Vision-Language-Action (VLA) models by enabling the...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.12317] Free Lunch in Medical Image Foundation Model Pre-training via Randomized Synthesis and Disentanglement

The paper presents RaSD, a framework for pre-training medical image foundation models using synthetic data, demonstrating superior perfor...

arXiv - Machine Learning · 4 min · about 1 month ago

Computer Vision

[2602.12313] Visible and Hyperspectral Imaging for Quality Assessment of Milk: Property Characterisation and Identification

This study explores the use of visible and hyperspectral imaging for the rapid, non-destructive assessment of milk quality, demonstrating...

arXiv - Machine Learning · 4 min · about 1 month ago

Computer Vision

[2602.12306] Quantum walk inspired JPEG compression of images

This article presents a novel JPEG compression method inspired by quantum walks, enhancing traditional techniques through an adaptive qua...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.12304] OmniCustom: Sync Audio-Video Customization Via Joint Audio-Video Generation Model

The paper introduces OmniCustom, a novel framework for synchronizing audio-video customization, enhancing identity and timbre fidelity th...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2511.13494] Language-Guided Invariance Probing of Vision-Language Models

This article introduces Language-Guided Invariance Probing (LGIP), a benchmark for evaluating the robustness of vision-language models (V...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.12876] BrowseComp-$V^3$: A Visual, Vertical, and Verifiable Benchmark for Multimodal Browsing Agents

BrowseComp-$V^3$ introduces a new benchmark for evaluating multimodal browsing agents, focusing on complex reasoning across visual and te...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[P]ut a Neural Network in VCV Rack 2 and told it to make sounds that influence my emotion tracking module…

It decided to blow out my right headphone to make me show fear. Some background: I’m working on integrating computer vision and facial tra...

Reddit - Machine Learning · 1 min · about 1 month ago
