Computer Vision

Image recognition, detection, and visual AI

Top This Week

[2511.21428] From Observation to Action: Latent Action-based Primitive Segmentation for VLA Pre-training in Industrial Settings
Machine Learning

Abstract page for arXiv paper 2511.21428: From Observation to Action: Latent Action-based Primitive Segmentation for VLA Pre-training in ...

arXiv - AI · 4 min ·
[2511.16719] SAM 3: Segment Anything with Concepts
Machine Learning

Abstract page for arXiv paper 2511.16719: SAM 3: Segment Anything with Concepts

arXiv - AI · 4 min ·
[2603.28594] Detection of Adversarial Attacks in Robotic Perception
Machine Learning

Abstract page for arXiv paper 2603.28594: Detection of Adversarial Attacks in Robotic Perception

arXiv - AI · 3 min ·

All Content

[2602.15072] GRAFNet: Multiscale Retinal Processing via Guided Cortical Attention Feedback for Enhancing Medical Image Polyp Segmentation
Machine Learning

GRAFNet introduces a novel architecture for polyp segmentation in colonoscopy, enhancing accuracy through biologically inspired multi-sca...

arXiv - AI · 4 min ·
[2602.15727] Spanning the Visual Analogy Space with a Weight Basis of LoRAs
Machine Learning

The paper presents LoRWeB, a novel approach to visual analogy learning that enhances image manipulation by dynamically selecting and weig...

arXiv - AI · 4 min ·
[2602.15382] The Vision Wormhole: Latent-Space Communication in Heterogeneous Multi-Agent Systems
LLMs

The paper introduces the Vision Wormhole, a framework for enabling efficient latent-space communication in heterogeneous multi-agent syst...

arXiv - Machine Learning · 4 min ·
[2602.15368] GMAIL: Generative Modality Alignment for generated Image Learning
Machine Learning

The paper presents GMAIL, a novel framework for aligning generated images with real images in machine learning, enhancing performance in ...

arXiv - Machine Learning · 4 min ·
[2602.15645] CARE Drive: A Framework for Evaluating Reason-Responsiveness of Vision Language Models in Automated Driving
LLMs

The article presents CARE Drive, a framework for evaluating the reason-responsiveness of vision language models in automated driving, add...

arXiv - AI · 4 min ·
[2602.15580] How Vision Becomes Language: A Layer-wise Information-Theoretic Analysis of Multimodal Reasoning
Machine Learning

This paper analyzes how multimodal Transformers integrate visual and linguistic information, revealing a layer-wise evolution of predicti...

arXiv - AI · 4 min ·
[2602.15277] Accelerating Large-Scale Dataset Distillation via Exploration-Exploitation Optimization
Machine Learning

This paper presents Exploration-Exploitation Distillation (E^2D), a method for efficient large-scale dataset distillation that balances a...

arXiv - Machine Learning · 4 min ·
[2602.15181] Time-Archival Camera Virtualization for Sports and Visual Performances
Computer Vision

This paper presents a novel approach to camera virtualization for sports and visual performances, enabling photorealistic rendering from ...

arXiv - Machine Learning · 4 min ·
[2602.15154] Loss Knows Best: Detecting Annotation Errors in Videos via Loss Trajectories
Machine Learning

The paper presents a novel method for detecting annotation errors in video datasets by analyzing loss trajectories, enhancing model train...

arXiv - Machine Learning · 4 min ·
[2602.15294] EAA: Automating materials characterization with vision language model agents
LLMs

The paper introduces Experiment Automation Agents (EAA), a system leveraging vision-language models to automate complex microscopy workfl...

arXiv - AI · 3 min ·
[2602.15087] StrokeNeXt: A Siamese-encoder Approach for Brain Stroke Classification in Computed Tomography Imagery
Machine Learning

StrokeNeXt introduces a Siamese-encoder model for classifying brain strokes in CT images, achieving high accuracy and low misclassificati...

arXiv - Machine Learning · 3 min ·
[2306.17652] Accurate 2D Reconstruction for PET Scanners based on the Analytical White Image Model
Machine Learning

This paper presents a mathematical model for accurate 2D reconstruction in PET scanners, utilizing an Analytical White Image Model to enh...

arXiv - Machine Learning · 4 min ·
[2602.15067] Attention-gated U-Net model for semantic segmentation of brain tumors and feature extraction for survival prognosis
Machine Learning

The article presents an Attention-Gated U-Net model for semantic segmentation of brain tumors, enhancing treatment planning through impro...

arXiv - AI · 3 min ·
[2602.15648] Guided Diffusion by Optimized Loss Functions on Relaxed Parameters for Inverse Material Design
Generative AI

This paper presents a novel method for inverse material design using guided diffusion and optimized loss functions, addressing challenges...

arXiv - Machine Learning · 4 min ·
[2602.15460] On the Out-of-Distribution Generalization of Reasoning in Multimodal LLMs for Simple Visual Planning Tasks
LLMs

This paper evaluates the out-of-distribution generalization of reasoning in multimodal large language models (LLMs) through a grid-based ...

arXiv - Machine Learning · 4 min ·
[2602.15393] Doubly Stochastic Mean-Shift Clustering
NLP

The paper presents Doubly Stochastic Mean-Shift (DSMS), an innovative clustering algorithm that enhances standard Mean-Shift methods by i...

arXiv - Machine Learning · 3 min ·
[2602.15200] COMPOT: Calibration-Optimized Matrix Procrustes Orthogonalization for Transformers Compression
Machine Learning

The paper presents COMPOT, a novel framework for compressing Transformer models using Calibration-Optimized Matrix Procrustes Orthogonali...

arXiv - Machine Learning · 3 min ·
[2602.15183] Seeing to Generalize: How Visual Data Corrects Binding Shortcuts
LLMs

This article explores how Vision Language Models (VLMs) enhance performance on text-only tasks by correcting binding shortcuts through vi...

arXiv - Machine Learning · 4 min ·
[2602.15155] Refine Now, Query Fast: A Decoupled Refinement Paradigm for Implicit Neural Fields
Machine Learning

The paper presents a Decoupled Representation Refinement (DRR) paradigm for Implicit Neural Representations (INRs), enhancing speed and f...

arXiv - Machine Learning · 4 min ·
Apple is reportedly planning to launch AI-powered glasses, a pendant, and AirPods | The Verge
AI Startups

Apple is set to launch AI-powered smart glasses, a pendant, and upgraded AirPods, enhancing its AI hardware lineup with features like cam...

The Verge - AI · 5 min ·