Computer Vision

Image recognition, detection, and visual AI

Top This Week

Machine Learning

[2511.21428] From Observation to Action: Latent Action-based Primitive Segmentation for VLA Pre-training in Industrial Settings

arXiv - AI · 4 min · about 12 hours ago

Machine Learning

[2511.16719] SAM 3: Segment Anything with Concepts

arXiv - AI · 4 min · about 12 hours ago

Machine Learning

[2603.28594] Detection of Adversarial Attacks in Robotic Perception

arXiv - AI · 3 min · about 12 hours ago

All Content

Machine Learning

[2602.14065] REAL: Resolving Knowledge Conflicts in Knowledge-Intensive Visual Question Answering via Reasoning-Pivot Alignment

The paper presents the REAL framework, which addresses knowledge conflicts in Knowledge-Intensive Visual Question Answering (KI-VQA) by i...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.13670] Advancing Analytic Class-Incremental Learning through Vision-Language Calibration

This article presents VILA, a novel framework for class-incremental learning that utilizes vision-language calibration to enhance efficie...

arXiv - Machine Learning · 3 min · about 1 month ago

Computer Vision

[2602.13660] Optimized Certainty Equivalent Risk-Controlling Prediction Sets

This paper presents the Optimized Certainty Equivalent Risk-Controlling Prediction Sets (OCE-RCPS), a framework designed to enhance relia...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2602.13912] From Pixels to Policies: Reinforcing Spatial Reasoning in Language Models for Content-Aware Layout Design

The paper presents LaySPA, a reinforcement learning framework designed to enhance spatial reasoning in large language models for effectiv...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.13880] VSAL: A Vision Solver with Adaptive Layouts for Graph Property Detection

The paper presents VSAL, a vision-based framework for graph property detection that utilizes adaptive layouts to enhance the detection of...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.13738] OneLatent: Single-Token Compression for Visual Latent Reasoning

The paper introduces OneLatent, a framework that compresses reasoning in visual tasks into a single token, significantly reducing output ...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.13348] Exploring the Performance of ML/DL Architectures on the MNIST-1D Dataset

This article evaluates the performance of advanced machine learning architectures on the MNIST-1D dataset, demonstrating their effectiven...

arXiv - AI · 4 min · about 1 month ago

NLP

[2602.13345] BLUEPRINT Rebuilding a Legacy: Multimodal Retrieval for Complex Engineering Drawings and Documents

The paper presents Blueprint, a multimodal retrieval system designed to enhance the accessibility of complex engineering drawings and doc...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2602.13235] Lang2Act: Fine-Grained Visual Reasoning through Self-Emergent Linguistic Toolchains

The paper introduces Lang2Act, a novel framework for enhancing visual reasoning in Vision-Language Models (VLMs) through self-emergent li...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.13232] PlotChain: Deterministic Checkpointed Evaluation of Multimodal LLMs on Engineering Plot Reading

PlotChain introduces a deterministic benchmark for evaluating multimodal large language models (MLLMs) on engineering plot reading, focus...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[D] SparseFormer and the future of efficient AI vision models

The article discusses SparseFormer, a new architecture for vision transformers that addresses the compute bottleneck in AI vision models,...

Reddit - Machine Learning · 1 min · about 1 month ago

Machine Learning

Collaboration invite - medical Imaging, algorithmic fairness or open track [D]

A 2nd year PhD student seeks collaboration opportunities in medical imaging and algorithmic fairness, inviting community members to conne...

Reddit - Machine Learning · 1 min · about 1 month ago

Machine Learning

[2602.11850] Free Lunch for Stabilizing Rectified Flow Inversion

This paper presents Proximal-Mean Inversion (PMI), a novel method for stabilizing Rectified-Flow (RF) models, enhancing image reconstruct...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.04884] Reinforced Attention Learning

The paper introduces Reinforced Attention Learning (RAL), a novel framework that optimizes internal attention distributions in multimodal...

arXiv - Machine Learning · 3 min · about 1 month ago

Machine Learning

[2602.05096] Visual concept ranking uncovers medical shortcuts used by large multimodal models

This article presents a method called Visual Concept Ranking (VCR) to identify visual concepts in large multimodal models, focusing on th...

arXiv - Machine Learning · 3 min · about 1 month ago

Computer Vision

[2508.21418] From slides to AI-ready maps: Standardized multi-layer tissue maps as metadata for artificial intelligence in digital pathology

This article presents a framework for creating standardized multi-layer tissue maps as metadata for AI in digital pathology, enhancing th...

arXiv - Machine Learning · 4 min · about 1 month ago

Generative AI

[2506.06027] Sample-Specific Noise Injection For Diffusion-Based Adversarial Purification

This paper introduces Sample-specific Score-aware Noise Injection (SSNI), a novel framework for diffusion-based adversarial purification ...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2505.04586] Active Sampling for MRI-based Sequential Decision Making

This article presents a novel multi-objective reinforcement learning framework for MRI-based sequential decision-making, improving diagno...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2412.06014] Post-hoc Probabilistic Vision-Language Models

This article presents a novel approach to uncertainty estimation in vision-language models (VLMs) by proposing a post-hoc method that enh...

arXiv - Machine Learning · 3 min · about 1 month ago

Machine Learning

[2602.00099] Gauss-Newton Natural Gradient Descent for Shape Learning

This paper presents the Gauss-Newton method for optimization in shape learning, demonstrating faster convergence and improved accuracy ov...

arXiv - Machine Learning · 3 min · about 1 month ago
