Computer Vision

Image recognition, detection, and visual AI

Top This Week

Machine Learning

[2511.21428] From Observation to Action: Latent Action-based Primitive Segmentation for VLA Pre-training in Industrial Settings

Abstract page for arXiv paper 2511.21428: From Observation to Action: Latent Action-based Primitive Segmentation for VLA Pre-training in ...

arXiv - AI · 4 min · about 10 hours ago

Machine Learning

[2511.16719] SAM 3: Segment Anything with Concepts

Abstract page for arXiv paper 2511.16719: SAM 3: Segment Anything with Concepts

arXiv - AI · 4 min · about 10 hours ago

Machine Learning

[2603.28594] Detection of Adversarial Attacks in Robotic Perception

Abstract page for arXiv paper 2603.28594: Detection of Adversarial Attacks in Robotic Perception

arXiv - AI · 3 min · about 10 hours ago

All Content

Machine Learning

[2602.13329] HiST-VLA: A Hierarchical Spatio-Temporal Vision-Language-Action Model for End-to-End Autonomous Driving

The HiST-VLA model enhances autonomous driving by integrating vision, language, and action through improved spatio-temporal reasoning and...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.13324] Synthesizing the Kill Chain: A Zero-Shot Framework for Target Verification and Tactical Reasoning on the Edge

This paper presents a zero-shot framework for target verification and tactical reasoning in autonomous edge robotics, addressing challeng...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.13315] IDPruner: Harmonizing Importance and Diversity in Visual Token Pruning for MLLMs

The paper presents IDPruner, a novel method for visual token pruning in Multimodal Large Language Models (MLLMs), balancing importance an...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.13314] Sim2Radar: Toward Bridging the Radar Sim-to-Real Gap with VLM-Guided Scene Reconstruction

The paper presents Sim2Radar, a framework that generates synthetic radar data from RGB images, addressing the challenges of limited radar...

arXiv - AI · 3 min · about 1 month ago

AI Agents

[2602.13313] Agentic Spatio-Temporal Grounding via Collaborative Reasoning

The paper presents the Agentic Spatio-Temporal Grounder (ASTG), a novel framework for Spatio-Temporal Video Grounding (STVG) that enhance...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.13310] Visual Para-Thinker: Divide-and-Conquer Reasoning for Visual Comprehension

The paper introduces Visual Para-Thinker, a novel framework for parallel reasoning in visual comprehension, addressing limitations in exi...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.13308] Learning to Select Like Humans: Explainable Active Learning for Medical Imaging

This paper presents an explainable active learning framework for medical imaging that enhances data efficiency and interpretability by in...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.13306] Fine-Tuning a Large Vision-Language Model for Artwork's Scoring and Critique

This paper presents a framework for automating the scoring and critique of artwork using a fine-tuned vision-language model, achieving hi...

arXiv - Machine Learning · 4 min · about 1 month ago

Computer Vision

[2602.13305] WildfireVLM: AI-powered Analysis for Early Wildfire Detection and Risk Assessment Using Satellite Imagery

WildfireVLM introduces an AI framework for early wildfire detection and risk assessment using satellite imagery, enhancing disaster manag...

arXiv - AI · 4 min · about 1 month ago

AI Safety

[2602.13304] Progressive Contrast Registration for High-Fidelity Bidirectional Photoacoustic Microscopy Alignment

This article presents PCReg-Net, a novel framework for high-fidelity alignment in bidirectional photoacoustic microscopy, significantly i...

arXiv - AI · 3 min · about 1 month ago

Generative AI

[2602.13303] Spectral Collapse in Diffusion Inversion

The paper discusses 'spectral collapse' in diffusion inversion, highlighting failures in standard deterministic methods for image transla...

arXiv - Machine Learning · 3 min · about 1 month ago

Machine Learning

[2602.13299] KidMesh: Computational Mesh Reconstruction for Pediatric Congenital Hydronephrosis Using Deep Neural Networks

The paper presents KidMesh, a deep learning approach for reconstructing computational meshes for pediatric congenital hydronephrosis from...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.13298] Effect of Convolutional Depth on Image Recognition Performance: VGG vs. ResNet vs. GoogLeNet

This paper examines how convolutional depth affects image recognition performance across three architectures: VGG, ResNet, and GoogLeNet,...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.13294] VisPhyWorld: Probing Physical Reasoning via Code-Driven Video Reconstruction

The paper introduces VisPhyWorld, a framework for evaluating physical reasoning in Multimodal Large Language Models (MLLMs) through code-...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.13289] Evaluating the Impact of Post-Training Quantization on Reliable VQA with Multimodal LLMs

This paper evaluates the effects of Post-Training Quantization (PTQ) on the reliability and accuracy of Visual Question Answering (VQA) u...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.13286] Explanatory Interactive Machine Learning for Bias Mitigation in Visual Gender Classification

This article explores Explanatory Interactive Machine Learning (XIL) as a method to mitigate bias in visual gender classification, demons...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.14318] In Transformer We Trust? A Perspective on Transformer Architecture Failure Modes

The paper examines the trustworthiness of transformer architectures in high-stakes applications, analyzing their reliability, interpretab...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.14078] Policy Gradient with Adaptive Entropy Annealing for Continual Fine-Tuning

This paper presents a novel approach, Adaptive Entropy Annealing (aEPG), to enhance continual fine-tuning of large pretrained vision mode...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.14225] Text Before Vision: Staged Knowledge Injection Matters for Agentic RLVR in Ultra-High-Resolution Remote Sensing Understanding

This paper explores the significance of staged knowledge injection in enhancing agentic reinforcement learning for ultra-high-resolution ...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.13710] HBVLA: Pushing 1-Bit Post-Training Quantization for Vision-Language-Action Models

The paper presents HBVLA, a framework for 1-bit post-training quantization of Vision-Language-Action models, enhancing efficiency while m...

arXiv - Machine Learning · 4 min · about 1 month ago
