Computer Vision

Image recognition, detection, and visual AI

Top This Week

Machine Learning

[2506.22504] Patch2Loc: Learning to Localize Patches for Unsupervised Brain Lesion Detection

arXiv - Machine Learning · 4 min · 3 days ago

Machine Learning

[2508.00307] Acoustic Imaging for Low-SNR UAV Detection: Dense Beamformed Energy Maps and U-Net SELD

arXiv - AI · 4 min · 3 days ago

Computer Vision

[2603.25524] CHIRP dataset: towards long-term, individual-level, behavioral monitoring of bird populations in the wild

arXiv - AI · 4 min · 3 days ago

All Content

Machine Learning

[2602.20731] Communication-Inspired Tokenization for Structured Image Representations

The paper presents COMmunication inspired Tokenization (COMiT), a novel framework for structured image representations that enhances obje...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.20709] Onboard-Targeted Segmentation of Straylight in Space Camera Sensors

This paper presents an AI-driven methodology for segmenting straylight effects in space camera sensors, enhancing image analysis in resou...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.20658] Vision-Language Models for Ergonomic Assessment of Manual Lifting Tasks: Estimating Horizontal and Vertical Hand Distances from RGB Video

This article explores the use of vision-language models (VLMs) for non-invasive ergonomic assessment of manual lifting tasks, estimating ...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.20650] Dataset Color Quantization: A Training-Oriented Framework for Dataset-Level Compression

The paper presents Dataset Color Quantization (DCQ), a framework designed to compress large-scale image datasets by reducing color-space ...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.20636] SurgAtt-Tracker: Online Surgical Attention Tracking via Temporal Proposal Reranking and Motion-Aware Refinement

The paper presents SurgAtt-Tracker, a novel framework for online surgical attention tracking that enhances minimally invasive surgery thr...

arXiv - AI · 4 min · about 1 month ago

AI Safety

[2602.20541] Maximin Share Guarantees via Limited Cost-Sensitive Sharing

This paper explores fair allocation of indivisible goods through limited cost-sensitive sharing, demonstrating how controlled sharing can...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.20520] How Do Inpainting Artifacts Propagate to Language?

This paper investigates how visual artifacts from diffusion-based inpainting affect language generation in vision-language models, reveal...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.20497] LESA: Learnable Stage-Aware Predictors for Diffusion Model Acceleration

The paper introduces LESA, a framework for accelerating diffusion models using learnable stage-aware predictors, achieving significant sp...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.20330] Circuit Tracing in Vision-Language Models: Understanding the Internal Mechanisms of Multimodal Thinking

This article presents a framework for circuit tracing in vision-language models (VLMs), aiming to enhance understanding of their internal...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2602.20219] An Approach to Combining Video and Speech with Large Language Models in Human-Robot Interaction

This article presents a novel multimodal framework for human-robot interaction that integrates video and speech processing with large lan...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.20206] Mitigating "Epistemic Debt" in Generative AI-Scaffolded Novice Programming using Metacognitive Scripts

This paper explores the concept of 'Epistemic Debt' in novice programming using generative AI, proposing metacognitive scripts to enhance...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.20200] Global Prior Meets Local Consistency: Dual-Memory Augmented Vision-Language-Action Model for Efficient Robotic Manipulation

The paper presents OptimusVLA, a dual-memory framework for robotic manipulation that enhances efficiency and robustness in action generat...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.20187] AINet: Anchor Instances Learning for Regional Heterogeneity in Whole Slide Image

The paper introduces AINet, a novel framework for whole slide image analysis that addresses regional heterogeneity through anchor instanc...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2310.15741] Interpretable Medical Image Classification using Prototype Learning and Privileged Information

This article presents a novel approach to medical image classification using prototype learning and privileged information, enhancing int...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.21172] NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning

The paper presents NoRD, a data-efficient Vision-Language-Action model that enhances autonomous driving without requiring extensive datas...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.20878] Diagnosing Causal Reasoning in Vision-Language Models via Structured Relevance Graphs

This article introduces Vision-Language Causal Graphs (VLCGs) to enhance causal reasoning in Large Vision-Language Models (LVLMs), addressing t...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.20739] PyVision-RL: Forging Open Agentic Vision Models via RL

The paper introduces PyVision-RL, a reinforcement learning framework designed to enhance agentic multimodal models by preventing interact...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.20659] Recursive Belief Vision Language Model

The Recursive Belief Vision Language Model (RB-VLA) addresses limitations in current vision-language-action models by introducing a belie...

arXiv - AI · 4 min · about 1 month ago

Computer Vision

A retinal reboot for amblyopia | MIT Technology Review

A new study reveals that anesthetizing the retina of a 'lazy' eye for two days can restore vision in mice, offering hope for treating amb...

MIT Technology Review - AI · 3 min · about 1 month ago

Machine Learning

How the rail sector is adapting to an AI-enabled future

The rail sector is embracing AI to enhance data processing and operational efficiency, with initiatives like Great British Railways lever...

AI News - General · 14 min · about 1 month ago
