Computer Vision

Image recognition, detection, and visual AI

Top This Week

Machine Learning

[2511.09675] PriVi: Towards A General-Purpose Video Model For Primate Behavior In The Wild

arXiv - Machine Learning · 4 min · 2 minutes ago

Machine Learning

[2509.15219] Out-of-Sight Embodied Agents: Multimodal Tracking, Sensor Fusion, and Trajectory Forecasting

arXiv - Machine Learning · 4 min · 2 minutes ago

Machine Learning

[2603.26657] Tunable Soft Equivariance with Guarantees

arXiv - Machine Learning · 3 min · 2 minutes ago

All Content

LLMs

[2602.20878] Diagnosing Causal Reasoning in Vision-Language Models via Structured Relevance Graphs

This article introduces Vision-Language Causal Graphs (VLCGs) to enhance causal reasoning in Large Vision-Language Models (LVLMs), addressing t...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.20739] PyVision-RL: Forging Open Agentic Vision Models via RL

The paper introduces PyVision-RL, a reinforcement learning framework designed to enhance agentic multimodal models by preventing interact...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.20659] Recursive Belief Vision Language Model

The Recursive Belief Vision Language Model (RB-VLA) addresses limitations in current vision-language-action models by introducing a belie...

arXiv - AI · 4 min · about 1 month ago

Computer Vision

A retinal reboot for amblyopia | MIT Technology Review

A new study reveals that anesthetizing the retina of a 'lazy' eye for two days can restore vision in mice, offering hope for treating amb...

MIT Technology Review - AI · 3 min · about 1 month ago

Machine Learning

How the rail sector is adapting to an AI-enabled future

The rail sector is embracing AI to enhance data processing and operational efficiency, with initiatives like Great British Railways lever...

AI News - General · 14 min · about 1 month ago

NLP

Anthropic Slams China for AI Theft, But Critics Say the Outrage Is Hypocritical

Anthropic accuses Chinese developers of stealing AI secrets from its Claude chatbot, sparking criticism over its own data scraping practi...

AI Tools & Products · 7 min · about 1 month ago

Machine Learning

[2602.08550] GOT-Edit: Geometry-Aware Generic Object Tracking via Online Model Editing

GOT-Edit introduces a novel approach to generic object tracking by integrating geometry-aware cues through online model editing, enhancin...

arXiv - Machine Learning · 4 min · about 1 month ago

Generative AI

[2601.16210] PyraTok: Language-Aligned Pyramidal Tokenizer for Video Understanding and Generation

The paper introduces PyraTok, a language-aligned pyramidal tokenizer designed to enhance video understanding and generation by improving ...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2512.02700] VLM-Pruner: Buffering for Spatial Sparsity in an Efficient VLM Centrifugal Token Pruning Paradigm

The paper presents VLM-Pruner, a novel token pruning algorithm designed to enhance the efficiency of vision-language models (VLMs) by bal...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2512.13742] DL$^3$M: A Vision-to-Language Framework for Expert-Level Medical Reasoning through Deep Learning and Large Language Models

The DL$^3$M framework integrates deep learning and large language models to enhance medical reasoning from images, addressing limitations...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2511.07399] StreamDiffusionV2: A Streaming System for Dynamic and Interactive Video Generation

StreamDiffusionV2 presents a novel system for dynamic and interactive video generation, enhancing live streaming capabilities through opt...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2511.06450] Countering Multi-modal Representation Collapse through Rank-targeted Fusion

This paper presents a novel framework, Rank-enhancing Token Fuser, to address multi-modal representation collapse in machine learning, en...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2511.16175] Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight

The paper introduces Mantis, a Vision-Language-Action model that enhances visual foresight through a novel framework, achieving superior ...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2511.02860] AI-driven Large-scale Electron Microscopy enables Whole-tissue Subcellular Digitization

The article presents DeepOrganelle, a deep learning tool that enhances large-scale electron microscopy for mapping organelle distribution...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2510.06820] Efficient Discriminative Joint Encoders for Large Scale Vision-Language Reranking

The paper presents EDJE, an Efficient Discriminative Joint Encoder designed to enhance vision-language reranking by precomputing visual t...

arXiv - Machine Learning · 3 min · about 1 month ago

Machine Learning

[2509.26287] Flower: A Flow-Matching Solver for Inverse Problems

The paper introduces Flower, a novel solver for linear inverse problems that utilizes a pre-trained flow model to enhance reconstruction ...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2510.14979] From Pixels to Words -- Towards Native Vision-Language Primitives at Scale

The paper discusses the development of native Vision-Language Models (VLMs) that integrate vision and language capabilities more effectiv...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2510.02240] RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning

The paper presents RewardMap, a multi-stage reinforcement learning framework aimed at improving fine-grained visual reasoning in multimod...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2505.17779] U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding

The paper introduces U2-BENCH, a benchmark for evaluating large vision-language models (LVLMs) on ultrasound understanding, addressing ch...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2509.24526] CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow Map Models

The paper introduces Consistency Mid-Training (CMT), a novel method for enhancing the efficiency of training flow map models, achieving s...

arXiv - Machine Learning · 4 min · about 1 month ago
