Computer Vision

Image recognition, detection, and visual AI

Top This Week

[2506.22504] Patch2Loc: Learning to Localize Patches for Unsupervised Brain Lesion Detection
Machine Learning

arXiv - Machine Learning · 4 min
[2508.00307] Acoustic Imaging for Low-SNR UAV Detection: Dense Beamformed Energy Maps and U-Net SELD
Machine Learning

arXiv - AI · 4 min
[2603.25524] CHIRP dataset: towards long-term, individual-level, behavioral monitoring of bird populations in the wild
Computer Vision

arXiv - AI · 4 min

All Content

AI Startups

Niantic: Bringing spatial intelligence to the industrial edge

Niantic is leveraging spatial intelligence to enhance industrial applications, focusing on integrating augmented reality with real-world ...

Reddit - Artificial Intelligence · 1 min
Computer Vision

I geolocated a blurry pic from the Paris protests down to the exact coordinates using AI

The article discusses the author's successful use of an AI geolocation tool to pinpoint the exact coordinates of a blurry image from the ...

Reddit - Artificial Intelligence · 1 min
[2509.11791] Synthetic vs. Real Training Data for Visual Navigation
Machine Learning

This paper examines the effectiveness of visual navigation policies trained in simulation versus those trained with real-world data, high...

arXiv - Machine Learning · 4 min
[2509.07477] MedicalPatchNet: A Patch-Based Self-Explainable AI Architecture for Chest X-ray Classification
Machine Learning

MedicalPatchNet introduces a self-explainable AI architecture for chest X-ray classification, enhancing interpretability while maintainin...

arXiv - Machine Learning · 4 min
[2601.02439] WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks
Machine Learning

WebGym is an innovative open-source environment designed for training visual web agents, featuring nearly 300,000 tasks and a high-throug...

arXiv - Machine Learning · 4 min
[2511.15487] NTK-Guided Implicit Neural Teaching
Machine Learning

The paper presents NTK-Guided Implicit Neural Teaching (NINT), a method that accelerates training of Implicit Neural Representations (INR...

arXiv - Machine Learning · 3 min
[2501.16443] Object-Centric World Models from Few-Shot Annotations for Sample-Efficient Reinforcement Learning
Machine Learning

The paper presents OC-STORM, an object-centric model-based reinforcement learning framework that enhances sample efficiency by leveraging...

arXiv - Machine Learning · 4 min
[2602.21873] GFPL: Generative Federated Prototype Learning for Resource-Constrained and Data-Imbalanced Vision Task
Machine Learning

The GFPL framework enhances federated learning by addressing data imbalance and communication overhead in resource-constrained vision tas...

arXiv - Machine Learning · 4 min
[2602.21707] Learning spatially adaptive sparsity level maps for arbitrary convolutional dictionaries
Machine Learning

This paper presents a novel approach to image reconstruction using spatially adaptive sparsity level maps within convolutional dictionari...

arXiv - Machine Learning · 4 min
[2602.21703] Brain Tumor Segmentation with Special Emphasis on the Non-Enhancing Brain Tumor Compartment
Machine Learning

This article presents a U-Net based deep learning architecture for segmenting brain tumors in MRI scans, focusing on the often-overlooked...

arXiv - Machine Learning · 3 min
[2602.21428] PSF-Med: Measuring and Explaining Paraphrase Sensitivity in Medical Vision Language Models
LLMs

The paper introduces PSF-Med, a benchmark assessing paraphrase sensitivity in medical vision language models, revealing significant varia...

arXiv - Machine Learning · 4 min
[2602.21397] MMLoP: Multi-Modal Low-Rank Prompting for Efficient Vision-Language Adaptation
LLMs

The paper presents MMLoP, a framework for efficient vision-language adaptation using low-rank prompting, achieving high accuracy with sig...

arXiv - Machine Learning · 4 min
[2602.21961] Robustness in sparse artificial neural networks trained with adaptive topology
Machine Learning

This paper explores the robustness of sparse artificial neural networks with adaptive topology, demonstrating their competitive performan...

arXiv - Machine Learning · 3 min
[2602.21601] Deep Clustering based Boundary-Decoder Net for Inter and Intra Layer Stress Prediction of Heterogeneous Integrated IC Chip
Machine Learning

This article presents a novel approach using a Deep Clustering based Boundary-Decoder Net for predicting inter and intra-layer stress in ...

arXiv - Machine Learning · 4 min
[2602.21593] Breaking Semantic-Aware Watermarks via LLM-Guided Coherence-Preserving Semantic Injection
LLMs

The paper introduces a novel attack method, Coherence-Preserving Semantic Injection (CSI), that exploits vulnerabilities in semantic-awar...

arXiv - Machine Learning · 4 min
[2602.10359] Beyond Calibration: Confounding Pathology Limits Foundation Model Specificity in Abdominal Trauma CT
LLMs

This study evaluates the performance of foundation models in detecting abdominal trauma, revealing that specificity deficits are influenc...

arXiv - AI · 4 min
[2602.09929] Monocular Normal Estimation via Shading Sequence Estimation
Machine Learning

This paper presents a novel approach to monocular normal estimation by reformulating the problem as shading sequence estimation, enhancin...

arXiv - AI · 4 min
[2602.00462] LatentLens: Revealing Highly Interpretable Visual Tokens in LLMs
LLMs

The paper introduces LatentLens, a method for mapping visual tokens to natural language descriptions in Vision-Language Models (VLMs), en...

arXiv - AI · 4 min
[2601.08026] FigEx2: Visual-Conditioned Panel Detection and Captioning for Scientific Compound Figures
Computer Vision

The paper presents FigEx2, a framework for detecting and captioning panels in scientific compound figures, enhancing understanding and ac...

arXiv - AI · 4 min
[2512.08639] Aerial Vision-Language Navigation with a Unified Framework for Spatial, Temporal and Embodied Reasoning
NLP

This article presents a unified framework for Aerial Vision-Language Navigation (VLN), enabling UAVs to interpret natural language and na...

arXiv - AI · 4 min