Computer Vision

Image recognition, detection, and visual AI

Top This Week

[2506.22504] Patch2Loc: Learning to Localize Patches for Unsupervised Brain Lesion Detection
Machine Learning

arXiv - Machine Learning · 4 min ·
[2508.00307] Acoustic Imaging for Low-SNR UAV Detection: Dense Beamformed Energy Maps and U-Net SELD
Machine Learning

arXiv - AI · 4 min ·
[2603.25524] CHIRP dataset: towards long-term, individual-level, behavioral monitoring of bird populations in the wild
Computer Vision

arXiv - AI · 4 min ·

All Content

[2602.23509] SegReg: Latent Space Regularization for Improved Medical Image Segmentation
Machine Learning

arXiv - AI · 3 min ·
[2602.23372] Democratizing GraphRAG: Linear, CPU-Only Graph Retrieval for Multi-Hop QA
LLMs

arXiv - AI · 3 min ·
[2602.23370] Toward General Semantic Chunking: A Discriminative Framework for Ultra-Long Documents
LLMs

arXiv - AI · 4 min ·
Machine Learning

[D] got tired of "just vibes" testing for edge ML models, so I built automated quality gates

so about 6 months ago I was messing around with a vision model on a Snapdragon device as a side project. worked great on my laptop. deplo...

Reddit - Machine Learning · 1 min ·
Machine Learning

[R] CVPR'26 SPAR-3D Workshop Call For Papers

The SPAR-3D workshop at CVPR'26 invites submissions on 3D vision models, focusing on security, privacy, and robustness, with a deadline e...

Reddit - Machine Learning · 1 min ·
Machine Learning

[D] Works on flow matching where source distribution comes from dataset instead of Gaussian noise?

This Reddit discussion explores the feasibility of flow matching in image generation, questioning whether source distributions can extend...

Reddit - Machine Learning · 1 min ·
LLMs

[R] Prompt to review manuscript for ML/CV conferences

The article discusses the author's interest in utilizing LLMs to review their manuscript for ML/CV conferences, highlighting concerns abo...

Reddit - Machine Learning · 1 min ·
Machine Learning

[D] MICCAI 2026 Submission guidelines

The MICCAI 2026 submission guidelines emphasize the importance of originality in submissions, stating that works must not be published or...

Reddit - Machine Learning · 1 min ·
AI Agents

A new wearable AI system watches your hands through smart glasses, guiding experiments and stopping mistakes before they happen

A new AI wearable system utilizes smart glasses to monitor hand movements, enhancing experimental accuracy and preventing errors in real-...

Reddit - Artificial Intelligence · 1 min ·
LLMs

[D] Edge AI Projects on Jetson Orin – Ideas?

A Reddit user seeks innovative project ideas for deploying AI on NVIDIA Jetson Orin devices, leveraging their experience in machine learn...

Reddit - Machine Learning · 1 min ·
Samsung’s Galaxy S26 AI camera features are a photography nightmare | The Verge
AI Agents

The Vergecast discusses Samsung's Galaxy S26 AI camera features, arguing they redefine photography and raise concerns about the essence o...

The Verge - AI · 5 min ·
[2511.05898] Q$^2$: Quantization-Aware Gradient Balancing and Attention Alignment for Low-Bit Quantization
Machine Learning

The paper presents Q$^2$, a novel framework addressing gradient imbalance in low-bit quantization for complex visual tasks, enhancing per...

arXiv - AI · 4 min ·
[2512.01292] Diffusion Model in Latent Space for Medical Image Segmentation Task
Machine Learning

This article presents MedSegLatDiff, a novel diffusion model for efficient medical image segmentation that enhances interpretability by g...

arXiv - AI · 4 min ·
[2510.19060] PoSh: Using Scene Graphs To Guide LLMs-as-a-Judge For Detailed Image Descriptions
LLMs

The paper introduces PoSh, a new metric using scene graphs to enhance the evaluation of detailed image descriptions by LLMs, outperformin...

arXiv - AI · 4 min ·
[2602.02334] VQ-Style: Disentangling Style and Content in Motion with Residual Quantized Representations
Machine Learning

The paper presents VQ-Style, a method for disentangling style and content in human motion data using Residual Vector Quantized Variationa...

arXiv - Machine Learning · 4 min ·
[2601.23276] Denoising the Deep Sky: Physics-Based CCD Noise Formation for Astronomical Imaging
Machine Learning

This article presents a physics-based framework for synthesizing CCD noise in astronomical imaging, addressing noise limitations in curre...

arXiv - Machine Learning · 4 min ·
[2508.20570] Dyslexify: A Mechanistic Defense Against Typographic Attacks in CLIP
LLMs

The paper presents Dyslexify, a novel defense mechanism against typographic attacks in CLIP models, enhancing robustness without finetuni...

arXiv - AI · 4 min ·
[2507.12784] A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys
Machine Learning

This article presents a semi-supervised learning method to identify poor-quality exposures in large astronomical imaging surveys, enhanci...

arXiv - AI · 4 min ·
[2506.01392] Sparse Imagination for Efficient Visual World Model Planning
Machine Learning

The paper presents a novel approach called Sparse Imagination for enhancing visual world model planning in robotics, improving computatio...

arXiv - AI · 3 min ·
[2510.01031] Secure and reversible face anonymization with diffusion models
Machine Learning

This paper presents a novel framework for secure and reversible face anonymization using diffusion models, addressing challenges in image...

arXiv - Machine Learning · 4 min ·