Top Computer Vision This Month
The most engaging computer vision content from this month, curated by AI News.
1
[D] Edge AI Projects on Jetson Orin – Ideas?
A Reddit user seeks innovative project ideas for deploying AI on NVIDIA Jetson Orin devices, leveraging their experience in machine learning and real-time systems.
Reddit - Machine Learning · 28 days ago
2
A new wearable AI system watches your hands through smart glasses, guiding experiments and stopping mistakes before they happen
A new wearable AI system uses smart glasses to monitor hand movements in real time, improving experimental accuracy by flagging mistakes before they happen.
Reddit - Artificial Intelligence · 28 days ago
3
[2602.22381] Enhancing Renal Tumor Malignancy Prediction: Deep Learning with Automatic 3D CT Organ Focused Attention
This article presents a novel deep learning framework for predicting malignancy in renal tumors using 3D CT images, eliminating the need for manual segmentation and improving predictive accuracy.
arXiv - AI · 28 days ago
4
[2602.22570] Guidance Matters: Rethinking the Evaluation Pitfall for Text-to-Image Generation
The paper discusses the evaluation challenges in text-to-image generation, focusing on classifier-free guidance (CFG) and proposing a new evaluation framework to address biases in current methods.
arXiv - AI · 28 days ago
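Since the paper centers on classifier-free guidance, here is the standard CFG combination in common notation (an assumption; the paper's own notation may differ). Here $\epsilon_\theta$ is the noise-prediction network, $x_t$ the noisy sample at step $t$, $c$ the text condition, $\varnothing$ the null (empty) condition, and $w$ the guidance scale:

```latex
\tilde{\epsilon}_\theta(x_t, c) = \epsilon_\theta(x_t, \varnothing)
  + w \,\bigl(\epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing)\bigr)
```

With $w = 0$ this reduces to the unconditional prediction, $w = 1$ gives the purely conditional prediction, and $w > 1$ pushes samples further toward the prompt, which is why the choice of $w$ can change how models compare under a fixed evaluation protocol.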
5
[2602.22678] ViCLIP-OT: The First Foundation Vision-Language Model for Vietnamese Image-Text Retrieval with Optimal Transport
ViCLIP-OT introduces a novel vision-language model tailored for Vietnamese image-text retrieval, outperforming existing models in low-resource settings.
arXiv - AI · 28 days ago
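For readers unfamiliar with optimal transport in retrieval, a minimal sketch of entropic OT via Sinkhorn iterations is shown below. It assumes a precomputed image-by-text cost matrix (for example, 1 minus cosine similarity) and uniform marginals; the function name and defaults are hypothetical and this is not ViCLIP-OT's actual training objective.

```python
import math


def sinkhorn(cost, reg=0.1, n_iters=200):
    """Entropic optimal transport between uniform marginals (hypothetical helper).

    cost: n x m list of lists, e.g. 1 - cosine similarity between image and
    text embeddings. Returns an n x m transport plan whose entries sum to 1.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform mass on images
    b = [1.0 / m] * m  # uniform mass on texts
    # Gibbs kernel K = exp(-C / reg); smaller reg gives a sharper plan
    K = [[math.exp(-c / reg) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns toward the target marginals
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


plan = sinkhorn([[0.0, 1.0], [1.0, 0.0]])
```

In a retrieval setting, the row-wise argmax of the plan can be read as the best-matching caption for each image; the entropic regularizer `reg` trades off sharpness of the matching against numerical stability.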
6
[2602.22716] SoPE: Spherical Coordinate-Based Positional Embedding for Enhancing Spatial Perception of 3D LVLMs
The paper presents SoPE, a novel Spherical Coordinate-Based Positional Embedding method aimed at improving the spatial perception capabilities of 3D Large Vision-Language Models (3D LVLMs) by addre...
arXiv - AI · 28 days ago
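To illustrate the general idea of a spherical-coordinate positional embedding, the sketch below converts a 3D point to (radius, polar angle, azimuth) and encodes each coordinate with standard sinusoidal features. The concatenated layout and all function names are assumptions for illustration; SoPE's actual formulation is not described in the summary and may differ.

```python
import math


def to_spherical(x, y, z):
    """Cartesian -> (radius r, polar angle theta in [0, pi], azimuth phi in (-pi, pi])."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0:
        return 0.0, 0.0, 0.0
    # Clamp guards against tiny floating-point overshoot outside [-1, 1]
    theta = math.acos(max(-1.0, min(1.0, z / r)))
    phi = math.atan2(y, x)
    return r, theta, phi


def sinusoidal(value, dim, base=10000.0):
    """Standard sin/cos encoding of one scalar into `dim` features (dim even)."""
    out = []
    for i in range(dim // 2):
        freq = 1.0 / (base ** (2 * i / dim))
        out.extend([math.sin(value * freq), math.cos(value * freq)])
    return out


def spherical_embedding(point, dim_per_coord=8):
    """Hypothetical layout: concatenate encodings of r, theta, and phi."""
    r, theta, phi = to_spherical(*point)
    return (sinusoidal(r, dim_per_coord)
            + sinusoidal(theta, dim_per_coord)
            + sinusoidal(phi, dim_per_coord))


emb = spherical_embedding((1.0, 2.0, 3.0))
```

Encoding angles separately from distance makes direction relative to a viewpoint an explicit input feature, which is one plausible reason spherical coordinates could help spatial reasoning compared with encoding raw x, y, z.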
7
[2602.23013] SubspaceAD: Training-Free Few-Shot Anomaly Detection via Subspace Modeling
The paper introduces SubspaceAD, a few-shot anomaly detection method that relies on subspace modeling instead of training and reports state-of-the-art results.
arXiv - Machine Learning · 28 days ago
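A generic, training-free way to use subspace modeling for anomaly detection is to fit a low-dimensional principal subspace to a few normal feature vectors and score new inputs by their distance from it. The sketch below does this with power iteration in plain Python; the function names are hypothetical, and this is the classic PCA-residual technique, not necessarily SubspaceAD's exact procedure.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_subspace(samples, k, n_iters=100):
    """Mean and top-k principal directions of a few normal feature vectors."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(d)]
    centered = [[s[i] - mean[i] for i in range(d)] for s in samples]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)] for i in range(d)]
    basis = []
    for _ in range(k):
        v = [1.0] * d
        found = True
        for _ in range(n_iters):
            w = [_dot(row, v) for row in cov]  # power step: w = cov @ v
            for b in basis:  # keep orthogonal to components already found
                p = _dot(w, b)
                w = [wi - p * bi for wi, bi in zip(w, b)]
            norm = math.sqrt(_dot(w, w))
            if norm < 1e-12:  # no variance left in this direction
                found = False
                break
            v = [wi / norm for wi in w]
        if not found:
            break
        basis.append(v)
    return mean, basis


def anomaly_score(x, mean, basis):
    """Norm of the residual after projecting x onto the normal subspace."""
    r = [xi - mi for xi, mi in zip(x, mean)]
    for b in basis:
        p = _dot(r, b)
        r = [ri - p * bi for ri, bi in zip(r, b)]
    return math.sqrt(_dot(r, r))


normal = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
mean, basis = fit_subspace(normal, k=1)
```

Here the normal samples lie on a line, so a point like `[5.0, 10.0]` scores near zero while `[3.0, 1.0]` scores about 2.24; in practice the feature vectors would come from a frozen pretrained backbone.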
8
[2602.22740] AMLRIS: Alignment-aware Masked Learning for Referring Image Segmentation
The paper presents AMLRIS, a novel training strategy for Referring Image Segmentation (RIS) that enhances object segmentation through alignment-aware masked learning, achieving state-of-the-art res...
arXiv - AI · 28 days ago
9
[2602.23192] FairQuant: Fairness-Aware Mixed-Precision Quantization for Medical Image Classification
The paper presents FairQuant, a framework for fairness-aware mixed-precision quantization in medical image classification, optimizing both performance and fairness metrics.
arXiv - Machine Learning · 28 days ago
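Mixed-precision quantization assigns different bit widths to different layers. The sketch below shows symmetric uniform quantization with a per-layer bit plan; the layer names and plan are hypothetical, and the fairness-aware search that FairQuant uses to choose the bit widths is not modeled here.

```python
def quantize(values, bits):
    """Symmetric uniform quantization of a list of floats to signed `bits`-bit ints."""
    qmax = 2 ** (bits - 1) - 1
    # One scale per tensor; fall back to 1.0 if all values are zero
    scale = max(abs(v) for v in values) / qmax or 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    return [qi * scale for qi in q]


def max_error(values, bits):
    """Worst-case absolute reconstruction error at a given bit width."""
    q, s = quantize(values, bits)
    return max(abs(a - b) for a, b in zip(values, dequantize(q, s)))


def apply_bit_plan(weights, bit_plan):
    """Quantize each layer with its own bit width (a mixed-precision plan)."""
    return {name: quantize(w, bit_plan[name]) for name, w in weights.items()}


# Hypothetical plan: keep a sensitive layer at 8 bits, compress another to 4 bits
weights = {"conv1": [0.9, -0.3, 0.05, 0.6], "head": [1.0, -1.0]}
plan = {"conv1": 8, "head": 4}
quantized = apply_bit_plan(weights, plan)
```

A fairness-aware method would choose `plan` by checking how each candidate bit width affects accuracy across patient subgroups, rather than only average accuracy.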
10
[2602.23214] Plug-and-Play Diffusion Meets ADMM: Dual-Variable Coupling for Robust Medical Image Reconstruction
This paper presents a novel approach to medical image reconstruction using Dual-Coupled Plug-and-Play Diffusion, addressing limitations in existing methods and achieving state-of-the-art results.
arXiv - Machine Learning · 28 days ago
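For context, the sketch below shows the standard plug-and-play ADMM loop that this line of work builds on: a data-consistency step, a denoiser standing in for the prior's proximal operator, and a dual-variable update. It assumes an elementwise forward operator (a toy inpainting mask) and uses a 3-tap moving average as a stand-in denoiser; the paper's diffusion prior and its dual-variable coupling are not reproduced, and all names are hypothetical.

```python
def denoise(signal):
    """Stand-in denoiser: 3-tap moving average (a diffusion model in the paper)."""
    n = len(signal)
    out = []
    for i in range(n):
        window = signal[max(0, i - 1):min(n, i + 2)]
        out.append(sum(window) / len(window))
    return out


def pnp_admm(y, a, rho=1.0, n_iters=50):
    """Plug-and-play ADMM for the measurement model y = a * x (elementwise)."""
    n = len(y)
    z = list(y)
    u = [0.0] * n
    for _ in range(n_iters):
        # x-step: closed-form least squares, possible because the operator is diagonal
        x = [(a[i] * y[i] + rho * (z[i] - u[i])) / (a[i] ** 2 + rho) for i in range(n)]
        # z-step: the prior's proximal operator is replaced by a denoiser
        z = denoise([x[i] + u[i] for i in range(n)])
        # dual update: accumulate the disagreement between x and z
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return z


mask = [1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0]  # 0 = missing sample
y = [m * 1.0 for m in mask]                       # observations of a constant signal
recon = pnp_admm(y, mask)
```

Swapping the moving average for a learned denoiser is what makes the method "plug-and-play": the optimization loop stays the same while the prior changes.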
11
[2602.22955] MM-NeuroOnco: A Multimodal Benchmark and Instruction Dataset for MRI-Based Brain Tumor Diagnosis
The article presents MM-NeuroOnco, a comprehensive dataset aimed at improving MRI-based brain tumor diagnosis through multimodal instructions and benchmarks.
arXiv - AI · 28 days ago
12
[2602.23117] Devling into Adversarial Transferability on Image Classification: Review, Benchmark, and Evaluation
This article reviews adversarial transferability in image classification, proposing a standardized framework for evaluating transfer-based attacks and categorizing existing approaches.
arXiv - AI · 28 days ago
13
[2602.23172] Latent Gaussian Splatting for 4D Panoptic Occupancy Tracking
The paper presents Latent Gaussian Splatting (LaGS) for 4D panoptic occupancy tracking, enhancing robot perception in dynamic environments by integrating multi-view data into a cohesive 3D represen...
arXiv - AI · 28 days ago
14
[2506.15190] Learning Task-Agnostic Motifs to Capture the Continuous Nature of Animal Behavior
The paper presents a novel framework, Motif-based Continuous Dynamics (MCD), to model animal behavior by identifying continuous motor motifs, enhancing the understanding of behavior dynamics beyond...
arXiv - Machine Learning · 28 days ago
15
[2602.23203] ColoDiff: Integrating Dynamic Consistency With Content Awareness for Colonoscopy Video Generation
ColoDiff introduces a novel framework for generating colonoscopy videos that ensures dynamic consistency and content awareness, addressing data scarcity in clinical settings.
arXiv - AI · 28 days ago
16
[2602.23235] Spatio-Temporal Token Pruning for Efficient High-Resolution GUI Agents
The paper presents GUIPruner, a framework that makes high-resolution GUI agents more efficient by pruning spatially and temporally redundant visual tokens.
arXiv - AI · 28 days ago
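Token pruning in general comes down to two simple operations: keep only the most informative tokens within a frame, and skip patches that did not change between consecutive screenshots. The sketch below shows both in their most basic form; the importance scores are assumed to be given (for example, attention weights), the function names are hypothetical, and GUIPruner's actual scoring rules are not described in the summary.

```python
def prune_tokens(tokens, scores, keep_ratio=0.5):
    """Spatial pruning: keep the highest-scoring tokens, preserving original order."""
    k = max(1, int(len(tokens) * keep_ratio))
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k]
    return [tokens[i] for i in sorted(top)]


def changed_patches(prev_frame, cur_frame):
    """Temporal pruning: indices of patches that differ from the previous screenshot."""
    return [i for i, (p, c) in enumerate(zip(prev_frame, cur_frame)) if p != c]


kept = prune_tokens(["a", "b", "c", "d"], [0.1, 0.9, 0.5, 0.3])
# kept == ["b", "c"]
```

Sorting the surviving indices back into their original order matters because downstream layers usually rely on positional information.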
17
[2602.23334] Bitwise Systolic Array Architecture for Runtime-Reconfigurable Multi-precision Quantized Multiplication on Hardware Accelerators
This paper presents a novel bitwise systolic array architecture designed for runtime-reconfigurable multi-precision quantized multiplication, enhancing performance in neural network accelerators.
arXiv - AI · 28 days ago
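The arithmetic idea behind bitwise multi-precision multipliers is that an n-bit product decomposes into 1-bit partial products (an AND gate) shifted by the sum of their bit positions, so the same bit-level units can serve different operand widths. The sketch below checks that decomposition for unsigned integers with a bit width chosen at call time; it models neither the systolic dataflow nor signed operands, and the function name is hypothetical.

```python
def bitwise_multiply(a, b, bits):
    """Unsigned multiply built from 1-bit partial products (AND + shift).

    `bits` is the operand precision selected at run time; operands must fit in it.
    """
    if not (0 <= a < 2 ** bits and 0 <= b < 2 ** bits):
        raise ValueError("operands must fit in the selected bit width")
    acc = 0
    for i in range(bits):
        a_i = (a >> i) & 1
        for j in range(bits):
            b_j = (b >> j) & 1
            # Each 1-bit product is what a single bit-level processing element computes
            acc += (a_i & b_j) << (i + j)
    return acc


print(bitwise_multiply(13, 11, 4))  # 143
```

Lowering `bits` reduces the number of partial products quadratically (16 for 4-bit operands versus 64 for 8-bit), which is the kind of saving a reconfigurable array can exploit for low-precision layers.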
18
[2602.23359] SeeThrough3D: Occlusion Aware 3D Control in Text-to-Image Generation
The paper introduces SeeThrough3D, a model for occlusion-aware 3D control in text-to-image generation, enhancing the realism of synthesized scenes with depth-consistent geometry.
arXiv - AI · 28 days ago
19
[2408.17251] Abstracted Gaussian Prototypes for True One-Shot Concept Learning
This paper presents a novel framework for one-shot learning in computer vision, utilizing Abstracted Gaussian Prototypes to enhance image segmentation and concept learning.
arXiv - AI · 28 days ago
20
[2412.20816] MomentMix Augmentation with Length-Aware DETR for Temporally Robust Moment Retrieval
The paper presents MomentMix, a novel augmentation technique using Length-Aware DETR to enhance video moment retrieval, particularly for short moments, achieving superior performance on benchmark d...
arXiv - AI · 28 days ago
21
[2506.06092] LinGuinE: Longitudinal Guidance Estimation for Volumetric Tumour Segmentation
LinGuinE introduces a novel framework for longitudinal volumetric tumor segmentation, enhancing tracking and mask generation across multiple scans without requiring longitudinal training.
arXiv - Machine Learning · 28 days ago
22
[2508.12691] Adaptive Hybrid Caching for Efficient Text-to-Video Diffusion Model Acceleration
This paper presents MixCache, a novel caching framework designed to enhance the efficiency of text-to-video diffusion models, significantly improving generation speed and quality.
arXiv - Machine Learning · 28 days ago
23
[2505.02780] Beyond the Monitor: Mixed Reality Visualization and Multimodal AI for Enhanced Digital Pathology Workflow
This article presents PathVis, a mixed-reality platform designed to enhance digital pathology workflows by integrating multimodal AI and immersive visualization techniques.
arXiv - AI · 28 days ago
24
[2510.01031] Secure and reversible face anonymization with diffusion models
This paper presents a novel framework for secure and reversible face anonymization using diffusion models, addressing challenges in image quality and unauthorized de-anonymization.
arXiv - Machine Learning · 28 days ago
25
[2507.12784] A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys
This article presents a semi-supervised learning method to identify poor-quality exposures in large astronomical imaging surveys, enhancing data quality control.
arXiv - AI · 28 days ago
26
[2508.20570] Dyslexify: A Mechanistic Defense Against Typographic Attacks in CLIP
The paper presents Dyslexify, a mechanistic defense against typographic attacks on CLIP models that improves robustness without fine-tuning while preserving performance.
arXiv - AI · 28 days ago
27
[2510.19060] PoSh: Using Scene Graphs To Guide LLMs-as-a-Judge For Detailed Image Descriptions
The paper introduces PoSh, a metric for detailed image descriptions that uses scene graphs to guide an LLM acting as a judge, and reports that it outperforms existing metrics.
arXiv - AI · 28 days ago
28
[2511.05898] Q$^2$: Quantization-Aware Gradient Balancing and Attention Alignment for Low-Bit Quantization
The paper presents Q$^2$, a novel framework addressing gradient imbalance in low-bit quantization for complex visual tasks, enhancing performance in object detection and image segmentation.
arXiv - AI · 28 days ago
29
[R] CVPR'26 SPAR-3D Workshop Call For Papers
The SPAR-3D workshop at CVPR'26 invites submissions on security, privacy, and robustness in 3D vision models; the submission deadline has been extended to March 21, 2026.
Reddit - Machine Learning · 26 days ago
30
[2504.00037] ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models
The paper introduces ViT-Linearizer, a framework that distills knowledge from Vision Transformers (ViTs) into efficient linear-time models, addressing the challenges of quadratic complexity in high...
arXiv - AI · 28 days ago
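The quadratic-versus-linear distinction comes from the order of computation in attention. The sketch below uses the well-known kernelized linear attention with feature map elu(x) + 1: summing keys and values once gives O(N) cost, and produces the same output as explicitly weighting every query-key pair under that kernel. This is a generic illustration with hypothetical function names; it does not implement ViT-Linearizer's distillation from softmax attention.

```python
import math


def phi(vec):
    """Positive feature map elu(x) + 1."""
    return [x + 1.0 if x > 0 else math.exp(x) for x in vec]


def quadratic_attention(Q, K, V):
    """O(N^2): explicit kernel weights between every query/key pair."""
    fk = [phi(k) for k in K]
    out = []
    for q in Q:
        fq = phi(q)
        w = [sum(a * b for a, b in zip(fq, f)) for f in fk]
        total = sum(w)
        out.append([sum(w[j] * V[j][c] for j in range(len(V))) / total
                    for c in range(len(V[0]))])
    return out


def linear_attention(Q, K, V):
    """O(N): summarize keys and values once, then reuse the summary per query."""
    d, dv = len(K[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]  # sum_j phi(k_j) v_j^T
    z = [0.0] * d                        # sum_j phi(k_j)
    for k, v in zip(K, V):
        fk = phi(k)
        for a in range(d):
            z[a] += fk[a]
            for c in range(dv):
                S[a][c] += fk[a] * v[c]
    out = []
    for q in Q:
        fq = phi(q)
        denom = sum(fq[a] * z[a] for a in range(d))
        out.append([sum(fq[a] * S[a][c] for a in range(d)) / denom for c in range(dv)])
    return out


Q = [[0.1, -0.2], [0.5, 0.3], [-1.0, 0.4]]
K = [[0.2, 0.1], [-0.3, 0.8], [0.6, -0.5]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
```

Both functions return matching outputs up to floating-point error; a standard ViT uses softmax attention instead, which has no such factorization, and that gap is what distillation into a linear-time student aims to bridge.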