Top Computer Vision This Month

The most engaging computer vision content from this month, curated by AI News.

  1.

    [D] Edge AI Projects on Jetson Orin – Ideas?

    A Reddit user seeks innovative project ideas for deploying AI on NVIDIA Jetson Orin devices, leveraging their experience in machine learning and real-time systems.

    Reddit - Machine Learning · 28 days ago
  2.

    A new wearable AI system watches your hands through smart glasses, guiding experiments and stopping mistakes before they happen

    A new wearable AI system uses smart glasses to monitor hand movements, improving experimental accuracy and preventing errors in real time.

    Reddit - Artificial Intelligence · 28 days ago
  3.

    [2602.22381] Enhancing Renal Tumor Malignancy Prediction: Deep Learning with Automatic 3D CT Organ Focused Attention

    This article presents a novel deep learning framework for predicting malignancy in renal tumors using 3D CT images, eliminating the need for manual segmentation and improving predictive accuracy.

    arXiv - AI · 28 days ago
  4.

    [2602.22570] Guidance Matters: Rethinking the Evaluation Pitfall for Text-to-Image Generation

    The paper discusses the evaluation challenges in text-to-image generation, focusing on classifier-free guidance (CFG) and proposing a new evaluation framework to address biases in current methods.

    arXiv - AI · 28 days ago
  5.

    [2602.22678] ViCLIP-OT: The First Foundation Vision-Language Model for Vietnamese Image-Text Retrieval with Optimal Transport

    ViCLIP-OT introduces a novel vision-language model tailored for Vietnamese image-text retrieval, outperforming existing models in low-resource settings.

    arXiv - AI · 28 days ago
  6.

    [2602.22716] SoPE: Spherical Coordinate-Based Positional Embedding for Enhancing Spatial Perception of 3D LVLMs

    The paper presents SoPE, a novel Spherical Coordinate-Based Positional Embedding method aimed at improving the spatial perception capabilities of 3D Large Vision-Language Models (3D LVLMs) by addre...

    arXiv - AI · 28 days ago
  7.

    [2602.23013] SubspaceAD: Training-Free Few-Shot Anomaly Detection via Subspace Modeling

    The paper introduces SubspaceAD, a training-free method for few-shot anomaly detection that utilizes subspace modeling to achieve state-of-the-art results without complex training processes.

    arXiv - Machine Learning · 28 days ago
  8.

    [2602.22740] AMLRIS: Alignment-aware Masked Learning for Referring Image Segmentation

    The paper presents AMLRIS, a novel training strategy for Referring Image Segmentation (RIS) that enhances object segmentation through alignment-aware masked learning, achieving state-of-the-art res...

    arXiv - AI · 28 days ago
  9.

    [2602.23192] FairQuant: Fairness-Aware Mixed-Precision Quantization for Medical Image Classification

    The paper presents FairQuant, a framework for fairness-aware mixed-precision quantization in medical image classification, optimizing both performance and fairness metrics.

    arXiv - Machine Learning · 28 days ago
  10.

    [2602.23214] Plug-and-Play Diffusion Meets ADMM: Dual-Variable Coupling for Robust Medical Image Reconstruction

    This paper presents a novel approach to medical image reconstruction using Dual-Coupled Plug-and-Play Diffusion, addressing limitations in existing methods and achieving state-of-the-art results.

    arXiv - Machine Learning · 28 days ago
  11.

    [2602.22955] MM-NeuroOnco: A Multimodal Benchmark and Instruction Dataset for MRI-Based Brain Tumor Diagnosis

    The article presents MM-NeuroOnco, a comprehensive dataset aimed at improving MRI-based brain tumor diagnosis through multimodal instructions and benchmarks.

    arXiv - AI · 28 days ago
  12.

    [2602.23117] Delving into Adversarial Transferability on Image Classification: Review, Benchmark, and Evaluation

    This article reviews adversarial transferability in image classification, proposing a standardized framework for evaluating transfer-based attacks and categorizing existing approaches.

    arXiv - AI · 28 days ago
  13.

    [2602.23172] Latent Gaussian Splatting for 4D Panoptic Occupancy Tracking

    The paper presents Latent Gaussian Splatting (LaGS) for 4D panoptic occupancy tracking, enhancing robot perception in dynamic environments by integrating multi-view data into a cohesive 3D represen...

    arXiv - AI · 28 days ago
  14.

    [2506.15190] Learning Task-Agnostic Motifs to Capture the Continuous Nature of Animal Behavior

    The paper presents a novel framework, Motif-based Continuous Dynamics (MCD), to model animal behavior by identifying continuous motor motifs, enhancing the understanding of behavior dynamics beyond...

    arXiv - Machine Learning · 28 days ago
  15.

    [2602.23203] ColoDiff: Integrating Dynamic Consistency With Content Awareness for Colonoscopy Video Generation

    ColoDiff introduces a novel framework for generating colonoscopy videos that ensures dynamic consistency and content awareness, addressing data scarcity in clinical settings.

    arXiv - AI · 28 days ago
  16.

    [2602.23235] Spatio-Temporal Token Pruning for Efficient High-Resolution GUI Agents

    The paper presents GUIPruner, a framework for enhancing the efficiency of high-resolution GUI agents by addressing spatiotemporal redundancy through innovative pruning techniques.

    arXiv - AI · 28 days ago
  17.

    [2602.23334] Bitwise Systolic Array Architecture for Runtime-Reconfigurable Multi-precision Quantized Multiplication on Hardware Accelerators

    This paper presents a novel bitwise systolic array architecture designed for runtime-reconfigurable multi-precision quantized multiplication, enhancing performance in neural network accelerators.

    arXiv - AI · 28 days ago
  18.

    [2602.23359] SeeThrough3D: Occlusion Aware 3D Control in Text-to-Image Generation

    The paper introduces SeeThrough3D, a model for occlusion-aware 3D control in text-to-image generation, enhancing the realism of synthesized scenes with depth-consistent geometry.

    arXiv - AI · 28 days ago
  19.

    [2408.17251] Abstracted Gaussian Prototypes for True One-Shot Concept Learning

    This paper presents a novel framework for one-shot learning in computer vision, utilizing Abstracted Gaussian Prototypes to enhance image segmentation and concept learning.

    arXiv - AI · 28 days ago
  20.

    [2412.20816] MomentMix Augmentation with Length-Aware DETR for Temporally Robust Moment Retrieval

    The paper pairs MomentMix, an augmentation technique, with Length-Aware DETR to enhance video moment retrieval, particularly for short moments, achieving superior performance on benchmark d...

    arXiv - AI · 28 days ago
  21.

    [2506.06092] LinGuinE: Longitudinal Guidance Estimation for Volumetric Tumour Segmentation

    LinGuinE introduces a novel framework for longitudinal volumetric tumor segmentation, enhancing tracking and mask generation across multiple scans without requiring longitudinal training.

    arXiv - Machine Learning · 28 days ago
  22.

    [2508.12691] Adaptive Hybrid Caching for Efficient Text-to-Video Diffusion Model Acceleration

    This paper presents MixCache, a novel caching framework designed to enhance the efficiency of text-to-video diffusion models, significantly improving generation speed and quality.

    arXiv - Machine Learning · 28 days ago
  23.

    [2505.02780] Beyond the Monitor: Mixed Reality Visualization and Multimodal AI for Enhanced Digital Pathology Workflow

    This article presents PathVis, a mixed-reality platform designed to enhance digital pathology workflows by integrating multimodal AI and immersive visualization techniques.

    arXiv - AI · 28 days ago
  24.

    [2510.01031] Secure and reversible face anonymization with diffusion models

    This paper presents a novel framework for secure and reversible face anonymization using diffusion models, addressing challenges in image quality and unauthorized de-anonymization.

    arXiv - Machine Learning · 28 days ago
  25.

    [2507.12784] A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys

    This article presents a semi-supervised learning method to identify poor-quality exposures in large astronomical imaging surveys, enhancing data quality control.

    arXiv - AI · 28 days ago
  26.

    [2508.20570] Dyslexify: A Mechanistic Defense Against Typographic Attacks in CLIP

    The paper presents Dyslexify, a novel defense mechanism against typographic attacks in CLIP models, enhancing robustness without finetuning while maintaining performance.

    arXiv - AI · 28 days ago
  27.

    [2510.19060] PoSh: Using Scene Graphs To Guide LLMs-as-a-Judge For Detailed Image Descriptions

    The paper introduces PoSh, a new metric that uses scene graphs to guide LLM judges in evaluating detailed image descriptions, outperforming existing metrics.

    arXiv - AI · 28 days ago
  28.

    [2511.05898] Q$^2$: Quantization-Aware Gradient Balancing and Attention Alignment for Low-Bit Quantization

    The paper presents Q$^2$, a novel framework addressing gradient imbalance in low-bit quantization for complex visual tasks, enhancing performance in object detection and image segmentation.

    arXiv - AI · 28 days ago
  29.

    [R] CVPR'26 SPAR-3D Workshop Call For Papers

    The SPAR-3D workshop at CVPR'26 invites submissions on 3D vision models, focusing on security, privacy, and robustness, with a deadline extension to March 21, 2026.

    Reddit - Machine Learning · 26 days ago
  30.

    [2504.00037] ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models

    The paper introduces ViT-Linearizer, a framework that distills knowledge from Vision Transformers (ViTs) into efficient linear-time models, addressing the challenges of quadratic complexity in high...

    arXiv - AI · 28 days ago
