Top Computer Vision This Month

1

[D] Edge AI Projects on Jetson Orin – Ideas?

A Reddit user seeks innovative project ideas for deploying AI on NVIDIA Jetson Orin devices, leveraging their experience in machine learning and real-time systems.

Reddit - Machine Learning · 28 days ago

2

A new wearable AI system watches your hands through smart glasses, guiding experiments and stopping mistakes before they happen

A new wearable AI system uses smart glasses to monitor hand movements, improving experimental accuracy and preventing errors in real time.

Reddit - Artificial Intelligence · 28 days ago

3

[2602.22381] Enhancing Renal Tumor Malignancy Prediction: Deep Learning with Automatic 3D CT Organ Focused Attention

This article presents a novel deep learning framework for predicting malignancy in renal tumors using 3D CT images, eliminating the need for manual segmentation and improving predictive accuracy.

arXiv - AI · 28 days ago

4

[2602.22570] Guidance Matters: Rethinking the Evaluation Pitfall for Text-to-Image Generation

The paper discusses the evaluation challenges in text-to-image generation, focusing on classifier-free guidance (CFG) and proposing a new evaluation framework to address biases in current methods.

arXiv - AI · 28 days ago

5

[2602.22678] ViCLIP-OT: The First Foundation Vision-Language Model for Vietnamese Image-Text Retrieval with Optimal Transport

ViCLIP-OT introduces a novel vision-language model tailored for Vietnamese image-text retrieval, outperforming existing models in low-resource settings.

arXiv - AI · 28 days ago

6

[2602.22716] SoPE: Spherical Coordinate-Based Positional Embedding for Enhancing Spatial Perception of 3D LVLMs

The paper presents SoPE, a novel Spherical Coordinate-Based Positional Embedding method aimed at improving the spatial perception capabilities of 3D Large Vision-Language Models (3D LVLMs) by addre...

arXiv - AI · 28 days ago

7

[2602.23013] SubspaceAD: Training-Free Few-Shot Anomaly Detection via Subspace Modeling

The paper introduces SubspaceAD, a training-free method for few-shot anomaly detection that utilizes subspace modeling to achieve state-of-the-art results without complex training processes.

arXiv - Machine Learning · 28 days ago

8

[2602.22740] AMLRIS: Alignment-aware Masked Learning for Referring Image Segmentation

The paper presents AMLRIS, a novel training strategy for Referring Image Segmentation (RIS) that enhances object segmentation through alignment-aware masked learning, achieving state-of-the-art res...

arXiv - AI · 28 days ago

9

[2602.23192] FairQuant: Fairness-Aware Mixed-Precision Quantization for Medical Image Classification

The paper presents FairQuant, a framework for fairness-aware mixed-precision quantization in medical image classification, optimizing both performance and fairness metrics.

arXiv - Machine Learning · 28 days ago

10

[2602.23214] Plug-and-Play Diffusion Meets ADMM: Dual-Variable Coupling for Robust Medical Image Reconstruction

This paper presents a novel approach to medical image reconstruction using Dual-Coupled Plug-and-Play Diffusion, addressing limitations in existing methods and achieving state-of-the-art results.

arXiv - Machine Learning · 28 days ago

11

[2602.22955] MM-NeuroOnco: A Multimodal Benchmark and Instruction Dataset for MRI-Based Brain Tumor Diagnosis

The article presents MM-NeuroOnco, a comprehensive dataset aimed at improving MRI-based brain tumor diagnosis through multimodal instructions and benchmarks.

arXiv - AI · 28 days ago

12

[2602.23117] Delving into Adversarial Transferability on Image Classification: Review, Benchmark, and Evaluation

This article reviews adversarial transferability in image classification, proposing a standardized framework for evaluating transfer-based attacks and categorizing existing approaches.

arXiv - AI · 28 days ago

13

[2602.23172] Latent Gaussian Splatting for 4D Panoptic Occupancy Tracking

The paper presents Latent Gaussian Splatting (LaGS) for 4D panoptic occupancy tracking, enhancing robot perception in dynamic environments by integrating multi-view data into a cohesive 3D represen...

arXiv - AI · 28 days ago

14

[2506.15190] Learning Task-Agnostic Motifs to Capture the Continuous Nature of Animal Behavior

The paper presents a novel framework, Motif-based Continuous Dynamics (MCD), to model animal behavior by identifying continuous motor motifs, enhancing the understanding of behavior dynamics beyond...

arXiv - Machine Learning · 28 days ago

15

[2602.23203] ColoDiff: Integrating Dynamic Consistency With Content Awareness for Colonoscopy Video Generation

ColoDiff introduces a novel framework for generating colonoscopy videos that ensures dynamic consistency and content awareness, addressing data scarcity in clinical settings.

arXiv - AI · 28 days ago

16

[2602.23235] Spatio-Temporal Token Pruning for Efficient High-Resolution GUI Agents

The paper presents GUIPruner, a framework for enhancing the efficiency of high-resolution GUI agents by addressing spatiotemporal redundancy through innovative pruning techniques.

arXiv - AI · 28 days ago

17

[2602.23334] Bitwise Systolic Array Architecture for Runtime-Reconfigurable Multi-precision Quantized Multiplication on Hardware Accelerators

This paper presents a novel bitwise systolic array architecture designed for runtime-reconfigurable multi-precision quantized multiplication, enhancing performance in neural network accelerators.

arXiv - AI · 28 days ago

18

[2602.23359] SeeThrough3D: Occlusion Aware 3D Control in Text-to-Image Generation

The paper introduces SeeThrough3D, a model for occlusion-aware 3D control in text-to-image generation, enhancing the realism of synthesized scenes with depth-consistent geometry.

arXiv - AI · 28 days ago

19

[2408.17251] Abstracted Gaussian Prototypes for True One-Shot Concept Learning

This paper presents a novel framework for one-shot learning in computer vision, utilizing Abstracted Gaussian Prototypes to enhance image segmentation and concept learning.

arXiv - AI · 28 days ago

20

[2412.20816] MomentMix Augmentation with Length-Aware DETR for Temporally Robust Moment Retrieval

The paper presents MomentMix, a novel augmentation technique using Length-Aware DETR to enhance video moment retrieval, particularly for short moments, achieving superior performance on benchmark d...

arXiv - AI · 28 days ago

21

[2506.06092] LinGuinE: Longitudinal Guidance Estimation for Volumetric Tumour Segmentation

LinGuinE introduces a novel framework for longitudinal volumetric tumor segmentation, enhancing tracking and mask generation across multiple scans without requiring longitudinal training.

arXiv - Machine Learning · 28 days ago

22

[2508.12691] Adaptive Hybrid Caching for Efficient Text-to-Video Diffusion Model Acceleration

This paper presents MixCache, a novel caching framework designed to enhance the efficiency of text-to-video diffusion models, significantly improving generation speed and quality.

arXiv - Machine Learning · 28 days ago

23

[2505.02780] Beyond the Monitor: Mixed Reality Visualization and Multimodal AI for Enhanced Digital Pathology Workflow

This article presents PathVis, a mixed-reality platform designed to enhance digital pathology workflows by integrating multimodal AI and immersive visualization techniques.

arXiv - AI · 28 days ago

24

[2510.01031] Secure and reversible face anonymization with diffusion models

This paper presents a novel framework for secure and reversible face anonymization using diffusion models, addressing challenges in image quality and unauthorized de-anonymization.

arXiv - Machine Learning · 28 days ago

25

[2507.12784] A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys

This article presents a semi-supervised learning method to identify poor-quality exposures in large astronomical imaging surveys, enhancing data quality control.

arXiv - AI · 28 days ago

26

[2508.20570] Dyslexify: A Mechanistic Defense Against Typographic Attacks in CLIP

The paper presents Dyslexify, a novel defense mechanism against typographic attacks in CLIP models, enhancing robustness without finetuning while maintaining performance.

arXiv - AI · 28 days ago

27

[2510.19060] PoSh: Using Scene Graphs To Guide LLMs-as-a-Judge For Detailed Image Descriptions

The paper introduces PoSh, a new metric that uses scene graphs to guide LLM judges in evaluating detailed image descriptions, outperforming existing metrics.

arXiv - AI · 28 days ago

28

[2511.05898] Q$^2$: Quantization-Aware Gradient Balancing and Attention Alignment for Low-Bit Quantization

The paper presents Q$^2$, a novel framework addressing gradient imbalance in low-bit quantization for complex visual tasks, enhancing performance in object detection and image segmentation.

arXiv - AI · 28 days ago

29

[R] CVPR'26 SPAR-3D Workshop Call For Papers

The SPAR-3D workshop at CVPR'26 invites submissions on 3D vision models, focusing on security, privacy, and robustness; the submission deadline has been extended to March 21, 2026.

Reddit - Machine Learning · 26 days ago

30

[2504.00037] ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models

The paper introduces ViT-Linearizer, a framework that distills knowledge from Vision Transformers (ViTs) into efficient linear-time models, addressing the challenges of quadratic complexity in high...

arXiv - AI · 28 days ago
