Computer Vision

Image recognition, detection, and visual AI

Top This Week

[2506.22504] Patch2Loc: Learning to Localize Patches for Unsupervised Brain Lesion Detection
Machine Learning

arXiv - Machine Learning · 4 min ·
[2508.00307] Acoustic Imaging for Low-SNR UAV Detection: Dense Beamformed Energy Maps and U-Net SELD
Machine Learning

arXiv - AI · 4 min ·
[2603.25524] CHIRP dataset: towards long-term, individual-level, behavioral monitoring of bird populations in the wild
Computer Vision

arXiv - AI · 4 min ·

All Content

[2505.02780] Beyond the Monitor: Mixed Reality Visualization and Multimodal AI for Enhanced Digital Pathology Workflow
NLP

This article presents PathVis, a mixed-reality platform designed to enhance digital pathology workflows by integrating multimodal AI and ...

arXiv - AI · 4 min ·
[2504.00037] ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models
Machine Learning

The paper introduces ViT-Linearizer, a framework that distills knowledge from Vision Transformers (ViTs) into efficient linear-time model...

arXiv - AI · 3 min ·
[2508.12691] Adaptive Hybrid Caching for Efficient Text-to-Video Diffusion Model Acceleration
Machine Learning

This paper presents MixCache, a novel caching framework designed to enhance the efficiency of text-to-video diffusion models, significant...

arXiv - Machine Learning · 4 min ·
[2508.04228] LayerT2V: A Unified Multi-Layer Video Generation Framework
Machine Learning

LayerT2V presents a novel framework for multi-layer video generation, enabling the creation of editable video layers that enhance profess...

arXiv - Machine Learning · 4 min ·
[2502.02088] Dual-IPO: Dual-Iterative Preference Optimization for Text-to-Video Generation
Machine Learning

The paper presents Dual-IPO, a novel framework for optimizing text-to-video generation by iteratively improving both the reward and video...

arXiv - AI · 4 min ·
[2506.06092] LinGuinE: Longitudinal Guidance Estimation for Volumetric Tumour Segmentation
Computer Vision

LinGuinE introduces a novel framework for longitudinal volumetric tumor segmentation, enhancing tracking and mask generation across multi...

arXiv - Machine Learning · 4 min ·
[2412.20816] MomentMix Augmentation with Length-Aware DETR for Temporally Robust Moment Retrieval
Machine Learning

The paper presents MomentMix, a novel augmentation technique using Length-Aware DETR to enhance video moment retrieval, particularly for ...

arXiv - AI · 4 min ·
[2411.18207] From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects
LLMs

This paper introduces a framework for open vocabulary object detection that allows vision language models to identify and learn novel obj...

arXiv - AI · 4 min ·
[2408.17251] Abstracted Gaussian Prototypes for True One-Shot Concept Learning
Machine Learning

This paper presents a novel framework for one-shot learning in computer vision, utilizing Abstracted Gaussian Prototypes to enhance image...

arXiv - AI · 4 min ·
[2602.23359] SeeThrough3D: Occlusion Aware 3D Control in Text-to-Image Generation
Machine Learning

The paper introduces SeeThrough3D, a model for occlusion-aware 3D control in text-to-image generation, enhancing the realism of synthesiz...

arXiv - AI · 4 min ·
[2602.23334] Bitwise Systolic Array Architecture for Runtime-Reconfigurable Multi-precision Quantized Multiplication on Hardware Accelerators
Machine Learning

This paper presents a novel bitwise systolic array architecture designed for runtime-reconfigurable multi-precision quantized multiplicat...

arXiv - AI · 3 min ·
[2602.23235] Spatio-Temporal Token Pruning for Efficient High-Resolution GUI Agents
AI Safety

The paper presents GUIPruner, a framework for enhancing the efficiency of high-resolution GUI agents by addressing spatiotemporal redunda...

arXiv - AI · 4 min ·
[2602.23203] ColoDiff: Integrating Dynamic Consistency With Content Awareness for Colonoscopy Video Generation
Generative AI

ColoDiff introduces a novel framework for generating colonoscopy videos that ensures dynamic consistency and content awareness, addressin...

arXiv - AI · 4 min ·
[2506.15190] Learning Task-Agnostic Motifs to Capture the Continuous Nature of Animal Behavior
Computer Vision

The paper presents a novel framework, Motif-based Continuous Dynamics (MCD), to model animal behavior by identifying continuous motor mot...

arXiv - Machine Learning · 4 min ·
[2602.23172] Latent Gaussian Splatting for 4D Panoptic Occupancy Tracking
Robotics

The paper presents Latent Gaussian Splatting (LaGS) for 4D panoptic occupancy tracking, enhancing robot perception in dynamic environment...

arXiv - AI · 3 min ·
[2602.23153] Efficient Encoder-Free Fourier-based 3D Large Multimodal Model
Machine Learning

This article presents Fase3D, an innovative encoder-free Fourier-based model for processing 3D multimodal data, enhancing efficiency and ...

arXiv - AI · 4 min ·
[2602.23117] Devling into Adversarial Transferability on Image Classification: Review, Benchmark, and Evaluation
Machine Learning

This article reviews adversarial transferability in image classification, proposing a standardized framework for evaluating transfer-base...

arXiv - AI · 3 min ·
[2602.22955] MM-NeuroOnco: A Multimodal Benchmark and Instruction Dataset for MRI-Based Brain Tumor Diagnosis
Machine Learning

The article presents MM-NeuroOnco, a comprehensive dataset aimed at improving MRI-based brain tumor diagnosis through multimodal instruct...

arXiv - AI · 4 min ·
[2411.11727] Aligning Few-Step Diffusion Models with Dense Reward Difference Learning
Machine Learning

This paper presents Stepwise Diffusion Policy Optimization (SDPO), a novel reinforcement learning framework designed to enhance few-step ...

arXiv - Machine Learning · 4 min ·
[2602.23214] Plug-and-Play Diffusion Meets ADMM: Dual-Variable Coupling for Robust Medical Image Reconstruction
Machine Learning

This paper presents a novel approach to medical image reconstruction using Dual-Coupled Plug-and-Play Diffusion, addressing limitations i...

arXiv - Machine Learning · 4 min ·
