Computer Vision

Image recognition, detection, and visual AI

Top This Week

[2602.09678] Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap
Computer Vision

arXiv - AI · 4 min ·
[2601.13622] CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models
LLMs

arXiv - AI · 3 min ·
[2603.26551] Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones
Computer Vision

arXiv - AI · 4 min ·

All Content

[2505.12254] MMS-VPR: Multimodal Street-Level Visual Place Recognition Dataset and Benchmark
Data Science

The MMS-VPR paper introduces a comprehensive multimodal dataset for street-level visual place recognition, addressing gaps in existing da...

arXiv - AI · 4 min ·
[2602.15660] Bayesian Optimization for Design Parameters of 3D Image Data Analysis
Machine Learning

This paper presents a novel 3D data Analysis Optimization Pipeline that utilizes Bayesian Optimization to enhance segmentation and classi...

arXiv - AI · 4 min ·
[2505.05736] Multimodal Integrated Knowledge Transfer to Large Language Models through Preference Optimization with Biomedical Applications
LLMs

The paper introduces MINT, a framework for optimizing large language models (LLMs) using multimodal biomedical data to enhance predictive...

arXiv - Machine Learning · 4 min ·
[2602.15579] Intracoronary Optical Coherence Tomography Image Processing and Vessel Classification Using Machine Learning
Machine Learning

This paper presents a machine learning-based pipeline for automated segmentation and classification of vessels in Intracoronary Optical C...

arXiv - AI · 3 min ·
[2602.15539] Dynamic Training-Free Fusion of Subject and Style LoRAs
Machine Learning

The paper presents a novel dynamic training-free fusion framework for combining subject and style LoRAs in generative models, enhancing c...

arXiv - AI · 4 min ·
[2602.15490] RPT-SR: Regional Prior attention Transformer for infrared image Super-Resolution
Machine Learning

The paper presents RPT-SR, a novel transformer architecture designed for infrared image super-resolution, addressing inefficiencies in ex...

arXiv - AI · 4 min ·
[2511.19797] Terminal Velocity Matching
Machine Learning

The paper introduces Terminal Velocity Matching (TVM), a novel approach to generative modeling that enhances performance in one- and few-...

arXiv - AI · 3 min ·
[2602.15339] Benchmarking Self-Supervised Models for Cardiac Ultrasound View Classification
Machine Learning

This article evaluates self-supervised learning models for cardiac ultrasound view classification, comparing USF-MAE and MoCo v3 using th...

arXiv - AI · 4 min ·
[2602.15318] Sparrow: Text-Anchored Window Attention with Visual-Semantic Glimpsing for Speculative Decoding in Video LLMs
LLMs

The paper introduces Sparrow, a novel framework designed to enhance speculative decoding in Video Large Language Models (Vid-LLMs) by opt...

arXiv - AI · 4 min ·
[2602.15278] Visual Persuasion: What Influences Decisions of Vision-Language Models?
LLMs

This article explores how vision-language models (VLMs) make decisions based on image inputs, introducing a framework to analyze their pr...

arXiv - AI · 4 min ·
[2602.15257] How to Train Your Long-Context Visual Document Model
LLMs

This article presents a comprehensive study on training long-context visual document models, achieving state-of-the-art performance in vi...

arXiv - AI · 3 min ·
[2411.12174] Just KIDDIN: Knowledge Infusion and Distillation for Detection of INdecent Memes
LLMs

The paper presents a novel framework, Just KIDDIN, that combines Knowledge Distillation and knowledge infusion to improve the detection o...

arXiv - AI · 4 min ·
[2602.15138] MB-DSMIL-CL-PL: Scalable Weakly Supervised Ovarian Cancer Subtype Classification and Localisation Using Contrastive and Prototype Learning with Frozen Patch Features
Machine Learning

This paper presents a novel approach for classifying and localizing ovarian cancer subtypes using weakly supervised learning techniques, ...

arXiv - AI · 4 min ·
[2602.15828] Dex4D: Task-Agnostic Point Track Policy for Sim-to-Real Dexterous Manipulation
Machine Learning

The Dex4D framework enables task-agnostic dexterous manipulation by using simulation to learn generalist policies that can be applied to ...

arXiv - Machine Learning · 4 min ·
[2602.15072] GRAFNet: Multiscale Retinal Processing via Guided Cortical Attention Feedback for Enhancing Medical Image Polyp Segmentation
Machine Learning

GRAFNet introduces a novel architecture for polyp segmentation in colonoscopy, enhancing accuracy through biologically inspired multi-sca...

arXiv - AI · 4 min ·
[2602.15727] Spanning the Visual Analogy Space with a Weight Basis of LoRAs
Machine Learning

The paper presents LoRWeB, a novel approach to visual analogy learning that enhances image manipulation by dynamically selecting and weig...

arXiv - AI · 4 min ·
[2602.15382] The Vision Wormhole: Latent-Space Communication in Heterogeneous Multi-Agent Systems
LLMs

The paper introduces the Vision Wormhole, a framework for enabling efficient latent-space communication in heterogeneous multi-agent syst...

arXiv - Machine Learning · 4 min ·
[2602.15368] GMAIL: Generative Modality Alignment for generated Image Learning
Machine Learning

The paper presents GMAIL, a novel framework for aligning generated images with real images in machine learning, enhancing performance in ...

arXiv - Machine Learning · 4 min ·
[2602.15645] CARE Drive: A Framework for Evaluating Reason-Responsiveness of Vision Language Models in Automated Driving
LLMs

The article presents CARE Drive, a framework for evaluating the reason-responsiveness of vision language models in automated driving, add...

arXiv - AI · 4 min ·
[2602.15580] How Vision Becomes Language: A Layer-wise Information-Theoretic Analysis of Multimodal Reasoning
Machine Learning

This paper analyzes how multimodal Transformers integrate visual and linguistic information, revealing a layer-wise evolution of predicti...

arXiv - AI · 4 min ·