Computer Vision

Image recognition, detection, and visual AI

Top This Week

[2602.09678] Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap
Computer Vision

arXiv - AI · 4 min ·
[2601.13622] CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models
LLMs

arXiv - AI · 3 min ·
[2603.26551] Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones
Computer Vision

arXiv - AI · 4 min ·

All Content

[2505.17064] Synthetic History: Evaluating Visual Representations of the Past in Diffusion Models
Machine Learning

This article evaluates how Text-to-Image diffusion models represent historical contexts, introducing a benchmark to assess their accuracy...

arXiv - Machine Learning · 4 min ·
[2503.16021] Imitating AI agents increase diversity in homogeneous information environments but can reduce it in heterogeneous ones
LLMs

This article explores how AI agents imitating human content affect information diversity, revealing context-dependent outcomes in homogen...

arXiv - AI · 4 min ·
[2505.17748] Soft-CAM: Making black box models self-explainable for medical image analysis
Machine Learning

The paper introduces Soft-CAM, a method that enhances the interpretability of convolutional neural networks (CNNs) in medical image analy...

arXiv - Machine Learning · 4 min ·
[2505.11409] Visual Planning: Let's Think Only with Images
LLMs

The paper introduces 'Visual Planning', a new paradigm that utilizes images for reasoning in spatial tasks, enhancing planning capabiliti...

arXiv - Machine Learning · 4 min ·
[2412.13897] Data-Efficient Inference of Neural Fluid Fields via SciML Foundation Model
LLMs

This article presents a novel approach to data-efficient inference of neural fluid fields using SciML foundation models, demonstrating si...

arXiv - Machine Learning · 4 min ·
[1803.09319] SUNLayer: Stable denoising with generative networks
Machine Learning

The paper introduces SUNLayer, a theoretical framework for stable denoising using generative networks, focusing on activation functions a...

arXiv - Machine Learning · 3 min ·
[2602.18406] Latent Equivariant Operators for Robust Object Recognition: Promise and Challenges
Machine Learning

The paper discusses Latent Equivariant Operators as a novel approach to enhance object recognition in computer vision, addressing challen...

arXiv - Machine Learning · 3 min ·
[2411.08875] Causal Explanations for Image Classifiers
AI Infrastructure

This paper presents a novel approach to generating causal explanations for image classifiers, introducing a black-box algorithm grounded ...

arXiv - AI · 3 min ·
[2602.18377] Theory and interpretability of Quantum Extreme Learning Machines: a Pauli-transfer matrix approach
Machine Learning

This article presents a theoretical analysis of Quantum Extreme Learning Machines (QELMs) using the Pauli-transfer matrix approach, highl...

arXiv - Machine Learning · 4 min ·
[2602.18350] Quantum-enhanced satellite image classification
Computer Vision

This paper presents a quantum feature extraction method that enhances multi-class image classification for satellite applications, achiev...

arXiv - Machine Learning · 3 min ·
[2602.18374] Zero-shot Interactive Perception
Robotics

The paper presents Zero-Shot Interactive Perception (ZS-IP), a framework that enhances robotic manipulation through a memory-driven Visio...

arXiv - AI · 3 min ·
[2602.18252] On the Adversarial Robustness of Discrete Image Tokenizers
Machine Learning

This paper investigates the adversarial robustness of discrete image tokenizers, highlighting their vulnerabilities and proposing a novel...

arXiv - AI · 3 min ·
[2602.18083] Comparative Assessment of Multimodal Earth Observation Data for Soil Moisture Estimation
Machine Learning

This article presents a high-resolution framework for soil moisture estimation using multimodal Earth observation data, highlighting the ...

arXiv - Machine Learning · 4 min ·
[2602.18047] CityGuard: Graph-Aware Private Descriptors for Bias-Resilient Identity Search Across Urban Cameras
Machine Learning

CityGuard introduces a novel framework for privacy-preserving identity retrieval across urban surveillance cameras, addressing challenges...

arXiv - Machine Learning · 4 min ·
[2602.17929] ZACH-ViT: Regime-Dependent Inductive Bias in Compact Vision Transformers for Medical Imaging
Machine Learning

ZACH-ViT introduces a novel Vision Transformer architecture tailored for medical imaging, enhancing performance by removing fixed spatial...

arXiv - Machine Learning · 4 min ·
[2602.18119] RamanSeg: Interpretability-driven Deep Learning on Raman Spectra for Cancer Diagnosis
Machine Learning

The paper presents RamanSeg, an interpretable deep learning model for analyzing Raman spectra in cancer diagnosis, achieving significant ...

arXiv - Machine Learning · 3 min ·
[2602.17855] TopoGate: Quality-Aware Topology-Stabilized Gated Fusion for Longitudinal Low-Dose CT New-Lesion Prediction
Machine Learning

The paper presents TopoGate, a model designed to enhance new-lesion prediction in longitudinal low-dose CT scans by integrating quality-a...

arXiv - Machine Learning · 3 min ·
[2602.18094] OODBench: Out-of-Distribution Benchmark for Large Vision-Language Models
LLMs

The paper introduces OODBench, a benchmark for evaluating large vision-language models' performance on out-of-distribution (OOD) data, hi...

arXiv - AI · 4 min ·
[2602.18089] DohaScript: A Large-Scale Multi-Writer Dataset for Continuous Handwritten Hindi Text
Data Science

DohaScript introduces a large-scale dataset for continuous handwritten Hindi text, addressing the lack of diverse and high-quality resour...

arXiv - Machine Learning · 4 min ·
[2602.17814] VQPP: Video Query Performance Prediction Benchmark
NLP

The paper introduces the Video Query Performance Prediction (VQPP) benchmark, addressing a gap in query performance prediction for video ...

arXiv - Machine Learning · 4 min ·