Computer Vision

Image recognition, detection, and visual AI

Top This Week

[2602.09678] Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap
Computer Vision

arXiv - AI · 4 min
[2601.13622] CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models
LLMs

arXiv - AI · 3 min
[2603.26551] Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones
Computer Vision

arXiv - AI · 4 min

All Content

[2510.09201] Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs
LLMs

This article introduces the concept of multimodal prompt optimization for Multimodal Large Language Models (MLLMs), proposing a new frame...

arXiv - AI · 4 min
[2510.03352] Inference-Time Search Using Side Information for Diffusion-Based Image Reconstruction
Machine Learning

This article presents a novel inference-time search algorithm that enhances diffusion-based image reconstruction by utilizing side inform...

arXiv - Machine Learning · 4 min
[2507.19634] MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks
LLMs

The MCIF benchmark introduces a novel framework for evaluating multimodal crosslingual instruction-following capabilities in large langua...

arXiv - AI · 4 min
[2505.12298] Attention-Enhanced U-Net for Accurate Segmentation of COVID-19 Infected Lung Regions in CT Scans
Machine Learning

This article presents a novel approach using an Attention-Enhanced U-Net for the automatic segmentation of COVID-19 infected lung regions...

arXiv - AI · 3 min
[2504.21730] Cert-SSBD: Certified Backdoor Defense with Sample-Specific Smoothing Noises
Machine Learning

The paper presents Cert-SSBD, a novel method for defending against backdoor attacks in deep neural networks by optimizing noise levels sp...

arXiv - Machine Learning · 4 min
[2503.04121] Simple Self Organizing Map with Vision Transformers
Machine Learning

This paper explores the integration of Self-Organizing Maps (SOMs) with Vision Transformers (ViTs) to enhance performance on small datase...

arXiv - Machine Learning · 4 min
[2412.02039] Multi-View 3D Reconstruction using Knowledge Distillation
LLMs

This paper presents a knowledge distillation approach for Multi-View 3D reconstruction, utilizing a teacher-student model framework to en...

arXiv - Machine Learning · 4 min
[2508.12026] Bongard-RWR+: Real-World Representations of Fine-Grained Concepts in Bongard Problems
Machine Learning

The paper presents Bongard-RWR+, a dataset designed to enhance fine-grained visual reasoning in Bongard Problems using real-world images ...

arXiv - Machine Learning · 4 min
[2507.23497] Sufficient, Necessary and Complete Causal Explanations in Image Classification
Machine Learning

This paper explores causal explanations in image classification, demonstrating their formal properties and computability, while introduci...

arXiv - AI · 4 min
[2602.17484] Tracing Copied Pixels and Regularizing Patch Affinity in Copy Detection
Computer Vision

The paper presents advancements in Image Copy Detection (ICD) by introducing PixTrace and CopyNCE, enhancing feature representation and i...

arXiv - AI · 3 min
[2602.17397] A High-Level Survey of Optical Remote Sensing
Computer Vision

This article provides a comprehensive overview of optical remote sensing, highlighting advancements in computer vision and drone technolo...

arXiv - AI · 3 min
[2602.17395] SpectralGCD: Spectral Concept Selection and Cross-modal Representation Learning for Generalized Category Discovery
Machine Learning

The paper presents SpectralGCD, a novel approach for Generalized Category Discovery (GCD) that enhances multimodal learning by efficientl...

arXiv - Machine Learning · 4 min
[2602.17205] Deeper detection limits in astronomical imaging using self-supervised spatiotemporal denoising
Machine Learning

The paper presents ASTERIS, a self-supervised spatiotemporal denoising algorithm that enhances detection limits in astronomical imaging, ...

arXiv - AI · 4 min
[2602.17124] 3D Scene Rendering with Multimodal Gaussian Splatting
Computer Vision

This paper presents a novel approach to 3D scene rendering using multimodal Gaussian splatting, integrating RF sensing for improved accur...

arXiv - AI · 4 min
[2602.17095] FLoRG: Federated Fine-tuning with Low-rank Gram Matrices and Procrustes Alignment
LLMs

The paper presents FLoRG, a federated fine-tuning framework that utilizes low-rank Gram matrices and Procrustes alignment to enhance the ...

arXiv - AI · 4 min
[2602.16968] DDiT: Dynamic Patch Scheduling for Efficient Diffusion Transformers
Machine Learning

The paper presents DDiT, a novel approach for dynamic patch scheduling in diffusion transformers, enhancing efficiency in image and video...

arXiv - AI · 3 min
[2602.16918] Xray-Visual Models: Scaling Vision models on Industry Scale Data
Machine Learning

The paper presents Xray-Visual, a novel vision model architecture designed for large-scale image and video understanding, utilizing exten...

arXiv - AI · 4 min
[2602.16723] Is Mamba Reliable for Medical Imaging?
Machine Learning

This paper evaluates the reliability of Mamba, a state-space model, for medical imaging under various attack scenarios, highlighting vuln...

arXiv - AI · 3 min
[2602.17566] A Hybrid Federated Learning Based Ensemble Approach for Lung Disease Diagnosis Leveraging Fusion of SWIN Transformer and CNN
Machine Learning

This article presents a hybrid federated learning model that combines SWIN Transformer and CNN for diagnosing lung diseases, particularly...

arXiv - AI · 4 min
[2602.17386] Visual Model Checking: Graph-Based Inference of Visual Routines for Image Retrieval
Machine Learning

The paper presents a novel framework integrating formal verification with deep learning for improved image retrieval, addressing the limi...

arXiv - AI · 4 min