Computer Vision

Image recognition, detection, and visual AI

Top This Week

[2602.09678] Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap
Computer Vision

arXiv - AI · 4 min ·
[2601.13622] CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models
LLMs

arXiv - AI · 3 min ·
[2603.26551] Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones
Computer Vision

arXiv - AI · 4 min ·

All Content

[2602.19983] Contextual Safety Reasoning and Grounding for Open-World Robots
Robotics

The paper presents CORE, a novel safety framework for open-world robots that enables contextual reasoning and enforcement of safety rules...

arXiv - AI · 4 min ·
[2602.19946] When Pretty Isn't Useful: Investigating Why Modern Text-to-Image Models Fail as Reliable Training Data Generators
Machine Learning

This paper investigates the limitations of modern text-to-image models as reliable training data generators, revealing a decline in class...

arXiv - AI · 4 min ·
[2602.20068] The Invisible Gorilla Effect in Out-of-distribution Detection
Machine Learning

The paper explores the 'Invisible Gorilla Effect' in out-of-distribution (OOD) detection, revealing that detection performance varies bas...

arXiv - Machine Learning · 4 min ·
[2602.20046] Closing the gap in multimodal medical representation alignment
NLP

This paper addresses the modality gap in multimodal medical representation alignment, proposing a framework to enhance alignment between ...

arXiv - Machine Learning · 3 min ·
[2602.19881] Make Some Noise: Unsupervised Remote Sensing Change Detection Using Latent Space Perturbations
LLMs

The paper presents MaSoN, an innovative framework for unsupervised change detection in remote sensing that generates diverse changes in l...

arXiv - AI · 4 min ·
[2602.19872] GOAL: Geometrically Optimal Alignment for Continual Generalized Category Discovery
AI Safety

The paper presents GOAL, a framework for Continual Generalized Category Discovery (C-GCD) that enhances class discovery while minimizing ...

arXiv - AI · 3 min ·
[2602.19822] Efficient endometrial carcinoma screening via cross-modal synthesis and gradient distillation
Machine Learning

This article presents a novel deep learning framework for efficient endometrial carcinoma screening, utilizing cross-modal synthesis and ...

arXiv - AI · 4 min ·
[2602.19907] Gradient based Severity Labeling for Biomarker Classification in OCT
Computer Vision

This paper presents a novel strategy for contrastive learning in medical imaging, specifically for classifying biomarkers in OCT scans, i...

arXiv - Machine Learning · 3 min ·
[2602.19698] Iconographic Classification and Content-Based Recommendation for Digitized Artworks
Machine Learning

This article presents a proof-of-concept system for automating iconographic classification and content-based recommendations for digitize...

arXiv - AI · 3 min ·
[2602.19710] Universal Pose Pretraining for Generalizable Vision-Language-Action Policies
Machine Learning

The paper presents Pose-VLA, a novel framework for Vision-Language-Action (VLA) models that separates pre-training and post-training phas...

arXiv - Machine Learning · 4 min ·
[2602.19679] TeHOR: Text-Guided 3D Human and Object Reconstruction with Textures
Robotics

TeHOR introduces a novel framework for 3D human and object reconstruction using text descriptions, addressing limitations in current meth...

arXiv - AI · 3 min ·
[2602.19631] Localized Concept Erasure in Text-to-Image Diffusion Models via High-Level Representation Misdirection
Machine Learning

This article discusses a novel approach to concept erasure in text-to-image diffusion models, focusing on High-Level Representation Misdi...

arXiv - AI · 4 min ·
[2602.19539] Can a Teenager Fool an AI? Evaluating Low-Cost Cosmetic Attacks on Age Estimation Systems
Computer Vision

This paper evaluates the effectiveness of low-cost cosmetic modifications in deceiving AI age estimation systems, revealing significant v...

arXiv - Machine Learning · 4 min ·
[2602.19623] PedaCo-Gen: Scaffolding Pedagogical Agency in Human-AI Collaborative Video Authoring
Machine Learning

PedaCo-Gen is a novel AI system designed to enhance the quality of instructional video creation by integrating pedagogical principles and...

arXiv - AI · 3 min ·
[2602.19608] Satellite-Based Detection of Looted Archaeological Sites Using Machine Learning
Machine Learning

This article presents a machine learning approach to detect looted archaeological sites using satellite imagery, demonstrating significan...

arXiv - AI · 4 min ·
[2602.19605] CLCR: Cross-Level Semantic Collaborative Representation for Multimodal Learning
AI Safety

The paper presents CLCR, a novel approach for multimodal learning that organizes features into a three-level semantic hierarchy to enhanc...

arXiv - AI · 4 min ·
[2602.19506] Relational Feature Caching for Accelerating Diffusion Transformers
Machine Learning

This paper introduces Relational Feature Caching (RFC) to enhance the efficiency of diffusion transformers by improving feature predictio...

arXiv - Machine Learning · 4 min ·
[2602.19461] Laplacian Multi-scale Flow Matching for Generative Modeling
Machine Learning

The paper presents Laplacian Multi-scale Flow Matching (LapFlow), a new framework for image generative modeling that enhances flow matchi...

arXiv - Machine Learning · 3 min ·
[2602.19565] DICArt: Advancing Category-level Articulated Object Pose Estimation in Discrete State-Spaces
Generative AI

DICArt introduces a novel framework for category-level articulated object pose estimation, utilizing a discrete diffusion process to enha...

arXiv - AI · 4 min ·
[2602.19540] A Green Learning Approach to LDCT Image Restoration
Machine Learning

This paper presents a Green Learning approach for restoring low-dose computed tomography (LDCT) images, emphasizing mathematical transpar...

arXiv - AI · 3 min ·