Computer Vision

Image recognition, detection, and visual AI

Top This Week

Computer Vision

[2602.09678] Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap

arXiv - AI · 4 min
LLMs

[2601.13622] CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models

arXiv - AI · 3 min
Computer Vision

[2603.26551] Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones

arXiv - AI · 4 min

All Content

Machine Learning

[2602.17189] Texo: Formula Recognition within 20M Parameters

The paper presents Texo, a compact formula recognition model with 20 million parameters, achieving high performance comparable to larger ...

arXiv - AI · 3 min
Machine Learning

[2602.17145] Bonsai: A Framework for Convolutional Neural Network Acceleration Using Criterion-Based Pruning

The paper introduces Bonsai, a framework for accelerating Convolutional Neural Networks (CNNs) through criterion-based pruning, demonstra...

arXiv - AI · 3 min
LLMs

[2602.16931] Narrow fine-tuning erodes safety alignment in vision-language agents

The paper explores how narrow fine-tuning of vision-language agents can lead to significant safety alignment issues, highlighting the ris...

arXiv - AI · 3 min
NLP

[2602.16714] AIdentifyAGE Ontology for Decision Support in Forensic Dental Age Assessment

The AIdentifyAGE ontology aims to enhance forensic dental age assessment by providing a standardized framework for integrating clinical, ...

arXiv - AI · 4 min
Generative AI

Creative Freedom OR Creative Homogenization? #Pomelli

The article discusses the implications of Google's Pomelli feature, which generates product visuals using AI, raising questions about cre...

Reddit - Artificial Intelligence · 1 min
NLP

I built a free local AI image search app — find images by typing what's in them

Makimus-AI is a free, open-source local app that enables users to search their image libraries using natural language queries, functionin...

Reddit - Artificial Intelligence · 1 min
Computer Vision

[D] CVPR Decisions

This Reddit thread is a community hub for discussions and updates on the decisions for CVPR 2026, a prominent conference in c...

Reddit - Machine Learning · 1 min
Machine Learning

[D] Native Vision-Language vs Modular: The Qwen Approach.

The Qwen3.5 model trains on visual-text tokens natively, potentially addressing the 'modality gap' found in CLIP-based models, enhancing ...

Reddit - Machine Learning · 1 min
LLMs

It's only with me or your GPT 5.2 is completely crazy about one week till now?

The post discusses user frustrations with recent GPT-5.2 performance issues, highlighting problems with OCR accuracy and file g...

Reddit - Artificial Intelligence · 1 min
Data Science

[2511.14147] Imaging with super-resolution in changing random media

This article presents a novel imaging algorithm that utilizes strong scattering to achieve super-resolution in dynamic random media, enha...

arXiv - Machine Learning · 3 min
Robotics

[2507.08831] View Invariant Learning for Vision-Language Navigation in Continuous Environments

This paper introduces View Invariant Learning (VIL) for enhancing Vision-Language Navigation in Continuous Environments (VLNCE), addressi...

arXiv - Machine Learning · 4 min
Machine Learning

[2504.13519] Filter2Noise: A Framework for Interpretable and Zero-Shot Low-Dose CT Image Denoising

The paper presents Filter2Noise, a novel framework for interpretable and zero-shot low-dose CT image denoising, achieving state-of-the-ar...

arXiv - Machine Learning · 4 min
Computer Vision

[2602.12207] VIRENA: Virtual Arena for Research, Education, and Democratic Innovation

VIRENA is a novel platform designed for controlled experimentation in social media environments, enabling researchers to study human-AI i...

arXiv - AI · 4 min
Machine Learning

[2503.20711] Demand Estimation with Text and Image Data

This article presents a novel demand estimation method that utilizes unstructured data from text and images to enhance substitution patte...

arXiv - Machine Learning · 3 min
LLMs

[2602.07680] Vision and Language: Novel Representations and Artificial intelligence for Driving Scene Safety Assessment and Autonomous Vehicle Planning

This paper explores the integration of vision-language models in autonomous driving, focusing on safety assessment and decision-making th...

arXiv - Machine Learning · 4 min
LLMs

[2412.00364] LMSeg: Unleashing the Power of Large-Scale Models for Open-Vocabulary Semantic Segmentation

The paper presents LMSeg, a novel approach for open-vocabulary semantic segmentation that enhances visual and linguistic feature alignmen...

arXiv - Machine Learning · 4 min
LLMs

[2602.05023] Do Vision-Language Models Respect Contextual Integrity in Location Disclosure?

This article examines whether vision-language models (VLMs) respect contextual integrity when disclosing location information, highlighti...

arXiv - AI · 4 min
Machine Learning

[2411.12070] Autoassociative Learning of Structural Representations for Modeling and Classification in Medical Imaging

This article presents a novel approach to medical imaging classification using autoassociative learning, demonstrating improved accuracy ...

arXiv - Machine Learning · 3 min
Machine Learning

[2510.12768] Uncertainty Matters in Dynamic Gaussian Splatting for Monocular 4D Reconstruction

This paper presents USplat4D, a novel framework for monocular 4D reconstruction that incorporates uncertainty in dynamic Gaussian splatti...

arXiv - AI · 4 min
Machine Learning

[2602.08755] Align and Adapt: Multimodal Multiview Human Activity Recognition under Arbitrary View Combinations

The paper presents AliAd, a model for multimodal multiview human activity recognition that enhances performance by integrating diverse vi...

arXiv - Machine Learning · 4 min