Computer Vision

Image recognition, detection, and visual AI


Top This Week

Computer Vision

[2602.09678] Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap

Abstract page for arXiv paper 2602.09678: Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap

arXiv - AI · 4 min · about 17 hours ago

LLMs

[2601.13622] CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models

Abstract page for arXiv paper 2601.13622: CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language...

arXiv - AI · 3 min · about 17 hours ago

Computer Vision

[2603.26551] Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones

Abstract page for arXiv paper 2603.26551: Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones

arXiv - AI · 4 min · about 17 hours ago

All Content

Machine Learning

[2602.17189] Texo: Formula Recognition within 20M Parameters

The paper presents Texo, a compact formula recognition model with 20 million parameters, achieving high performance comparable to larger ...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.17145] Bonsai: A Framework for Convolutional Neural Network Acceleration Using Criterion-Based Pruning

The paper introduces Bonsai, a framework for accelerating Convolutional Neural Networks (CNNs) through criterion-based pruning, demonstra...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.16931] Narrow fine-tuning erodes safety alignment in vision-language agents

The paper explores how narrow fine-tuning of vision-language agents can lead to significant safety alignment issues, highlighting the ris...

arXiv - AI · 3 min · about 1 month ago

NLP

[2602.16714] AIdentifyAGE Ontology for Decision Support in Forensic Dental Age Assessment

The AIdentifyAGE ontology aims to enhance forensic dental age assessment by providing a standardized framework for integrating clinical, ...

arXiv - AI · 4 min · about 1 month ago

Generative AI

Creative Freedom OR Creative Homogenization? #Pomelli

The article discusses the implications of Google's Pomelli feature, which generates product visuals using AI, raising questions about cre...

Reddit - Artificial Intelligence · 1 min · about 1 month ago

NLP

I built a free local AI image search app — find images by typing what's in them

Makimus-AI is a free, open-source local app that enables users to search their image libraries using natural language queries, functionin...

Reddit - Artificial Intelligence · 1 min · about 1 month ago

Computer Vision

[D] CVPR Decisions

This Reddit thread serves as a community hub for discussions and updates regarding the decisions for CVPR'26, a prominent conference in c...

Reddit - Machine Learning · 1 min · about 1 month ago

Machine Learning

[D] Native Vision-Language vs Modular: The Qwen Approach.

The Qwen3.5 model trains on visual-text tokens natively, potentially addressing the 'modality gap' found in CLIP-based models, enhancing ...

Reddit - Machine Learning · 1 min · about 1 month ago

LLMs

It's only with me or your GPT 5.2 is completely crazy about one week till now?

The article discusses user frustrations with the recent performance issues of GPT-5.2, highlighting problems with OCR accuracy and file g...

Reddit - Artificial Intelligence · 1 min · about 1 month ago

Data Science

[2511.14147] Imaging with super-resolution in changing random media

This article presents a novel imaging algorithm that utilizes strong scattering to achieve super-resolution in dynamic random media, enha...

arXiv - Machine Learning · 3 min · about 1 month ago

Robotics

[2507.08831] View Invariant Learning for Vision-Language Navigation in Continuous Environments

This paper introduces View Invariant Learning (VIL) for enhancing Vision-Language Navigation in Continuous Environments (VLNCE), addressi...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2504.13519] Filter2Noise: A Framework for Interpretable and Zero-Shot Low-Dose CT Image Denoising

The paper presents Filter2Noise, a novel framework for interpretable and zero-shot low-dose CT image denoising, achieving state-of-the-ar...

arXiv - Machine Learning · 4 min · about 1 month ago

Computer Vision

[2602.12207] VIRENA: Virtual Arena for Research, Education, and Democratic Innovation

VIRENA is a novel platform designed for controlled experimentation in social media environments, enabling researchers to study human-AI i...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2503.20711] Demand Estimation with Text and Image Data

This article presents a novel demand estimation method that utilizes unstructured data from text and images to enhance substitution patte...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2602.07680] Vision and Language: Novel Representations and Artificial intelligence for Driving Scene Safety Assessment and Autonomous Vehicle Planning

This paper explores the integration of vision-language models in autonomous driving, focusing on safety assessment and decision-making th...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2412.00364] LMSeg: Unleashing the Power of Large-Scale Models for Open-Vocabulary Semantic Segmentation

The paper presents LMSeg, a novel approach for open-vocabulary semantic segmentation that enhances visual and linguistic feature alignmen...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.05023] Do Vision-Language Models Respect Contextual Integrity in Location Disclosure?

This article examines whether vision-language models (VLMs) respect contextual integrity when disclosing location information, highlighti...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2411.12070] Autoassociative Learning of Structural Representations for Modeling and Classification in Medical Imaging

This article presents a novel approach to medical imaging classification using autoassociative learning, demonstrating improved accuracy ...

arXiv - Machine Learning · 3 min · about 1 month ago

Machine Learning

[2510.12768] Uncertainty Matters in Dynamic Gaussian Splatting for Monocular 4D Reconstruction

This paper presents USplat4D, a novel framework for monocular 4D reconstruction that incorporates uncertainty in dynamic Gaussian splatti...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.08755] Align and Adapt: Multimodal Multiview Human Activity Recognition under Arbitrary View Combinations

The paper presents AliAd, a model for multimodal multiview human activity recognition that enhances performance by integrating diverse vi...

arXiv - Machine Learning · 4 min · about 1 month ago
