Computer Vision

Image recognition, detection, and visual AI

Top This Week

[2602.09678] Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap
Computer Vision

arXiv - AI · 4 min
[2601.13622] CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models
LLMs

arXiv - AI · 3 min
[2603.26551] Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones
Computer Vision

arXiv - AI · 4 min

All Content

Machine Learning

[D] Submit to ECCV or opt in for CVPR findings?

The post weighs submitting a paper to ECCV against opting in to CVPR Findings, highlighting confusion around the perception...

Reddit - Machine Learning · 1 min
NLP

[R] Vision+Time Series data Encoder

The post asks about encoders for combined vision and time series data, seeking recent research and pre-trained models for generating e...

Reddit - Machine Learning · 1 min
AI Startups

TikTok creators’ Seedance 2.0 AI is hyperrealistic, arrived “seemingly out of nowhere,” and is spooking Hollywood

TikTok's creator, ByteDance, has unveiled Seedance 2.0, a hyperrealistic AI technology that is causing concern in Hollywood due to its potential impa...

Reddit - Artificial Intelligence · 1 min
LLMs

[R] Can Vision-Language Models See Squares? Text-Recognition Mediates Spatial Reasoning Across Three Model Families

This article discusses a study on Vision-Language Models (VLMs) that highlights their performance disparity in recognizing binary grids r...

Reddit - Machine Learning · 1 min
OpenAI’s first ChatGPT gadget could be a smart speaker with a camera | The Verge
LLMs

OpenAI is reportedly developing its first hardware product, a smart speaker with a camera, alongside potential smart glasses and a lamp, ...

The Verge - AI · 3 min
[2510.25166] A Study on Inference Latency for Vision Transformers on Mobile Devices
Machine Learning

This study quantitatively analyzes the inference latency of 190 vision transformers (ViTs) on mobile devices, comparing them to 102 convo...

arXiv - Machine Learning · 3 min
[2510.12581] LayerSync: Self-aligning Intermediate Layers
Machine Learning

LayerSync introduces a novel approach to enhance diffusion models by self-aligning intermediate layers, improving training efficiency and...

arXiv - Machine Learning · 4 min
[2505.11235] Efficient Orthogonal Fine-Tuning with Principal Subspace Adaptation
Machine Learning

The paper presents Efficient Orthogonal Fine-Tuning with Principal Subspace Adaptation (PSOFT), a method that enhances parameter-efficien...

arXiv - Machine Learning · 4 min
[2602.16979] Characterizing the Predictive Impact of Modalities with Supervised Latent-Variable Modeling
LLMs

The paper presents PRIMO, a supervised latent-variable model that addresses the challenges of incomplete multimodal data by quantifying t...

arXiv - Machine Learning · 4 min
[2602.16749] U-FedTomAtt: Ultra-lightweight Federated Learning with Attention for Tomato Disease Recognition
Machine Learning

The paper presents U-FedTomAtt, an ultra-lightweight federated learning framework designed for tomato disease recognition, optimizing per...

arXiv - Machine Learning · 4 min
[2602.17642] A.R.I.S.: Automated Recycling Identification System for E-Waste Classification Using Deep Learning
Machine Learning

The A.R.I.S. system utilizes deep learning to enhance e-waste recycling by accurately classifying materials in real-time, improving recov...

arXiv - Machine Learning · 3 min
[2602.17350] Shortcut learning in geometric knot classification
Machine Learning

This paper explores the application of machine learning to classify geometric knots, addressing the challenge of identifying equivalent e...

arXiv - Machine Learning · 4 min
[2602.17321] The Sound of Death: Deep Learning Reveals Vascular Damage from Carotid Ultrasound
Machine Learning

This article presents a machine learning framework that analyzes carotid ultrasound videos to identify vascular damage, enhancing early d...

arXiv - Machine Learning · 4 min
[2602.17270] Unified Latents (UL): How to train your latents
Machine Learning

The paper introduces Unified Latents (UL), a framework for training latent representations using a diffusion prior, achieving competitive...

arXiv - Machine Learning · 3 min
[2602.17117] i-PhysGaussian: Implicit Physical Simulation for 3D Gaussian Splatting
Machine Learning

The paper introduces i-PhysGaussian, a novel framework for implicit physical simulation that enhances 3D Gaussian Splatting by minimizing...

arXiv - Machine Learning · 3 min
[2602.06355] Di3PO - Diptych Diffusion DPO for Targeted Improvements in Image Generation
Machine Learning

The paper presents Di3PO, a novel method for improving image generation in text-to-image diffusion models by efficiently creating targete...

arXiv - AI · 3 min
[2601.01224] Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment
Machine Learning

This paper presents Contrastive Object-centric Diffusion Alignment (CODA), an enhancement to object-centric learning that reduces slot en...

arXiv - AI · 4 min
[2512.19941] Block-Recurrent Dynamics in Vision Transformers
Machine Learning

This article introduces the Block-Recurrent Hypothesis (BRH) for Vision Transformers, proposing a new framework for understanding their c...

arXiv - Machine Learning · 4 min
[2512.07984] Restrictive Hierarchical Semantic Segmentation for Stratified Tooth Layer Detection
Machine Learning

This article presents a novel framework for hierarchical semantic segmentation aimed at improving the detection of stratified tooth layer...

arXiv - AI · 4 min
[2511.14654] Improving segmentation of retinal arteries and veins using cardiac signal in doppler holograms
Computer Vision

This article presents a novel approach to segmenting retinal arteries and veins using cardiac signals in Doppler holograms, enhancing tra...

arXiv - AI · 3 min