Computer Vision

Image recognition, detection, and visual AI

Top This Week

[2602.09678] Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap
Computer Vision

arXiv - AI · 4 min

[2601.13622] CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models
LLMs

arXiv - AI · 3 min

[2603.26551] Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones
Computer Vision

arXiv - AI · 4 min

All Content

[2602.15277] Accelerating Large-Scale Dataset Distillation via Exploration-Exploitation Optimization
Machine Learning

This paper presents Exploration-Exploitation Distillation (E^2D), a method for efficient large-scale dataset distillation that balances a...

arXiv - Machine Learning · 4 min

[2602.15181] Time-Archival Camera Virtualization for Sports and Visual Performances
Computer Vision

This paper presents a novel approach to camera virtualization for sports and visual performances, enabling photorealistic rendering from ...

arXiv - Machine Learning · 4 min

[2602.15154] Loss Knows Best: Detecting Annotation Errors in Videos via Loss Trajectories
Machine Learning

The paper presents a novel method for detecting annotation errors in video datasets by analyzing loss trajectories, enhancing model train...

arXiv - Machine Learning · 4 min

[2602.15294] EAA: Automating materials characterization with vision language model agents
LLMs

The paper introduces Experiment Automation Agents (EAA), a system leveraging vision-language models to automate complex microscopy workfl...

arXiv - AI · 3 min

[2602.15087] StrokeNeXt: A Siamese-encoder Approach for Brain Stroke Classification in Computed Tomography Imagery
Machine Learning

StrokeNeXt introduces a Siamese-encoder model for classifying brain strokes in CT images, achieving high accuracy and low misclassificati...

arXiv - Machine Learning · 3 min

[2306.17652] Accurate 2D Reconstruction for PET Scanners based on the Analytical White Image Model
Machine Learning

This paper presents a mathematical model for accurate 2D reconstruction in PET scanners, utilizing an Analytical White Image Model to enh...

arXiv - Machine Learning · 4 min

[2602.15067] Attention-gated U-Net model for semantic segmentation of brain tumors and feature extraction for survival prognosis
Machine Learning

The article presents an Attention-Gated U-Net model for semantic segmentation of brain tumors, enhancing treatment planning through impro...

arXiv - AI · 3 min

[2602.15648] Guided Diffusion by Optimized Loss Functions on Relaxed Parameters for Inverse Material Design
Generative AI

This paper presents a novel method for inverse material design using guided diffusion and optimized loss functions, addressing challenges...

arXiv - Machine Learning · 4 min

[2602.15460] On the Out-of-Distribution Generalization of Reasoning in Multimodal LLMs for Simple Visual Planning Tasks
LLMs

This paper evaluates the out-of-distribution generalization of reasoning in multimodal large language models (LLMs) through a grid-based ...

arXiv - Machine Learning · 4 min

[2602.15393] Doubly Stochastic Mean-Shift Clustering
NLP

The paper presents Doubly Stochastic Mean-Shift (DSMS), an innovative clustering algorithm that enhances standard Mean-Shift methods by i...

arXiv - Machine Learning · 3 min

[2602.15200] COMPOT: Calibration-Optimized Matrix Procrustes Orthogonalization for Transformers Compression
Machine Learning

The paper presents COMPOT, a novel framework for compressing Transformer models using Calibration-Optimized Matrix Procrustes Orthogonali...

arXiv - Machine Learning · 3 min

[2602.15183] Seeing to Generalize: How Visual Data Corrects Binding Shortcuts
LLMs

This article explores how Vision Language Models (VLMs) enhance performance on text-only tasks by correcting binding shortcuts through vi...

arXiv - Machine Learning · 4 min

[2602.15155] Refine Now, Query Fast: A Decoupled Refinement Paradigm for Implicit Neural Fields
Machine Learning

The paper presents a Decoupled Representation Refinement (DRR) paradigm for Implicit Neural Representations (INRs), enhancing speed and f...

arXiv - Machine Learning · 4 min

Apple is reportedly planning to launch AI-powered glasses, a pendant, and AirPods
AI Startups

Apple is set to launch AI-powered smart glasses, a pendant, and upgraded AirPods, enhancing its AI hardware lineup with features like cam...

The Verge - AI · 5 min

best ai headshot generator – which one really works?
Generative AI

This article explores the effectiveness of various AI headshot generators, focusing on the author's experience with Headshot Kiwi, highli...

Reddit - Artificial Intelligence · 1 min

[2602.11858] Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception
LLMs

The paper presents Region-to-Image Distillation, a novel approach to enhance fine-grained multimodal perception in MLLMs by internalizing...

arXiv - AI · 4 min

[2602.11554] HyperDet: 3D Object Detection with Hyper 4D Radar Point Clouds
Computer Vision

The paper presents HyperDet, a novel framework for 3D object detection using hyper 4D radar point clouds, addressing limitations of tradi...

arXiv - Machine Learning · 4 min

[2602.11575] ReaDy-Go: Real-to-Sim Dynamic 3D Gaussian Splatting Simulation for Environment-Specific Visual Navigation with Moving Obstacles
Machine Learning

The paper presents ReaDy-Go, a novel simulation pipeline that enhances visual navigation in dynamic environments by integrating 3D Gaussi...

arXiv - AI · 4 min

[2602.10551] C^2ROPE: Causal Continuous Rotary Positional Encoding for 3D Large Multimodal-Models Reasoning
LLMs

The paper presents C^2ROPE, an advanced positional encoding method for 3D Large Multimodal Models, addressing limitations of existing Rot...

arXiv - AI · 4 min

[2602.07047] ShapBPT: Image Feature Attributions Using Data-Aware Binary Partition Trees
Machine Learning

The paper introduces ShapBPT, a novel method for image feature attributions using data-aware binary partition trees, enhancing interpreta...

arXiv - Machine Learning · 4 min