Computer Vision

Image recognition, detection, and visual AI

Top This Week

[2602.09678] Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap
Computer Vision

arXiv - AI · 4 min ·
[2601.13622] CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models
LLMs

arXiv - AI · 3 min ·
[2603.26551] Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones
Computer Vision

arXiv - AI · 4 min ·

All Content

[2602.18729] MiSCHiEF: A Benchmark in Minimal-Pairs of Safety and Culture for Holistic Evaluation of Fine-Grained Image-Caption Alignment
LLMs

The paper presents MiSCHiEF, a benchmark for evaluating fine-grained image-caption alignment, focusing on safety and cultural contexts, h...

arXiv - AI · 4 min ·
[2602.18702] Think with Grounding: Curriculum Reinforced Reasoning with Video Grounding for Long Video Understanding
Machine Learning

The paper presents Video-TwG, a curriculum reinforced framework for improving long video understanding through selective video grounding ...

arXiv - AI · 4 min ·
[2602.19641] Evaluating the Impact of Data Anonymization on Image Retrieval
NLP

This article evaluates how data anonymization affects the performance of Content-Based Image Retrieval (CBIR) systems, highlighting the b...

arXiv - Machine Learning · 4 min ·
[2602.18589] DM4CT: Benchmarking Diffusion Models for Computed Tomography Reconstruction
Machine Learning

The paper presents DM4CT, a benchmark for evaluating diffusion models in computed tomography (CT) reconstruction, addressing practical ch...

arXiv - AI · 4 min ·
[2602.18585] BloomNet: Exploring Single vs. Multiple Object Annotation for Flower Recognition Using YOLO Variants
Computer Vision

The paper explores the effectiveness of single versus multiple object annotation for flower recognition using various YOLO models, presen...

arXiv - AI · 4 min ·
[2602.19512] Variational Trajectory Optimization of Anisotropic Diffusion Schedules
Machine Learning

This paper presents a variational framework for optimizing anisotropic diffusion schedules in machine learning, enhancing performance acr...

arXiv - Machine Learning · 3 min ·
[2602.18540] Rodent-Bench
LLMs

Rodent-Bench introduces a benchmark for evaluating Multimodal Large Language Models (MLLMs) in annotating rodent behavior videos, reveali...

arXiv - AI · 3 min ·
[2602.18532] VLANeXt: Recipes for Building Strong VLA Models
LLMs

The paper presents VLANeXt, a framework for building effective Vision-Language-Action (VLA) models, addressing inconsistencies in trainin...

arXiv - AI · 4 min ·
[2602.18527] JAEGER: Joint 3D Audio-Visual Grounding and Reasoning in Simulated Physical Environments
LLMs

The paper presents JAEGER, a framework for joint 3D audio-visual grounding and reasoning, addressing limitations of existing 2D models by...

arXiv - AI · 4 min ·
[2602.18520] Sketch2Feedback: Grammar-in-the-Loop Framework for Rubric-Aligned Feedback on Student STEM Diagrams
Machine Learning

The paper presents Sketch2Feedback, a framework that enhances feedback on student-drawn STEM diagrams by integrating grammar rules to red...

arXiv - AI · 4 min ·
[2602.18504] A Computer Vision Framework for Multi-Class Detection and Tracking in Soccer Broadcast Footage
Computer Vision

This paper presents a computer vision framework for detecting and tracking players and the ball in soccer broadcast footage using a singl...

arXiv - AI · 3 min ·
[2602.18296] Context-Aware Mapping of 2D Drawing Annotations to 3D CAD Features Using LLM-Assisted Reasoning for Manufacturing Automation
LLMs

This article presents a framework for mapping 2D drawing annotations to 3D CAD features using context-aware reasoning, enhancing manufact...

arXiv - AI · 4 min ·
[2511.18765] NI-Tex: Non-isometric Image-based Garment Texture Generation
Computer Vision

The paper presents NI-Tex, a method for generating non-isometric garment textures using a new dataset and advanced techniques for cross-p...

arXiv - AI · 4 min ·
[2602.19033] A Markovian View of Iterative-Feedback Loops in Image Generative Models: Neural Resonance and Model Collapse
Machine Learning

This paper explores iterative feedback loops in image generative models, introducing the concept of neural resonance and its implications...

arXiv - AI · 4 min ·
[2507.19418] DEFNet: Multitasks-based Deep Evidential Fusion Network for Blind Image Quality Assessment
Computer Vision

The paper introduces DEFNet, a multitask-based deep evidential fusion network designed to enhance blind image quality assessment (BIQA) b...

arXiv - AI · 3 min ·
[2602.19027] Pushing the Limits of Inverse Lithography with Generative Reinforcement Learning
Generative AI

This article presents a novel approach to inverse lithography using generative reinforcement learning, significantly improving mask quali...

arXiv - AI · 4 min ·
[2602.18904] PCA-VAE: Differentiable Subspace Quantization without Codebook Collapse
Machine Learning

The paper introduces PCA-VAE, a novel approach to vector-quantized autoencoders that replaces traditional quantization methods with a dif...

arXiv - Machine Learning · 3 min ·
[2602.19562] A Multimodal Framework for Aligning Human Linguistic Descriptions with Visual Perceptual Data
Machine Learning

This paper presents a computational framework that aligns human linguistic descriptions with visual perceptual data, enhancing understand...

arXiv - AI · 4 min ·
[2602.18825] Bayesian Lottery Ticket Hypothesis
Machine Learning

The paper explores the Bayesian Lottery Ticket Hypothesis, demonstrating that sparse subnetworks in Bayesian neural networks can achieve ...

arXiv - Machine Learning · 4 min ·
[2602.19367] Time Series, Vision, and Language: Exploring the Limits of Alignment in Contrastive Representation Spaces
Machine Learning

This paper investigates the alignment of representations from time series, vision, and language modalities, revealing insights into their...

arXiv - AI · 4 min ·