Computer Vision

Image recognition, detection, and visual AI

Top This Week

Computer Vision

[2602.09678] Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap

arXiv - AI · 4 min · about 8 hours ago

LLMs

[2601.13622] CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models

arXiv - AI · 3 min · about 8 hours ago

Computer Vision

[2603.26551] Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones

arXiv - AI · 4 min · about 8 hours ago

All Content

LLMs

[2602.18729] MiSCHiEF: A Benchmark in Minimal-Pairs of Safety and Culture for Holistic Evaluation of Fine-Grained Image-Caption Alignment

The paper presents MiSCHiEF, a benchmark for evaluating fine-grained image-caption alignment, focusing on safety and cultural contexts, h...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.18702] Think with Grounding: Curriculum Reinforced Reasoning with Video Grounding for Long Video Understanding

The paper presents Video-TwG, a curriculum reinforced framework for improving long video understanding through selective video grounding ...

arXiv - AI · 4 min · about 1 month ago

NLP

[2602.19641] Evaluating the Impact of Data Anonymization on Image Retrieval

This article evaluates how data anonymization affects the performance of Content-Based Image Retrieval (CBIR) systems, highlighting the b...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.18589] DM4CT: Benchmarking Diffusion Models for Computed Tomography Reconstruction

The paper presents DM4CT, a benchmark for evaluating diffusion models in computed tomography (CT) reconstruction, addressing practical ch...

arXiv - AI · 4 min · about 1 month ago

Computer Vision

[2602.18585] BloomNet: Exploring Single vs. Multiple Object Annotation for Flower Recognition Using YOLO Variants

The paper explores the effectiveness of single versus multiple object annotation for flower recognition using various YOLO models, presen...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.19512] Variational Trajectory Optimization of Anisotropic Diffusion Schedules

This paper presents a variational framework for optimizing anisotropic diffusion schedules in machine learning, enhancing performance acr...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2602.18540] Rodent-Bench

Rodent-Bench introduces a benchmark for evaluating Multimodal Large Language Models (MLLMs) in annotating rodent behavior videos, reveali...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.18532] VLANeXt: Recipes for Building Strong VLA Models

The paper presents VLANeXt, a framework for building effective Vision-Language-Action (VLA) models, addressing inconsistencies in trainin...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.18527] JAEGER: Joint 3D Audio-Visual Grounding and Reasoning in Simulated Physical Environments

The paper presents JAEGER, a framework for joint 3D audio-visual grounding and reasoning, addressing limitations of existing 2D models by...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.18520] Sketch2Feedback: Grammar-in-the-Loop Framework for Rubric-Aligned Feedback on Student STEM Diagrams

The paper presents Sketch2Feedback, a framework that enhances feedback on student-drawn STEM diagrams by integrating grammar rules to red...

arXiv - AI · 4 min · about 1 month ago

Computer Vision

[2602.18504] A Computer Vision Framework for Multi-Class Detection and Tracking in Soccer Broadcast Footage

This paper presents a computer vision framework for detecting and tracking players and the ball in soccer broadcast footage using a singl...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.18296] Context-Aware Mapping of 2D Drawing Annotations to 3D CAD Features Using LLM-Assisted Reasoning for Manufacturing Automation

This article presents a framework for mapping 2D drawing annotations to 3D CAD features using context-aware reasoning, enhancing manufact...

arXiv - AI · 4 min · about 1 month ago

Computer Vision

[2511.18765] NI-Tex: Non-isometric Image-based Garment Texture Generation

The paper presents NI-Tex, a method for generating non-isometric garment textures using a new dataset and advanced techniques for cross-p...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.19033] A Markovian View of Iterative-Feedback Loops in Image Generative Models: Neural Resonance and Model Collapse

This paper explores iterative feedback loops in image generative models, introducing the concept of neural resonance and its implications...

arXiv - AI · 4 min · about 1 month ago

Computer Vision

[2507.19418] DEFNet: Multitasks-based Deep Evidential Fusion Network for Blind Image Quality Assessment

The paper introduces DEFNet, a multitask-based deep evidential fusion network designed to enhance blind image quality assessment (BIQA) b...

arXiv - AI · 3 min · about 1 month ago

Generative AI

[2602.19027] Pushing the Limits of Inverse Lithography with Generative Reinforcement Learning

This article presents a novel approach to inverse lithography using generative reinforcement learning, significantly improving mask quali...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.18904] PCA-VAE: Differentiable Subspace Quantization without Codebook Collapse

The paper introduces PCA-VAE, a novel approach to vector-quantized autoencoders that replaces traditional quantization methods with a dif...

arXiv - Machine Learning · 3 min · about 1 month ago

Machine Learning

[2602.19562] A Multimodal Framework for Aligning Human Linguistic Descriptions with Visual Perceptual Data

This paper presents a computational framework that aligns human linguistic descriptions with visual perceptual data, enhancing understand...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.18825] Bayesian Lottery Ticket Hypothesis

The paper explores the Bayesian Lottery Ticket Hypothesis, demonstrating that sparse subnetworks in Bayesian neural networks can achieve ...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.19367] Time Series, Vision, and Language: Exploring the Limits of Alignment in Contrastive Representation Spaces

This paper investigates the alignment of representations from time series, vision, and language modalities, revealing insights into their...

arXiv - AI · 4 min · about 1 month ago
