Computer Vision

Image recognition, detection, and visual AI


Top This Week

Computer Vision

[2602.09678] Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap

Abstract page for arXiv paper 2602.09678: Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap

arXiv - AI · 4 min · about 7 hours ago

LLMs

[2601.13622] CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models

Abstract page for arXiv paper 2601.13622: CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language...

arXiv - AI · 3 min · about 7 hours ago

Computer Vision

[2603.26551] Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones

Abstract page for arXiv paper 2603.26551: Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones

arXiv - AI · 4 min · about 7 hours ago

All Content

Machine Learning

[2602.19171] HistCAD: Geometrically Constrained Parametric History-based CAD Dataset

The paper presents HistCAD, a comprehensive dataset for parametric CAD modeling that incorporates geometric constraints and functional se...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.18690] Neural Fields as World Models

The paper explores how neural fields can serve as world models, preserving sensory topology for better prediction of physical outcomes, w...

arXiv - Machine Learning · 3 min · about 1 month ago

Machine Learning

[2602.19156] Artefact-Aware Fungal Detection in Dermatophytosis: A Real-Time Transformer-Based Approach for KOH Microscopy

This study presents a transformer-based framework for detecting fungal elements in dermatophytosis using KOH microscopy, achieving high a...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.18573] Multiclass Calibration Assessment and Recalibration of Probability Predictions via the Linear Log Odds Calibration Function

The paper presents a novel method for assessing and recalibrating probability predictions in multiclass classification tasks, addressing ...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.18536] Triggering hallucinations in model-based MRI reconstruction via adversarial perturbations

This paper investigates how adversarial perturbations can induce hallucinations in generative models used for MRI reconstruction, highlig...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.18525] Do Generative Metrics Predict YOLO Performance? An Evaluation Across Models, Augmentation Ratios, and Dataset Complexity

This paper evaluates the effectiveness of generative metrics in predicting the performance of YOLO object detection models across various...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.18502] Mitigating Shortcut Learning via Feature Disentanglement in Medical Imaging: A Benchmark Study

This study evaluates feature disentanglement methods to mitigate shortcut learning in medical imaging, enhancing model robustness and cla...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.19040] Adaptive Multi-Agent Reasoning for Text-to-Video Retrieval

The paper presents an adaptive multi-agent framework for improving text-to-video retrieval systems, addressing challenges in query-depend...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.19022] An interpretable framework using foundation models for fish sex identification

The paper presents FishProtoNet, a non-invasive computer vision framework for accurately identifying the sex of delta smelt, an endangere...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.18439] Replication Study: Federated Text-Driven Prompt Generation for Vision-Language Models

This article presents a replication study of the FedTPG model, which enhances vision-language model performance in federated learning sce...

arXiv - Machine Learning · 3 min · about 1 month ago

Computer Vision

[2602.18882] SceneTok: A Compressed, Diffusable Token Space for 3D Scenes

SceneTok introduces a novel tokenizer that compresses 3D scene representations into a set of diffusable tokens, achieving superior compre...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2602.18880] FOCA: Frequency-Oriented Cross-Domain Forgery Detection, Localization and Explanation via Multi-Modal Large Language Model

The paper presents FOCA, a novel framework for detecting and localizing image forgery using a multi-modal large language model that integ...

arXiv - AI · 3 min · about 1 month ago

Generative AI

[2602.18874] Structure-Level Disentangled Diffusion for Few-Shot Chinese Font Generation

This article presents the Structure-Level Disentangled Diffusion Model (SLD-Font) for few-shot Chinese font generation, enhancing style f...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.19982] A Computationally Efficient Multidimensional Vision Transformer

This paper presents a novel tensor-based framework for Vision Transformers, enhancing computational efficiency while maintaining competit...

arXiv - Machine Learning · 3 min · about 1 month ago

Generative AI

[2602.18873] BiMotion: B-spline Motion for Text-guided Dynamic 3D Character Generation

BiMotion introduces a novel approach to dynamic 3D character generation using B-spline curves, enhancing motion quality and alignment wit...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.19931] Expanding the Role of Diffusion Models for Robust Classifier Training

This article explores the use of diffusion models to enhance adversarial training for robust image classifiers, demonstrating improved pe...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2602.19926] Rethinking LoRA for Privacy-Preserving Federated Learning in Large Models

The paper presents LA-LoRA, a novel approach for fine-tuning large models in privacy-preserving federated learning, addressing key challe...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.18763] TAG: Thinking with Action Unit Grounding for Facial Expression Recognition

The paper introduces TAG, a vision-language framework for Facial Expression Recognition (FER) that enhances reasoning by grounding predic...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.18745] Synthesizing Multimodal Geometry Datasets from Scratch and Enabling Visual Alignment via Plotting Code

The paper presents a novel pipeline for synthesizing multimodal geometry datasets, introducing the GeoCode dataset which enhances visual-...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.18742] RoboCurate: Harnessing Diversity with Action-Verified Neural Trajectory for Robot Learning

The paper presents RoboCurate, a framework for generating synthetic robot data that enhances action quality through simulation replay and...

arXiv - AI · 4 min · about 1 month ago

