Computer Vision

Image recognition, detection, and visual AI

Top This Week

[2602.09678] Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap
Computer Vision

arXiv - AI · 4 min
[2601.13622] CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models
LLMs

arXiv - AI · 3 min
[2603.26551] Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones
Computer Vision

arXiv - AI · 4 min

All Content

[2602.19171] HistCAD: Geometrically Constrained Parametric History-based CAD Dataset
Machine Learning

The paper presents HistCAD, a comprehensive dataset for parametric CAD modeling that incorporates geometric constraints and functional se...

arXiv - AI · 3 min
[2602.18690] Neural Fields as World Models
Machine Learning

The paper explores how neural fields can serve as world models, preserving sensory topology for better prediction of physical outcomes, w...

arXiv - Machine Learning · 3 min
[2602.19156] Artefact-Aware Fungal Detection in Dermatophytosis: A Real-Time Transformer-Based Approach for KOH Microscopy
Machine Learning

This study presents a transformer-based framework for detecting fungal elements in dermatophytosis using KOH microscopy, achieving high a...

arXiv - AI · 4 min
[2602.18573] Multiclass Calibration Assessment and Recalibration of Probability Predictions via the Linear Log Odds Calibration Function
Machine Learning

The paper presents a novel method for assessing and recalibrating probability predictions in multiclass classification tasks, addressing ...

arXiv - Machine Learning · 4 min
[2602.18536] Triggering hallucinations in model-based MRI reconstruction via adversarial perturbations
Machine Learning

This paper investigates how adversarial perturbations can induce hallucinations in generative models used for MRI reconstruction, highlig...

arXiv - Machine Learning · 4 min
[2602.18525] Do Generative Metrics Predict YOLO Performance? An Evaluation Across Models, Augmentation Ratios, and Dataset Complexity
Machine Learning

This paper evaluates the effectiveness of generative metrics in predicting the performance of YOLO object detection models across various...

arXiv - Machine Learning · 4 min
[2602.18502] Mitigating Shortcut Learning via Feature Disentanglement in Medical Imaging: A Benchmark Study
Machine Learning

This study evaluates feature disentanglement methods to mitigate shortcut learning in medical imaging, enhancing model robustness and cla...

arXiv - Machine Learning · 4 min
[2602.19040] Adaptive Multi-Agent Reasoning for Text-to-Video Retrieval
LLMs

The paper presents an adaptive multi-agent framework for improving text-to-video retrieval systems, addressing challenges in query-depend...

arXiv - AI · 4 min
[2602.19022] An interpretable framework using foundation models for fish sex identification
LLMs

The paper presents FishProtoNet, a non-invasive computer vision framework for accurately identifying the sex of delta smelt, an endangere...

arXiv - AI · 4 min
[2602.18439] Replication Study: Federated Text-Driven Prompt Generation for Vision-Language Models
LLMs

This article presents a replication study of the FedTPG model, which enhances vision-language model performance in federated learning sce...

arXiv - Machine Learning · 3 min
[2602.18882] SceneTok: A Compressed, Diffusable Token Space for 3D Scenes
Computer Vision

SceneTok introduces a novel tokenizer that compresses 3D scene representations into a set of diffusable tokens, achieving superior compre...

arXiv - Machine Learning · 3 min
[2602.18880] FOCA: Frequency-Oriented Cross-Domain Forgery Detection, Localization and Explanation via Multi-Modal Large Language Model
LLMs

The paper presents FOCA, a novel framework for detecting and localizing image forgery using a multi-modal large language model that integ...

arXiv - AI · 3 min
[2602.18874] Structure-Level Disentangled Diffusion for Few-Shot Chinese Font Generation
Generative AI

This article presents the Structure-Level Disentangled Diffusion Model (SLD-Font) for few-shot Chinese font generation, enhancing style f...

arXiv - AI · 4 min
[2602.19982] A Computationally Efficient Multidimensional Vision Transformer
Machine Learning

This paper presents a novel tensor-based framework for Vision Transformers, enhancing computational efficiency while maintaining competit...

arXiv - Machine Learning · 3 min
[2602.18873] BiMotion: B-spline Motion for Text-guided Dynamic 3D Character Generation
Generative AI

BiMotion introduces a novel approach to dynamic 3D character generation using B-spline curves, enhancing motion quality and alignment wit...

arXiv - AI · 3 min
[2602.19931] Expanding the Role of Diffusion Models for Robust Classifier Training
Machine Learning

This article explores the use of diffusion models to enhance adversarial training for robust image classifiers, demonstrating improved pe...

arXiv - Machine Learning · 3 min
[2602.19926] Rethinking LoRA for Privacy-Preserving Federated Learning in Large Models
LLMs

The paper presents LA-LoRA, a novel approach for fine-tuning large models in privacy-preserving federated learning, addressing key challe...

arXiv - AI · 4 min
[2602.18763] TAG: Thinking with Action Unit Grounding for Facial Expression Recognition
LLMs

The paper introduces TAG, a vision-language framework for Facial Expression Recognition (FER) that enhances reasoning by grounding predic...

arXiv - AI · 4 min
[2602.18745] Synthesizing Multimodal Geometry Datasets from Scratch and Enabling Visual Alignment via Plotting Code
LLMs

The paper presents a novel pipeline for synthesizing multimodal geometry datasets, introducing the GeoCode dataset which enhances visual-...

arXiv - AI · 3 min
[2602.18742] RoboCurate: Harnessing Diversity with Action-Verified Neural Trajectory for Robot Learning
LLMs

The paper presents RoboCurate, a framework for generating synthetic robot data that enhances action quality through simulation replay and...

arXiv - AI · 4 min