Computer Vision

Image recognition, detection, and visual AI

Top This Week

[2602.09678] Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap
Computer Vision

arXiv - AI · 4 min ·
[2601.13622] CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models
LLMs

arXiv - AI · 3 min ·
[2603.26551] Beyond MACs: Hardware Efficient Architecture Design for Vision Backbones
Computer Vision

arXiv - AI · 4 min ·

All Content

[2602.16217] Multi-Class Boundary Extraction from Implicit Representations
Machine Learning

This paper presents a novel algorithm for multi-class boundary extraction from implicit representations, emphasizing topological correctn...

arXiv - Machine Learning · 3 min ·
[2602.16050] Evidence-Grounded Subspecialty Reasoning: Evaluating a Curated Clinical Intelligence Layer on the 2025 Endocrinology Board-Style Examination
LLMs

This article evaluates the performance of the January Mirror, an evidence-grounded clinical reasoning system, against leading large langu...

arXiv - AI · 4 min ·
[2602.16120] Feature-based morphological analysis of shape graph data
Data Science

This paper presents a computational pipeline for analyzing shape graph datasets, focusing on geometric and topological features to enhanc...

arXiv - Machine Learning · 3 min ·
[2602.16057] Extracting and Analyzing Rail Crossing Behavior Signatures from Videos using Tensor Methods
Machine Learning

This article presents a novel multi-view tensor decomposition framework to analyze rail crossing behaviors from video data, revealing sig...

arXiv - Machine Learning · 4 min ·
[2602.15971] B-DENSE: Branching For Dense Ensemble Network Learning
Machine Learning

The paper presents B-DENSE, a novel framework for improving dense ensemble network learning by leveraging multi-branch trajectory alignme...

arXiv - AI · 3 min ·
World Labs lands $200M from Autodesk to bring world models into 3D workflows | TechCrunch
Machine Learning

World Labs has secured a $200 million investment from Autodesk to integrate its AI-generated 3D models with Autodesk's design tools, focu...

TechCrunch - AI · 6 min ·
Indian AI lab Sarvam's new models are a major bet on the viability of open-source AI | TechCrunch
Machine Learning

Indian AI lab Sarvam launches new large language models, including 30B and 105B parameter models, aiming to challenge foreign AI systems ...

TechCrunch - AI · 5 min ·
[2601.03100] Text-Guided Layer Fusion Mitigates Hallucination in Multimodal LLMs
LLMs

The paper presents TGIF, a novel approach to mitigate hallucinations in multimodal large language models (MLLMs) by leveraging a text-gui...

arXiv - AI · 4 min ·
[2511.05705] Long Grounded Thoughts: Synthesizing Visual Problems and Reasoning Chains at Scale
Data Science

The paper presents a novel framework for synthesizing vision-centric problems and reasoning chains, generating over 1 million high-qualit...

arXiv - AI · 4 min ·
[2510.02001] Generating Findings for Jaw Cysts in Dental Panoramic Radiographs Using a GPT-Based VLM: A Preliminary Study on Building a Two-Stage Self-Correction Loop with Structured Output (SLSO) Framework
LLMs

This study explores a new Self-correction Loop with Structured Output (SLSO) framework to enhance the accuracy of AI-generated findings f...

arXiv - AI · 4 min ·
[2501.12369] DARB-Splatting: Generalizing Splatting with Decaying Anisotropic Radial Basis Functions
Computer Vision

This paper introduces DARB-Splatting, a novel approach to 3D reconstruction using Decaying Anisotropic Radial Basis Functions, enhancing ...

arXiv - AI · 4 min ·
[2511.10853] Advanced Assistance for Traffic Crash Analysis: An AI-Driven Multi-Agent Approach to Pre-Crash Reconstruction
NLP

This article presents an AI-driven multi-agent framework for reconstructing traffic crash scenarios, enhancing the accuracy of pre-crash ...

arXiv - AI · 4 min ·
[2602.15811] Task-Agnostic Continual Learning for Chest Radiograph Classification
Machine Learning

This article presents CARL-XRay, a novel continual learning framework for chest radiograph classification that adapts to new datasets wit...

arXiv - AI · 4 min ·
[2509.21609] VLCE: A Knowledge-Enhanced Framework for Image Description in Disaster Assessment
Computer Vision

The paper presents VLCE, a framework that enhances image description for disaster assessment by integrating external semantic knowledge, ...

arXiv - Machine Learning · 4 min ·
[2602.15733] MeshMimic: Geometry-Aware Humanoid Motion Learning through 3D Scene Reconstruction
Robotics

MeshMimic introduces a novel framework for humanoid motion learning by integrating 3D scene reconstruction with motion control, enhancing...

arXiv - AI · 4 min ·
[2602.15724] Learning to Retrieve Navigable Candidates for Efficient Vision-and-Language Navigation
LLMs

This paper presents a retrieval-augmented framework to enhance efficiency in Vision-and-Language Navigation (VLN) by leveraging large lan...

arXiv - AI · 4 min ·
[2602.15712] Criteria-first, semantics-later: reproducible structure discovery in image-based sciences
Computer Vision

This article presents a novel approach to structure discovery in image-based sciences, advocating for a criteria-first methodology that s...

arXiv - AI · 4 min ·
[2602.15689] A Content-Based Framework for Cybersecurity Refusal Decisions in Large Language Models
LLMs

This paper presents a content-based framework for cybersecurity refusal decisions in large language models, emphasizing the need for expl...

arXiv - AI · 3 min ·
[2507.01110] A LoD of Gaussians: Unified Training and Rendering for Ultra-Large Scale Reconstruction with External Memory
Machine Learning

The paper presents a novel framework, A LoD of Gaussians, for ultra-large-scale scene reconstruction and rendering using Gaussian splatti...

arXiv - Machine Learning · 4 min ·
[2505.22914] cadrille: Multi-modal CAD Reconstruction with Reinforcement Learning
Machine Learning

The paper presents 'cadrille', a multi-modal CAD reconstruction model utilizing reinforcement learning to process diverse input data, ach...

arXiv - Machine Learning · 4 min ·