Computer Vision

Image recognition, detection, and visual AI

Top This Week

[2511.21428] From Observation to Action: Latent Action-based Primitive Segmentation for VLA Pre-training in Industrial Settings
Machine Learning

Abstract page for arXiv paper 2511.21428: From Observation to Action: Latent Action-based Primitive Segmentation for VLA Pre-training in ...

arXiv - AI · 4 min
[2511.16719] SAM 3: Segment Anything with Concepts
Machine Learning

Abstract page for arXiv paper 2511.16719: SAM 3: Segment Anything with Concepts

arXiv - AI · 4 min
[2603.28594] Detection of Adversarial Attacks in Robotic Perception
Machine Learning

Abstract page for arXiv paper 2603.28594: Detection of Adversarial Attacks in Robotic Perception

arXiv - AI · 3 min

All Content

[2509.10766] MetaSeal: Defending Against Image Attribution Forgery Through Content-Dependent Cryptographic Watermarks
Machine Learning

The paper presents MetaSeal, a novel framework for content-dependent cryptographic watermarks designed to combat image attribution forger...

arXiv - AI · 4 min
[2507.03262] Investigating Redundancy in Multimodal Large Language Models with Multiple Vision Encoders
LLMs

This article investigates redundancy in multimodal large language models (MLLMs) with multiple vision encoders, revealing that more encod...

arXiv - AI · 4 min
[2504.20101] PlanetServe: A Decentralized, Scalable, and Privacy-Preserving Overlay for Democratizing Large Language Model Serving
LLMs

The paper presents PlanetServe, a decentralized overlay for scalable and privacy-preserving serving of large language models (LLMs), addr...

arXiv - AI · 4 min
[2412.07909] Explaining and Mitigating the Modality Gap in Contrastive Multimodal Learning
Machine Learning

This paper explores the modality gap in contrastive multimodal learning, analyzing its causes and proposing methods to mitigate it for im...

arXiv - Machine Learning · 4 min
[2410.03952] Pixel-Based Similarities as an Alternative to Neural Data for Improving Convolutional Neural Network Adversarial Robustness
Machine Learning

This paper presents a novel approach to enhancing the adversarial robustness of Convolutional Neural Networks (CNNs) by utilizing pixel-b...

arXiv - Machine Learning · 4 min
[2510.00664] Batch-CAM: Introduction to better reasoning in convolutional deep learning models
Machine Learning

The paper introduces Batch-CAM, a training framework for convolutional deep learning models that enhances interpretability by aligning mo...

arXiv - AI · 4 min
[2508.07388] Invert4TVG: A Temporal Video Grounding Framework with Inversion Tasks Preserving Action Understanding Ability
AI Infrastructure

The paper presents Invert4TVG, a novel framework for Temporal Video Grounding (TVG) that enhances action understanding through inversion ...

arXiv - AI · 4 min
[2505.14381] SCAN: Semantic Document Layout Analysis for Textual and Visual Retrieval-Augmented Generation
LLMs

The paper presents SCAN, a novel approach for Semantic Document Layout Analysis that enhances Retrieval-Augmented Generation (RAG) system...

arXiv - AI · 4 min
[2602.13191] CoPE-VideoLM: Codec Primitives For Efficient Video Language Models
LLMs

The paper presents CoPE-VideoLM, a novel approach that utilizes codec primitives to enhance the efficiency of video language models, sign...

arXiv - AI · 4 min
[2602.13088] How cyborg propaganda reshapes collective action
Computer Vision

This paper explores the emergence of 'cyborg propaganda,' where human and AI collaboration reshapes collective action, blurring lines bet...

arXiv - AI · 4 min
[2602.13055] Curriculum-DPO++: Direct Preference Optimization via Data and Model Curricula for Text-to-Image Generation
Machine Learning

The paper presents Curriculum-DPO++, an advanced method for text-to-image generation that optimizes preference learning through a dual cu...

arXiv - Machine Learning · 4 min
[2602.12983] Detecting Object Tracking Failure via Sequential Hypothesis Testing
Machine Learning

This paper presents a method for detecting object tracking failures using sequential hypothesis testing, enhancing safety in computer vis...

arXiv - AI · 4 min
[2602.12933] Deep-Learning Atlas Registration for Melanoma Brain Metastases: Preserving Pathology While Enabling Cohort-Level Analyses
AI Infrastructure

This article presents a deep-learning framework for registering melanoma brain metastases (MBM) to a common atlas, enhancing cohort-level...

arXiv - AI · 4 min
[2602.12919] EPRBench: A High-Quality Benchmark Dataset for Event Stream Based Visual Place Recognition
Data Science

EPRBench introduces a benchmark dataset for event stream-based visual place recognition, addressing challenges in low-light and high-spee...

arXiv - AI · 4 min
[2602.12902] Robustness of Object Detection of Autonomous Vehicles in Adverse Weather Conditions
Machine Learning

This paper evaluates the robustness of object detection models used in autonomous vehicles under adverse weather conditions, proposing a ...

arXiv - Machine Learning · 4 min
[2602.12869] X-VORTEX: Spatio-Temporal Contrastive Learning for Wake Vortex Trajectory Forecasting
Machine Learning

The paper presents X-VORTEX, a novel spatio-temporal contrastive learning framework designed to enhance wake vortex trajectory forecastin...

arXiv - Machine Learning · 4 min
[2602.12758] VineetVC: Adaptive Video Conferencing Under Severe Bandwidth Constraints Using Audio-Driven Talking-Head Reconstruction
Computer Vision

The paper presents VineetVC, an adaptive video conferencing system designed to function effectively under severe bandwidth constraints by...

arXiv - AI · 4 min
[2602.12675] SLA2: Sparse-Linear Attention with Learnable Routing and QAT
Machine Learning

The paper presents SLA2, an advanced Sparse-Linear Attention model that enhances video generation efficiency by introducing a learnable r...

arXiv - Machine Learning · 3 min
[2602.12659] IndicFairFace: Balanced Indian Face Dataset for Auditing and Mitigating Geographical Bias in Vision-Language Models
LLMs

The paper introduces IndicFairFace, a balanced dataset aimed at addressing geographical bias in Vision-Language Models (VLMs) by represen...

arXiv - AI · 4 min
[2602.12649] Multi-Task Learning with Additive U-Net for Image Denoising and Classification
AI Infrastructure

This article presents the Additive U-Net architecture for image denoising and classification, highlighting its advantages in multi-task l...

arXiv - Machine Learning · 3 min