Computer Vision

Image recognition, detection, and visual AI


Top This Week

Machine Learning

[2511.21428] From Observation to Action: Latent Action-based Primitive Segmentation for VLA Pre-training in Industrial Settings

Abstract page for arXiv paper 2511.21428: From Observation to Action: Latent Action-based Primitive Segmentation for VLA Pre-training in ...

arXiv - AI · 4 min · about 15 hours ago

Machine Learning

[2511.16719] SAM 3: Segment Anything with Concepts

Abstract page for arXiv paper 2511.16719: SAM 3: Segment Anything with Concepts

arXiv - AI · 4 min · about 15 hours ago

Machine Learning

[2603.28594] Detection of Adversarial Attacks in Robotic Perception

Abstract page for arXiv paper 2603.28594: Detection of Adversarial Attacks in Robotic Perception

arXiv - AI · 3 min · about 15 hours ago

All Content

Machine Learning

[2509.10766] MetaSeal: Defending Against Image Attribution Forgery Through Content-Dependent Cryptographic Watermarks

The paper presents MetaSeal, a novel framework for content-dependent cryptographic watermarks designed to combat image attribution forger...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2507.03262] Investigating Redundancy in Multimodal Large Language Models with Multiple Vision Encoders

This article investigates redundancy in multimodal large language models (MLLMs) with multiple vision encoders, revealing that more encod...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2504.20101] PlanetServe: A Decentralized, Scalable, and Privacy-Preserving Overlay for Democratizing Large Language Model Serving

The paper presents PlanetServe, a decentralized overlay for scalable and privacy-preserving serving of large language models (LLMs), addr...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2412.07909] Explaining and Mitigating the Modality Gap in Contrastive Multimodal Learning

This paper explores the modality gap in contrastive multimodal learning, analyzing its causes and proposing methods to mitigate it for im...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2410.03952] Pixel-Based Similarities as an Alternative to Neural Data for Improving Convolutional Neural Network Adversarial Robustness

This paper presents a novel approach to enhancing the adversarial robustness of Convolutional Neural Networks (CNNs) by utilizing pixel-b...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2510.00664] Batch-CAM: Introduction to better reasoning in convolutional deep learning models

The paper introduces Batch-CAM, a training framework for convolutional deep learning models that enhances interpretability by aligning mo...

arXiv - AI · 4 min · about 1 month ago

AI Infrastructure

[2508.07388] Invert4TVG: A Temporal Video Grounding Framework with Inversion Tasks Preserving Action Understanding Ability

The paper presents Invert4TVG, a novel framework for Temporal Video Grounding (TVG) that enhances action understanding through inversion ...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2505.14381] SCAN: Semantic Document Layout Analysis for Textual and Visual Retrieval-Augmented Generation

The paper presents SCAN, a novel approach for Semantic Document Layout Analysis that enhances Retrieval-Augmented Generation (RAG) system...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.13191] CoPE-VideoLM: Codec Primitives For Efficient Video Language Models

The paper presents CoPE-VideoLM, a novel approach that utilizes codec primitives to enhance the efficiency of video language models, sign...

arXiv - AI · 4 min · about 1 month ago

Computer Vision

[2602.13088] How cyborg propaganda reshapes collective action

This paper explores the emergence of 'cyborg propaganda,' where human and AI collaboration reshapes collective action, blurring lines bet...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.13055] Curriculum-DPO++: Direct Preference Optimization via Data and Model Curricula for Text-to-Image Generation

The paper presents Curriculum-DPO++, an advanced method for text-to-image generation that optimizes preference learning through a dual cu...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.12983] Detecting Object Tracking Failure via Sequential Hypothesis Testing

This paper presents a method for detecting object tracking failures using sequential hypothesis testing, enhancing safety in computer vis...

arXiv - AI · 4 min · about 1 month ago

AI Infrastructure

[2602.12933] Deep-Learning Atlas Registration for Melanoma Brain Metastases: Preserving Pathology While Enabling Cohort-Level Analyses

This article presents a deep-learning framework for registering melanoma brain metastases (MBM) to a common atlas, enhancing cohort-level...

arXiv - AI · 4 min · about 1 month ago

Data Science

[2602.12919] EPRBench: A High-Quality Benchmark Dataset for Event Stream Based Visual Place Recognition

EPRBench introduces a benchmark dataset for event stream-based visual place recognition, addressing challenges in low-light and high-spee...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.12902] Robustness of Object Detection of Autonomous Vehicles in Adverse Weather Conditions

This paper evaluates the robustness of object detection models used in autonomous vehicles under adverse weather conditions, proposing a ...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.12869] X-VORTEX: Spatio-Temporal Contrastive Learning for Wake Vortex Trajectory Forecasting

The paper presents X-VORTEX, a novel spatio-temporal contrastive learning framework designed to enhance wake vortex trajectory forecastin...

arXiv - Machine Learning · 4 min · about 1 month ago

Computer Vision

[2602.12758] VineetVC: Adaptive Video Conferencing Under Severe Bandwidth Constraints Using Audio-Driven Talking-Head Reconstruction

The paper presents VineetVC, an adaptive video conferencing system designed to function effectively under severe bandwidth constraints by...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.12675] SLA2: Sparse-Linear Attention with Learnable Routing and QAT

The paper presents SLA2, an advanced Sparse-Linear Attention model that enhances video generation efficiency by introducing a learnable r...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2602.12659] IndicFairFace: Balanced Indian Face Dataset for Auditing and Mitigating Geographical Bias in Vision-Language Models

The paper introduces IndicFairFace, a balanced dataset aimed at addressing geographical bias in Vision-Language Models (VLMs) by represen...

arXiv - AI · 4 min · about 1 month ago

AI Infrastructure

[2602.12649] Multi-Task Learning with Additive U-Net for Image Denoising and Classification

This article presents the Additive U-Net architecture for image denoising and classification, highlighting its advantages in multi-task l...

arXiv - Machine Learning · 3 min · about 1 month ago

