[R] VOID: Video Object and Interaction Deletion (physically-consistent video inpainting)
We present VOID, a model for video object removal that aims to handle *physical interactions*, not just appearance. Most existing video i...
ML algorithms, training, and inference
I sketched a cow and tested how different models turn it into a realistic image for downstream 3D generation. It turns out some models ...
Abstract page for arXiv paper 2603.25311: Practical Efficient Global Optimization is No-regret
Abstract page for arXiv paper 2603.25226: WebTestBench: Evaluating Computer-Use Agents towards End-to-End Automated Web Testing
Abstract page for arXiv paper 2603.25216: A Wireless World Model for AI-Native 6G Networks
Abstract page for arXiv paper 2603.25257: Mitigating Evasion Attacks in Fog Computing Resource Provisioning Through Proactive Hardening
Abstract page for arXiv paper 2603.25209: Free-Lunch Long Video Generation via Layer-Adaptive O.O.D Correction
Abstract page for arXiv paper 2603.25196: A Decade-Scale Benchmark Evaluating LLMs' Clinical Practice Guidelines Detection and Adherence ...
Abstract page for arXiv paper 2603.25251: Does Explanation Correctness Matter? Linking Computational XAI Evaluation to Human Understanding
Abstract page for arXiv paper 2603.25187: Probing the Lack of Stable Internal Beliefs in LLMs
Abstract page for arXiv paper 2603.25229: An Image Dataset of Common Skin Diseases of Bangladesh and Benchmarking Performance with Machin...
Abstract page for arXiv paper 2603.25250: Activation Matters: Test-time Activated Negative Labels for OOD Detection with Vision-Language ...
Abstract page for arXiv paper 2603.25170: Knowledge-Guided Adversarial Training for Infrared Object Detection via Thermal Radiation Modeling
Abstract page for arXiv paper 2603.25164: PIDP-Attack: Combining Prompt Injection with Database Poisoning Attacks on Retrieval-Augmented ...
Abstract page for arXiv paper 2603.25145: Learning to Rank Caption Chains for Video-Text Alignment
Abstract page for arXiv paper 2603.25155: Photon: Speedup Volume Understanding with Efficient Multimodal Large Language Models
Abstract page for arXiv paper 2603.25150: Goodness-of-pronunciation without phoneme time alignment
Abstract page for arXiv paper 2603.25146: Factors Influencing the Quality of AI-Generated Code: A Synthesis of Empirical Evidence
Abstract page for arXiv paper 2603.25144: FD$^2$: A Dedicated Framework for Fine-Grained Dataset Distillation
Abstract page for arXiv paper 2603.25068: Ultra-fast Traffic Nowcasting and Control via Differentiable Agent-based Simulation
Abstract page for arXiv paper 2603.25015: Imperative Interference: Social Register Shapes Instruction Topology in Large Language Models
Abstract page for arXiv paper 2603.25126: MCLMR: A Model-Agnostic Causal Learning Framework for Multi-Behavior Recommendation