Generative AI
Image, video, audio, and text generation
Top This Week
[2601.08565] Rewriting Video: Text-Driven Reauthoring of Video Footage
[2512.18388] Exploration vs. Fixation: Scaffolding Divergent and Convergent Thinking for Human-AI Co-Creation with Generative Models
All Content
[2602.03837] Accelerating Scientific Research with Gemini: Case Studies and Common Techniques
This article explores how Google's Gemini models enhance scientific research through case studies, showcasing effective human-AI collabor...
[2602.01023] Unifying Ranking and Generation in Query Auto-Completion via Retrieval-Augmented Generation and Multi-Objective Alignment
This paper presents a unified framework for Query Auto-Completion (QAC) that integrates Retrieval-Augmented Generation (RAG) and multi-ob...
[2601.21812] A Decomposable Forward Process in Diffusion Models for Time-Series Forecasting
This paper presents a novel forward diffusion process for time-series forecasting that effectively decomposes signals into spectral compo...
[2511.20974] RosettaSpeech: Zero-Shot Speech-to-Speech Translation without Parallel Speech
RosettaSpeech introduces a zero-shot framework for speech-to-speech translation, overcoming the need for parallel speech data by using mo...
[2601.09982] Context Volume Drives Performance: Tackling Domain Shift in Extremely Low-Resource Translation via RAG
This article presents a hybrid framework for improving neural machine translation performance in low-resource languages, specifically add...
[2512.22420] Nightjar: Dynamic Adaptive Speculative Decoding for Large Language Models Serving
The paper presents Nightjar, a novel algorithm for dynamic adaptive speculative decoding in large language models, enhancing throughput a...
[2510.08431] Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency
This paper presents a novel approach to large-scale diffusion distillation using a score-regularized continuous-time consistency model, a...
[2512.14166] IntentMiner: Intent Inversion Attack via Tool Call Analysis in the Model Context Protocol
The paper introduces IntentMiner, an Intent Inversion Attack on Large Language Models (LLMs) that works by analyzing tool c...
[2512.13697] Writing in Symbiosis: Mapping Human Creative Agency in the AI Era
This article explores the evolving relationship between human creativity and AI, particularly in writing, highlighting how authors adapt ...
[2512.09185] Learning Patient-Specific Disease Dynamics with Latent Flow Matching for Longitudinal Imaging Generation
The paper presents a novel framework, Δ-LFM, for modeling patient-specific disease dynamics using latent flow matching, enhancing...
[2512.04552] RRPO: Robust Reward Policy Optimization for LLM-based Emotional TTS
The paper presents Robust Reward Policy Optimization (RRPO), a novel framework designed to enhance emotional text-to-speech (TTS) systems...
[2509.20928] Conditionally Whitened Generative Models for Probabilistic Time Series Forecasting
The paper introduces Conditionally Whitened Generative Models (CW-Gen) for probabilistic time series forecasting, addressing challenges l...
[2510.22876] Batch Speculative Decoding Done Right
The paper presents a novel framework for batch speculative decoding, addressing critical failures in existing methods and achieving signi...
[2507.07139] Image Can Bring Your Memory Back: A Novel Multi-Modal Guided Attack against Image Generation Model Unlearning
The paper presents Recall, a novel adversarial framework that targets the robustness of image generation model unlearning, revealing vuln...
[2510.04398] SECA: Semantically Equivalent and Coherent Attacks for Eliciting LLM Hallucinations
The paper presents SECA, a method for eliciting hallucinations in large language models (LLMs) through semantically equivalent and cohere...
[2510.02356] Measuring Physical-World Privacy Awareness of Large Language Models: An Evaluation Benchmark
This article presents EAPrivacy, a benchmark for evaluating the physical-world privacy awareness of large language models (LLMs), reveali...
[2510.00232] BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses
The paper introduces BiasFreeBench, a benchmark designed to evaluate bias mitigation techniques in large language models (LLMs) by provid...
[2509.18776] AECBench: A Hierarchical Benchmark for Knowledge Evaluation of Large Language Models in the AEC Field
The paper introduces AECBench, a benchmark for evaluating large language models (LLMs) in the Architecture, Engineering, and Construction...
[2503.10522] AudioX: A Unified Framework for Anything-to-Audio Generation
AudioX presents a unified framework for generating audio from various multimodal inputs, enhancing the quality and flexibility of audio g...
[2411.01629] Denoising Diffusions with Optimal Transport: Localization, Curvature, and Multi-Scale Complexity
This paper explores denoising diffusions using optimal transport, focusing on localization, curvature, and multi-scale complexity in gene...