Generative AI
Image, video, audio, and text generation
Top This Week
[2601.08565] Rewriting Video: Text-Driven Reauthoring of Video Footage
[2512.18388] Exploration vs. Fixation: Scaffolding Divergent and Convergent Thinking for Human-AI Co-Creation with Generative Models
All Content
[2602.14464] CoCoDiff: Correspondence-Consistent Diffusion Model for Fine-grained Style Transfer
The paper presents CoCoDiff, a novel framework for fine-grained style transfer in images, emphasizing semantic correspondence and achievi...
[2602.14433] Synthetic Reader Panels: Tournament-Based Ideation with LLM Personas for Autonomous Publishing
The paper discusses a novel system for autonomous book ideation using synthetic reader panels composed of LLM personas to evaluate book c...
[2602.14381] Adapting VACE for Real-Time Autoregressive Video Diffusion
This article presents an adaptation of VACE for real-time autoregressive video generation, enhancing video control while addressing laten...
[2602.14374] Differentially Private Retrieval-Augmented Generation
The paper presents DP-KSA, a novel algorithm that integrates differential privacy into retrieval-augmented generation (RAG) systems, addr...
[2602.14270] A Rational Analysis of the Effects of Sycophantic AI
This article analyzes the impact of sycophantic AI on human belief systems, revealing how overly agreeable AI can distort reality and inf...
[2602.14237] AbracADDbra: Touch-Guided Object Addition by Decoupling Placement and Editing Subtasks
The paper presents AbracADDbra, a framework that enhances object addition in computer vision by decoupling placement and editing tasks th...
[2602.14211] SkillJect: Automating Stealthy Skill-Based Prompt Injection for Coding Agents with Trace-Driven Closed-Loop Refinement
The paper presents SkillJect, an automated framework for stealthy skill-based prompt injection in coding agents, addressing security vuln...
[2602.14189] Knowing When Not to Answer: Abstention-Aware Scientific Reasoning
The paper discusses an abstention-aware framework for scientific reasoning, emphasizing the importance of knowing when to abstain from an...
[2602.14188] GPT-5 vs Other LLMs in Long Short-Context Performance
This paper evaluates the performance of GPT-5 and other LLMs on long short-context tasks, revealing significant gaps between theoretical ...
[2602.14178] UniWeTok: An Unified Binary Tokenizer with Codebook Size $\mathit{2^{128}}$ for Unified Multimodal Large Language Model
The paper presents UniWeTok, a unified binary tokenizer with a massive codebook size of 2^128, designed to enhance multimodal large langu...
[2602.14157] When Test-Time Guidance Is Enough: Fast Image and Video Editing with Diffusion Guidance
The paper explores a novel approach to image and video editing using test-time guidance with diffusion models, achieving performance comp...
[2602.14158] A Multi-Agent Framework for Medical AI: Leveraging Fine-Tuned GPT, LLaMA, and DeepSeek R1 for Evidence-Based and Bias-Aware Clinical Query Processing
This article presents a multi-agent framework for medical AI that enhances clinical query processing by leveraging fine-tuned language mo...
[2602.13942] A Theoretical Framework for LLM Fine-tuning Using Early Stopping for Non-random Initialization
This article presents a theoretical framework for fine-tuning large language models (LLMs) using early stopping and non-random initializa...
[2602.14106] Anticipating Adversary Behavior in DevSecOps Scenarios through Large Language Models
This paper explores the integration of Large Language Models (LLMs) in anticipating adversary behavior within DevSecOps environments, pro...
[2602.14080] Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality
The paper explores the limitations of factuality evaluations in large language models (LLMs), identifying recall as a key bottleneck in a...
[2602.14043] Beyond Static Snapshots: Dynamic Modeling and Forecasting of Group-Level Value Evolution with Large Language Models
This article presents a novel framework for dynamic modeling and forecasting of group-level value evolution using large language models (...
[2602.14041] BitDance: Scaling Autoregressive Generative Models with Binary Tokens
BitDance introduces a novel autoregressive image generator that utilizes binary tokens for enhanced efficiency and performance in generat...
[2602.13818] VAR-3D: View-aware Auto-Regressive Model for Text-to-3D Generation via a 3D Tokenizer
The VAR-3D model introduces a novel approach to text-to-3D generation, addressing challenges in discrete 3D representation and enhancing ...
[2602.13954] Eureka-Audio: Triggering Audio Intelligence in Compact Language Models
Eureka-Audio presents a compact audio language model that outperforms larger models in various audio understanding tasks, showcasing effi...
[2602.13543] LiveNewsBench: Evaluating LLM Web Search Capabilities with Freshly Curated News
The paper introduces LiveNewsBench, a benchmark for evaluating the web search capabilities of Large Language Models (LLMs) using freshly ...