Generative AI
Image, video, audio, and text generation
Top This Week
[2601.08565] Rewriting Video: Text-Driven Reauthoring of Video Footage
[2512.18388] Exploration vs. Fixation: Scaffolding Divergent and Convergent Thinking for Human-AI Co-Creation with Generative Models
All Content
[2511.02077] Beyond Static Cutoffs: One-Shot Dynamic Thresholding for Diffusion Language Models
This article presents One-Shot Dynamic Thresholding (OSDT) for diffusion language models, enhancing decoding efficiency and accuracy by c...
[2510.15987] Algorithmic Primitives and Compositional Geometry of Reasoning in Language Models
The paper explores how algorithmic primitives and compositional geometry can enhance reasoning capabilities in large language models (LLM...
[2510.10854] Discrete State Diffusion Models: A Sample Complexity Perspective
This article presents a theoretical framework for discrete-state diffusion models, offering the first sample complexity bounds and insigh...
[2404.08634] When Attention Collapses: How Degenerate Layers in LLMs Enable Smaller, Stronger Models
This article explores the phenomenon of 'attention collapse' in large language models (LLMs) and introduces Inheritune, a method for crea...
[2510.03272] Where to Add PDE Diffusion in Transformers
This paper investigates the optimal placement of PDE diffusion layers in transformer architectures, revealing that their insertion order ...
[2510.02826] Multi-scale Autoregressive Models are Laplacian, Discrete, and Latent Diffusion Models in Disguise
This paper explores the reinterpretation of Visual Autoregressive Models (VAR) as iterative refinement models, linking them to denoising ...
[2602.12150] GPT-4o Lacks Core Features of Theory of Mind
The paper investigates whether Large Language Models (LLMs) possess a Theory of Mind (ToM), revealing that while they perform well on soc...
[2602.08449] When Evaluation Becomes a Side Channel: Regime Leakage and Structural Mitigations for Alignment Assessment
The paper discusses regime leakage in AI evaluations, highlighting how advanced agents may exploit evaluation conditions to misrepresent ...
[2509.24496] LLM DNA: Tracing Model Evolution via Functional Representations
The paper 'LLM DNA' explores the evolutionary relationships of large language models (LLMs) through a novel mathematical representation, ...
[2509.22067] The Rogue Scalpel: Activation Steering Compromises LLM Safety
The paper explores how activation steering, a technique for controlling LLM behavior, can inadvertently compromise safety by increasing h...
[2509.16117] DiffusionNFT: Online Diffusion Reinforcement with Forward Process
The paper presents DiffusionNFT, a novel online reinforcement learning paradigm that optimizes diffusion models directly on the forward p...
[2601.21654] ScholarGym: Benchmarking Large Language Model Capabilities in the Information-Gathering Stage of Deep Research
The paper introduces ScholarGym, an evaluation environment designed to benchmark large language models in the information-gathering phase...
[2508.19228] Predicting the Order of Upcoming Tokens Improves Language Modeling
The paper presents a novel approach to language modeling by introducing token order prediction (TOP) as an improvement over traditional n...
[2601.04911] From Stories to Cities to Games: A Qualitative Evaluation of Behaviour Planning
This paper evaluates a novel behaviour planning approach, demonstrating its effectiveness across diverse domains such as storytelling, ur...
[2508.06361] Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts
This article investigates the phenomenon of self-initiated deception in Large Language Models (LLMs) when responding to benign prompts, h...
[2512.19027] Recontextualization Mitigates Specification Gaming without Modifying the Specification
The paper discusses a novel approach called recontextualization, which aims to reduce specification gaming in language models without alt...
[2512.18956] Training Multimodal Large Reasoning Models Needs Better Thoughts: A Three-Stage Framework for Long Chain-of-Thought Synthesis and Selection
This paper presents a three-stage framework, SynSelect, for enhancing the training of multimodal large reasoning models through improved ...
[2507.08838] wd1: Weighted Policy Optimization for Reasoning in Diffusion Language Models
The paper presents wd1, a novel approach for optimizing reasoning in diffusion language models using reinforcement learning, demonstratin...
[2510.10193] SAFER: Risk-Constrained Sample-then-Filter in Large Language Models
The paper presents SAFER, a two-stage risk control framework for large language models (LLMs) that enhances output trustworthiness in ris...
[2510.03777] GuidedSampling: Steering LLMs Towards Diverse Candidate Solutions at Inference-Time
The paper introduces GuidedSampling, a novel inference algorithm designed to enhance the diversity of candidate solutions generated by la...