Generative AI

Image, video, audio, and text generation

Top This Week

Generative AI

Will Generative AI apps remain a revenue powerhouse in 2026?

AI Tools & Products · 1 min · about 8 hours ago

Machine Learning

[2601.08565] Rewriting Video: Text-Driven Reauthoring of Video Footage

Abstract page for arXiv paper 2601.08565: Rewriting Video: Text-Driven Reauthoring of Video Footage

arXiv - AI · 3 min · about 9 hours ago

Machine Learning

[2512.18388] Exploration vs. Fixation: Scaffolding Divergent and Convergent Thinking for Human-AI Co-Creation with Generative Models

Abstract page for arXiv paper 2512.18388: Exploration vs. Fixation: Scaffolding Divergent and Convergent Thinking for Human-AI Co-Creatio...

arXiv - AI · 4 min · about 9 hours ago

All Content

Machine Learning

[2602.14464] CoCoDiff: Correspondence-Consistent Diffusion Model for Fine-grained Style Transfer

The paper presents CoCoDiff, a novel framework for fine-grained style transfer in images, emphasizing semantic correspondence and achievi...

arXiv - AI · 3 min · about 2 months ago

LLMs

[2602.14433] Synthetic Reader Panels: Tournament-Based Ideation with LLM Personas for Autonomous Publishing

The paper discusses a novel system for autonomous book ideation using synthetic reader panels composed of LLM personas to evaluate book c...

arXiv - AI · 4 min · about 2 months ago

Generative AI

[2602.14381] Adapting VACE for Real-Time Autoregressive Video Diffusion

This article presents an adaptation of VACE for real-time autoregressive video generation, enhancing video control while addressing laten...

arXiv - AI · 3 min · about 2 months ago

LLMs

[2602.14374] Differentially Private Retrieval-Augmented Generation

The paper presents DP-KSA, a novel algorithm that integrates differential privacy into retrieval-augmented generation (RAG) systems, addr...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2602.14270] A Rational Analysis of the Effects of Sycophantic AI

This article analyzes the impact of sycophantic AI on human belief systems, revealing how overly agreeable AI can distort reality and inf...

arXiv - AI · 3 min · about 2 months ago

Machine Learning

[2602.14237] AbracADDbra: Touch-Guided Object Addition by Decoupling Placement and Editing Subtasks

The paper presents AbracADDbra, a framework that enhances object addition in computer vision by decoupling placement and editing tasks th...

arXiv - AI · 3 min · about 2 months ago

AI Agents

[2602.14211] SkillJect: Automating Stealthy Skill-Based Prompt Injection for Coding Agents with Trace-Driven Closed-Loop Refinement

The paper presents SkillJect, an automated framework for stealthy skill-based prompt injection in coding agents, addressing security vuln...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2602.14189] Knowing When Not to Answer: Abstention-Aware Scientific Reasoning

The paper discusses an abstention-aware framework for scientific reasoning, emphasizing the importance of knowing when to abstain from an...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2602.14188] GPT-5 vs Other LLMs in Long Short-Context Performance

This paper evaluates the performance of GPT-5 and other LLMs on long short-context tasks, revealing significant gaps between theoretical ...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2602.14178] UniWeTok: An Unified Binary Tokenizer with Codebook Size $\mathit{2^{128}}$ for Unified Multimodal Large Language Model

The paper presents UniWeTok, a unified binary tokenizer with a massive codebook size of 2^128, designed to enhance multimodal large langu...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2602.14157] When Test-Time Guidance Is Enough: Fast Image and Video Editing with Diffusion Guidance

The paper explores a novel approach to image and video editing using test-time guidance with diffusion models, achieving performance comp...

arXiv - AI · 3 min · about 2 months ago

LLMs

[2602.14158] A Multi-Agent Framework for Medical AI: Leveraging Fine-Tuned GPT, LLaMA, and DeepSeek R1 for Evidence-Based and Bias-Aware Clinical Query Processing

This article presents a multi-agent framework for medical AI that enhances clinical query processing by leveraging fine-tuned language mo...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2602.13942] A Theoretical Framework for LLM Fine-tuning Using Early Stopping for Non-random Initialization

This article presents a theoretical framework for fine-tuning large language models (LLMs) using early stopping and non-random initializa...

arXiv - Machine Learning · 3 min · about 2 months ago

LLMs

[2602.14106] Anticipating Adversary Behavior in DevSecOps Scenarios through Large Language Models

This paper explores the integration of Large Language Models (LLMs) in anticipating adversary behavior within DevSecOps environments, pro...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2602.14080] Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality

The paper explores the limitations of factuality evaluations in large language models (LLMs), identifying recall as a key bottleneck in a...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2602.14043] Beyond Static Snapshots: Dynamic Modeling and Forecasting of Group-Level Value Evolution with Large Language Models

This article presents a novel framework for dynamic modeling and forecasting of group-level value evolution using large language models (...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2602.14041] BitDance: Scaling Autoregressive Generative Models with Binary Tokens

BitDance introduces a novel autoregressive image generator that utilizes binary tokens for enhanced efficiency and performance in generat...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2602.13818] VAR-3D: View-aware Auto-Regressive Model for Text-to-3D Generation via a 3D Tokenizer

The VAR-3D model introduces a novel approach to text-to-3D generation, addressing challenges in discrete 3D representation and enhancing ...

arXiv - Machine Learning · 3 min · about 2 months ago

LLMs

[2602.13954] Eureka-Audio: Triggering Audio Intelligence in Compact Language Models

Eureka-Audio presents a compact audio language model that outperforms larger models in various audio understanding tasks, showcasing effi...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2602.13543] LiveNewsBench: Evaluating LLM Web Search Capabilities with Freshly Curated News

The paper introduces LiveNewsBench, a benchmark for evaluating the web search capabilities of Large Language Models (LLMs) using freshly ...

arXiv - Machine Learning · 4 min · about 2 months ago

