Generative AI

Image, video, audio, and text generation

Top This Week

Machine Learning

AI video generation seems fundamentally more expensive than text, not just less optimized

There’s been a lot of discussion recently about how expensive AI video generation is compared to text, and it feels like this is more tha...

Reddit - Artificial Intelligence · 1 min · about 7 hours ago

Machine Learning

Accelerating science with AI and simulations

MIT Professor Rafael Gómez-Bombarelli discusses the transformative potential of AI in scientific research, emphasizing its role in materi...

AI News - General · 10 min · about 19 hours ago

Machine Learning

[2603.10202] Hybrid Hidden Markov Model for Modeling Equity Excess Growth Rate Dynamics: A Discrete-State Approach with Jump-Diffusion

Abstract page for arXiv paper 2603.10202: Hybrid Hidden Markov Model for Modeling Equity Excess Growth Rate Dynamics: A Discrete-State Ap...

arXiv - Machine Learning · 4 min · about 19 hours ago

All Content

LLMs

[2510.13632] Closing the Gap Between Text and Speech Understanding in LLMs

This paper addresses the performance gap between text and speech understanding in large language models (LLMs), proposing a new method, S...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2510.10987] DITTO: A Spoofing Attack Framework on Watermarked LLMs via Knowledge Distillation

The paper introduces DITTO, a spoofing attack framework that exploits vulnerabilities in watermarked large language models (LLMs) via kno...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2507.11768] LLMs are Bayesian, In Expectation, Not in Realization

This paper explores the Bayesian nature of large language models (LLMs) in expectation rather than realization, highlighting the impact o...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2510.04891] SocialHarmBench: Revealing LLM Vulnerabilities to Socially Harmful Requests

The paper introduces SocialHarmBench, a dataset designed to evaluate the vulnerabilities of large language models (LLMs) to socially harm...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2506.01928] Esoteric Language Models: Bridging Autoregressive and Masked Diffusion LLMs

The paper introduces Eso-LMs, a novel language model that integrates autoregressive and masked diffusion paradigms, enhancing inference e...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2509.24526] CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow Map Models

The paper introduces Consistency Mid-Training (CMT), a novel method for enhancing the efficiency of training flow map models, achieving s...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2504.10507] PinRec: Unified Generative Retrieval for Pinterest Recommender Systems

The paper introduces PinRec, a unified generative retrieval model for Pinterest's recommendation systems, enhancing performance across va...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2509.23040] Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents

The paper presents ReMemR1, a novel approach for enhancing long-context reasoning in large language models by integrating revisitable mem...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2501.06336] MEt3R: Measuring Multi-View Consistency in Generated Images

The paper presents MEt3R, a novel metric for assessing multi-view consistency in generated images, addressing limitations of traditional ...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2508.11915] CORE: Measuring Multi-Agent LLM Interaction Quality under Game-Theoretic Pressures

The paper introduces CORE, a metric for evaluating language quality in multi-agent LLM interactions under game-theoretic conditions, reve...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2410.02099] A Watermark for Black-Box Language Models

The paper presents a novel watermarking scheme for black-box language models, enabling detection of model outputs without requiring white...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2506.07751] AbstRaL: Augmenting LLMs' Reasoning by Reinforcing Abstract Thinking

The paper presents AbstRaL, a method to enhance large language models' reasoning capabilities by reinforcing abstract thinking, particula...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2505.16789] Accidental Vulnerability: Factors in Fine-Tuning that Shift Model Safeguards

The paper explores how fine-tuning large language models can unintentionally create vulnerabilities, analyzing factors like dataset chara...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2505.16670] BitHydra: Towards Bit-flip Inference Cost Attack against Large Language Models

The paper presents BitHydra, a framework for executing bit-flip inference cost attacks on large language models (LLMs), demonstrating how...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.01428] Improving the Trade-off Between Watermark Strength and Speculative Sampling Efficiency for Language Models

This paper explores the balance between watermark strength and speculative sampling efficiency in language models, proposing a new approa...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.01289] Gradient-Aligned Calibration for Post-Training Quantization of Diffusion Models

The paper presents a novel method for post-training quantization (PTQ) of diffusion models, addressing inefficiencies in existing calibra...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2504.04717] Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models

This article surveys advancements in multi-turn interactions with large language models (LLMs), focusing on evaluation methods, challenge...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2503.23377] JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization

The paper presents JavisDiT, a novel Joint Audio-Video Diffusion Transformer that enhances synchronized audio-video generation through a ...

arXiv - AI · 4 min · about 1 month ago

NLP

[2601.03612] Mathematical Foundations of Polyphonic Music Generation via Structural Inductive Bias

This article presents a novel approach to polyphonic music generation using structural inductive bias, focusing on Beethoven's piano sona...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2501.17860] Dialogue is Better Than Monologue: Instructing Medical LLMs via Strategical Conversations

This article presents a novel approach to training medical large language models (LLMs) through dialogue-based fine-tuning, improving the...

arXiv - AI · 3 min · about 1 month ago
