Generative AI

Image, video, audio, and text generation

Top This Week

Machine Learning

AI video generation seems fundamentally more expensive than text, not just less optimized

There’s been a lot of discussion recently about how expensive AI video generation is compared to text, and it feels like this is more tha...

Reddit - Artificial Intelligence · 1 min ·
Accelerating science with AI and simulations
Machine Learning

MIT Professor Rafael Gómez-Bombarelli discusses the transformative potential of AI in scientific research, emphasizing its role in materi...

AI News - General · 10 min ·
[2603.10202] Hybrid Hidden Markov Model for Modeling Equity Excess Growth Rate Dynamics: A Discrete-State Approach with Jump-Diffusion
Machine Learning

Abstract page for arXiv paper 2603.10202: Hybrid Hidden Markov Model for Modeling Equity Excess Growth Rate Dynamics: A Discrete-State Ap...

arXiv - Machine Learning · 4 min ·

All Content

[2510.13632] Closing the Gap Between Text and Speech Understanding in LLMs
LLMs

This paper addresses the performance gap between text and speech understanding in large language models (LLMs), proposing a new method, S...

arXiv - AI · 4 min ·
[2510.10987] DITTO: A Spoofing Attack Framework on Watermarked LLMs via Knowledge Distillation
LLMs

The paper introduces DITTO, a spoofing attack framework that exploits vulnerabilities in watermarked large language models (LLMs) via kno...

arXiv - AI · 4 min ·
[2507.11768] LLMs are Bayesian, In Expectation, Not in Realization
LLMs

This paper explores the Bayesian nature of large language models (LLMs) in expectation rather than realization, highlighting the impact o...

arXiv - Machine Learning · 4 min ·
[2510.04891] SocialHarmBench: Revealing LLM Vulnerabilities to Socially Harmful Requests
LLMs

The paper introduces SocialHarmBench, a dataset designed to evaluate the vulnerabilities of large language models (LLMs) to socially harm...

arXiv - Machine Learning · 4 min ·
[2506.01928] Esoteric Language Models: Bridging Autoregressive and Masked Diffusion LLMs
LLMs

The paper introduces Eso-LMs, a novel language model that integrates autoregressive and masked diffusion paradigms, enhancing inference e...

arXiv - Machine Learning · 4 min ·
[2509.24526] CMT: Mid-Training for Efficient Learning of Consistency, Mean Flow, and Flow Map Models
Machine Learning

The paper introduces Consistency Mid-Training (CMT), a novel method for enhancing the efficiency of training flow map models, achieving s...

arXiv - Machine Learning · 4 min ·
[2504.10507] PinRec: Unified Generative Retrieval for Pinterest Recommender Systems
Machine Learning

The paper introduces PinRec, a unified generative retrieval model for Pinterest's recommendation systems, enhancing performance across va...

arXiv - Machine Learning · 4 min ·
[2509.23040] Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents
LLMs

The paper presents ReMemR1, a novel approach for enhancing long-context reasoning in large language models by integrating revisitable mem...

arXiv - AI · 4 min ·
[2501.06336] MEt3R: Measuring Multi-View Consistency in Generated Images
Machine Learning

The paper presents MEt3R, a novel metric for assessing multi-view consistency in generated images, addressing limitations of traditional ...

arXiv - Machine Learning · 4 min ·
[2508.11915] CORE: Measuring Multi-Agent LLM Interaction Quality under Game-Theoretic Pressures
LLMs

The paper introduces CORE, a metric for evaluating language quality in multi-agent LLM interactions under game-theoretic conditions, reve...

arXiv - Machine Learning · 4 min ·
[2410.02099] A Watermark for Black-Box Language Models
LLMs

The paper presents a novel watermarking scheme for black-box language models, enabling detection of model outputs without requiring white...

arXiv - Machine Learning · 3 min ·
[2506.07751] AbstRaL: Augmenting LLMs' Reasoning by Reinforcing Abstract Thinking
LLMs

The paper presents AbstRaL, a method to enhance large language models' reasoning capabilities by reinforcing abstract thinking, particula...

arXiv - AI · 4 min ·
[2505.16789] Accidental Vulnerability: Factors in Fine-Tuning that Shift Model Safeguards
LLMs

The paper explores how fine-tuning large language models can unintentionally create vulnerabilities, analyzing factors like dataset chara...

arXiv - Machine Learning · 3 min ·
[2505.16670] BitHydra: Towards Bit-flip Inference Cost Attack against Large Language Models
LLMs

The paper presents BitHydra, a framework for executing bit-flip inference cost attacks on large language models (LLMs), demonstrating how...

arXiv - AI · 4 min ·
[2602.01428] Improving the Trade-off Between Watermark Strength and Speculative Sampling Efficiency for Language Models
LLMs

This paper explores the balance between watermark strength and speculative sampling efficiency in language models, proposing a new approa...

arXiv - Machine Learning · 4 min ·
[2602.01289] Gradient-Aligned Calibration for Post-Training Quantization of Diffusion Models
Machine Learning

The paper presents a novel method for post-training quantization (PTQ) of diffusion models, addressing inefficiencies in existing calibra...

arXiv - Machine Learning · 4 min ·
[2504.04717] Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models
LLMs

This article surveys advancements in multi-turn interactions with large language models (LLMs), focusing on evaluation methods, challenge...

arXiv - AI · 4 min ·
[2503.23377] JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization
Machine Learning

The paper presents JavisDiT, a novel Joint Audio-Video Diffusion Transformer that enhances synchronized audio-video generation through a ...

arXiv - AI · 4 min ·
[2601.03612] Mathematical Foundations of Polyphonic Music Generation via Structural Inductive Bias
NLP

This article presents a novel approach to polyphonic music generation using structural inductive bias, focusing on Beethoven's piano sona...

arXiv - Machine Learning · 3 min ·
[2501.17860] Dialogue is Better Than Monologue: Instructing Medical LLMs via Strategical Conversations
LLMs

This article presents a novel approach to training medical large language models (LLMs) through dialogue-based fine-tuning, improving the...

arXiv - AI · 3 min ·