Generative AI

Image, video, audio, and text generation

Top This Week

Machine Learning

AI video generation seems fundamentally more expensive than text, not just less optimized

There’s been a lot of discussion recently about how expensive AI video generation is compared to text, and it feels like this is more tha...

Reddit - Artificial Intelligence · 1 min ·
Accelerating science with AI and simulations
Machine Learning

MIT Professor Rafael Gómez-Bombarelli discusses the transformative potential of AI in scientific research, emphasizing its role in materi...

AI News - General · 10 min ·
[2603.10202] Hybrid Hidden Markov Model for Modeling Equity Excess Growth Rate Dynamics: A Discrete-State Approach with Jump-Diffusion
Machine Learning

Abstract page for arXiv paper 2603.10202: Hybrid Hidden Markov Model for Modeling Equity Excess Growth Rate Dynamics: A Discrete-State Ap...

arXiv - Machine Learning · 4 min ·

All Content

[2602.19980] Discrete Diffusion Models Exploit Asymmetry to Solve Lookahead Planning Tasks
LLMs

This paper explores how Discrete Diffusion Models (NAR) outperform Autoregressive models (AR) in lookahead planning tasks by leveraging a...

arXiv - Machine Learning · 4 min ·
[2602.18873] BiMotion: B-spline Motion for Text-guided Dynamic 3D Character Generation
Generative AI

BiMotion introduces a novel approach to dynamic 3D character generation using B-spline curves, enhancing motion quality and alignment wit...

arXiv - AI · 3 min ·
[2602.18846] DUET-VLM: Dual stage Unified Efficient Token reduction for VLM Training and Inference
LLMs

DUET-VLM introduces a dual-stage token reduction framework for vision-language models, enhancing efficiency without sacrificing accuracy ...

arXiv - AI · 4 min ·
[2602.19931] Expanding the Role of Diffusion Models for Robust Classifier Training
Machine Learning

This article explores the use of diffusion models to enhance adversarial training for robust image classifiers, demonstrating improved pe...

arXiv - Machine Learning · 3 min ·
[2602.19895] DSDR: Dual-Scale Diversity Regularization for Exploration in LLM Reasoning
LLMs

The paper presents DSDR, a novel reinforcement learning framework aimed at enhancing exploration in large language model (LLM) reasoning ...

arXiv - Machine Learning · 4 min ·
[2602.18782] MANATEE: Inference-Time Lightweight Diffusion Based Safety Defense for LLMs
LLMs

The paper presents MANATEE, a novel defense mechanism for large language models (LLMs) against adversarial attacks, utilizing a lightweig...

arXiv - Machine Learning · 3 min ·
[2602.18745] Synthesizing Multimodal Geometry Datasets from Scratch and Enabling Visual Alignment via Plotting Code
LLMs

The paper presents a novel pipeline for synthesizing multimodal geometry datasets, introducing the GeoCode dataset which enhances visual-...

arXiv - AI · 3 min ·
[2602.18734] Rethinking Retrieval-Augmented Generation as a Cooperative Decision-Making Problem
NLP

This paper proposes a novel framework called Cooperative Retrieval-Augmented Generation (CoRAG), which reformulates retrieval-augmented g...

arXiv - AI · 3 min ·
[2602.19685] PerturbDiff: Functional Diffusion for Single-Cell Perturbation Modeling
Machine Learning

PerturbDiff introduces a novel approach to modeling single-cell responses to perturbations by utilizing a diffusion-based generative proc...

arXiv - AI · 4 min ·
[2602.18705] EDU-MATRIX: A Society-Centric Generative Cognitive Digital Twin Architecture for Secondary Education
AI Agents

The EDU-MATRIX paper presents a novel generative cognitive digital twin architecture aimed at enhancing secondary education through a soc...

arXiv - AI · 3 min ·
[2602.18699] Semantic Substrate Theory: An Operator-Theoretic Framework for Geometric Semantic Drift
Generative AI

This paper introduces Semantic Substrate Theory, an operator-theoretic framework that formalizes various signals of semantic drift, integ...

arXiv - AI · 3 min ·
[2602.18630] Lost in Instructions: Study of Blind Users' Experiences with DIY Manuals and AI-Rewritten Instructions for Assembly, Operation, and Troubleshooting of Tangible Products
LLMs

This study investigates the experiences of blind users with DIY manuals and AI-generated instructions for assembling and troubleshooting ...

arXiv - AI · 4 min ·
[2602.19619] Is Your Diffusion Sampler Actually Correct? A Sampler-Centric Evaluation of Discrete Diffusion Language Models
LLMs

This article evaluates the accuracy of discrete diffusion language models (dLLMs) through a sampler-centric framework, revealing signific...

arXiv - Machine Learning · 3 min ·
[2602.18623] Finding the Signal in the Noise: An Exploratory Study on Assessing the Effectiveness of AI and Accessibility Forums for Blind Users' Support Needs
Generative AI

This study evaluates the effectiveness of AI and accessibility forums for blind users, highlighting user experiences and identifying supp...

arXiv - AI · 4 min ·
[2602.18589] DM4CT: Benchmarking Diffusion Models for Computed Tomography Reconstruction
Machine Learning

The paper presents DM4CT, a benchmark for evaluating diffusion models in computed tomography (CT) reconstruction, addressing practical ch...

arXiv - AI · 4 min ·
[2602.18548] 1D-Bench: A Benchmark for Iterative UI Code Generation with Visual Feedback in Real-World
Data Science

The paper introduces 1D-Bench, a benchmark for evaluating iterative UI code generation with visual feedback, aimed at improving design-to...

arXiv - AI · 4 min ·
[2602.19512] Variational Trajectory Optimization of Anisotropic Diffusion Schedules
Machine Learning

This paper presents a variational framework for optimizing anisotropic diffusion schedules in machine learning, enhancing performance acr...

arXiv - Machine Learning · 3 min ·
[2602.18514] Trojan Horses in Recruiting: A Red-Teaming Case Study on Indirect Prompt Injection in Standard vs. Reasoning Models
LLMs

This article presents a case study on the security implications of Indirect Prompt Injection (IPI) in Large Language Models (LLMs) used i...

arXiv - AI · 4 min ·
[2602.18483] Red Teaming LLMs as Socio-Technical Practice: From Exploration and Data Creation to Evaluation
LLMs

The article examines red teaming as a socio-technical practice in evaluating large language models (LLMs), highlighting the importance of...

arXiv - AI · 4 min ·
[2602.18478] ZUNA: Flexible EEG Superresolution with Position-Aware Diffusion Autoencoders
Machine Learning

The paper presents ZUNA, a 380M-parameter masked diffusion autoencoder designed for EEG signal superresolution and channel infilling, dem...

arXiv - Machine Learning · 3 min ·
