Generative AI

Image, video, audio, and text generation

Top This Week

[2512.23994] PhyAVBench: A Challenging Audio Physics-Sensitivity Benchmark for Physically Grounded Text-to-Audio-Video Generation
Machine Learning

arXiv - AI · 4 min ·
[2512.10785] Developing and Evaluating a Large Language Model-Based Automated Feedback System Grounded in Evidence-Centered Design for Supporting Physics Problem Solving
LLMs

arXiv - AI · 4 min ·
[2510.13870] Unlocking the Potential of Diffusion Language Models through Template Infilling
LLMs

arXiv - AI · 3 min ·

All Content

[2602.12996] Know More, Know Clearer: A Meta-Cognitive Framework for Knowledge Augmentation in Large Language Models
LLMs

This article presents a novel meta-cognitive framework aimed at enhancing knowledge augmentation in Large Language Models (LLMs), address...

arXiv - AI · 3 min ·
[2602.12924] Never say never: Exploring the effects of available knowledge on agent persuasiveness in controlled physiotherapy motivation dialogues
Robotics

This article examines how the availability of knowledge influences the persuasiveness of generative social agents (GSAs) in physiotherapy...

arXiv - AI · 4 min ·
[2602.12873] Knowledge-Based Design Requirements for Generative Social Robots in Higher Education
LLMs

The article explores design requirements for generative social robots in higher education, emphasizing the need for knowledge-based frame...

arXiv - AI · 3 min ·
[2602.12846] Amortized Reasoning Tree Search: Decoupling Proposal and Decision in Large Language Models
LLMs

The paper presents Amortized Reasoning Tree Search (ARTS), a novel approach to enhance reasoning in Large Language Models by decoupling p...

arXiv - Machine Learning · 4 min ·
[2602.12829] FLAC: Maximum Entropy RL via Kinetic Energy Regularized Bridge Matching
Machine Learning

The paper presents FLAC, a novel framework for Maximum Entropy Reinforcement Learning that utilizes kinetic energy regularization to opti...

arXiv - Machine Learning · 4 min ·
[2602.12763] "Not Human, Funnier": How Machine Identity Shapes Humor Perception in Online AI Stand-up Comedy
LLMs

This article explores how AI's machine identity influences humor perception in online stand-up comedy, revealing that AI can be perceived...

arXiv - AI · 3 min ·
[2602.12705] MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs
LLMs

MedXIAOHE is a medical vision-language foundation model that enhances medical understanding and reasoning in clinical applications, achie...

arXiv - AI · 3 min ·
[2602.12675] SLA2: Sparse-Linear Attention with Learnable Routing and QAT
Machine Learning

The paper presents SLA2, an advanced Sparse-Linear Attention model that enhances video generation efficiency by introducing a learnable r...

arXiv - Machine Learning · 3 min ·
[2602.12642] Beyond Normalization: Rethinking the Partition Function as a Difficulty Scheduler for RLVR
LLMs

This article presents a novel approach to reinforcement learning by reinterpreting the partition function as a difficulty scheduler, enha...

arXiv - AI · 4 min ·
[2602.12574] Monte Carlo Tree Search with Reasoning Path Refinement for Small Language Models in Conversational Text-to-NoSQL
LLMs

This paper presents a novel framework, Stage-MCTS, which enhances small language models' ability to generate NoSQL queries through conver...

arXiv - AI · 4 min ·
[2602.12470] Designing RNAs with Language Models
LLMs

The paper presents a novel approach to RNA design using language models, reframing the task as conditional sequence generation, which sig...

arXiv - Machine Learning · 3 min ·
[2602.12424] RankLLM: Weighted Ranking of LLMs by Quantifying Question Difficulty
LLMs

The paper introduces RankLLM, a framework for evaluating large language models (LLMs) by quantifying question difficulty, enhancing model...

arXiv - AI · 4 min ·
[2602.12393] Reproducing DragDiffusion: Interactive Point-Based Editing with Diffusion Models
Machine Learning

This article presents a reproducibility study of DragDiffusion, a method for interactive point-based image editing using diffusion models...

arXiv - Machine Learning · 4 min ·
[2602.12311] Perceptual Self-Reflection in Agentic Physics Simulation Code Generation
NLP

This article presents a multi-agent framework for generating physics simulation code from natural language descriptions, introducing a no...

arXiv - AI · 4 min ·
[2602.12304] OmniCustom: Sync Audio-Video Customization Via Joint Audio-Video Generation Model
Machine Learning

The paper introduces OmniCustom, a novel framework for synchronizing audio-video customization, enhancing identity and timbre fidelity th...

arXiv - AI · 4 min ·
[2602.13093] Consistency of Large Reasoning Models Under Multi-Turn Attacks
Machine Learning

This article evaluates the robustness of large reasoning models against multi-turn adversarial attacks, revealing vulnerabilities and pro...

arXiv - AI · 3 min ·
[2602.12586] Can I Have Your Order? Monte-Carlo Tree Search for Slot Filling Ordering in Diffusion Language Models
LLMs

This paper introduces McDiffuSE, a Monte Carlo Tree Search framework aimed at optimizing slot filling orders in Masked Diffusion Models, ...

arXiv - AI · 3 min ·
[2602.12566] To Mix or To Merge: Toward Multi-Domain Reinforcement Learning for Large Language Models
LLMs

This paper explores the effectiveness of multi-domain reinforcement learning for large language models, comparing mixed multi-task traini...

arXiv - AI · 4 min ·
LLMs

Customizable AI Companions

The article discusses the potential of customizable AI companions that can engage in real-time video calls, leveraging technologies like ...

Reddit - Artificial Intelligence · 1 min ·
LLMs

Qwen3.5 vs DeepSeek — which matters more?

The discussion compares Qwen3.5 and DeepSeek, two AI models released around the same time, highlighting user excitement and potential app...

Reddit - Artificial Intelligence · 1 min ·