Generative AI

Image, video, audio, and text generation

Top This Week

Machine Learning

AI video generation seems fundamentally more expensive than text, not just less optimized

There’s been a lot of discussion recently about how expensive AI video generation is compared to text, and it feels like this is more tha...

Reddit - Artificial Intelligence · 1 min

Accelerating science with AI and simulations
Machine Learning

MIT Professor Rafael Gómez-Bombarelli discusses the transformative potential of AI in scientific research, emphasizing its role in materi...

AI News - General · 10 min

[2603.10202] Hybrid Hidden Markov Model for Modeling Equity Excess Growth Rate Dynamics: A Discrete-State Approach with Jump-Diffusion
Machine Learning

Abstract page for arXiv paper 2603.10202: Hybrid Hidden Markov Model for Modeling Equity Excess Growth Rate Dynamics: A Discrete-State Ap...

arXiv - Machine Learning · 4 min

All Content

[2602.19450] Red-Teaming Claude Opus and ChatGPT-based Security Advisors for Trusted Execution Environments
LLMs

This article presents a red-teaming study of Claude Opus and ChatGPT as security advisors for Trusted Execution Environments (TEEs), high...

arXiv - AI · 4 min

[2602.19441] When AI Teammates Meet Code Review: Collaboration Signals Shaping the Integration of Agent-Authored Pull Requests
Robotics

This paper investigates how AI-generated pull requests integrate into human-led code review processes, emphasizing the importance of coll...

arXiv - AI · 3 min

[2602.19140] CaReFlow: Cyclic Adaptive Rectified Flow for Multimodal Fusion
Machine Learning

The paper presents CaReFlow, a novel approach for multimodal fusion that addresses modality gaps using cyclic adaptive rectified flow, en...

arXiv - Machine Learning · 4 min

[2602.19089] Ani3DHuman: Photorealistic 3D Human Animation with Self-guided Stochastic Sampling
Generative AI

Ani3DHuman presents a novel framework for photorealistic 3D human animation, combining kinematics-based methods with video diffusion prio...

arXiv - Machine Learning · 4 min

[2602.19049] IAPO: Information-Aware Policy Optimization for Token-Efficient Reasoning
LLMs

The paper presents IAPO, a novel framework for token-efficient reasoning in large language models, enhancing accuracy while reducing infe...

arXiv - Machine Learning · 3 min

[2602.19348] MultiDiffSense: Diffusion-Based Multi-Modal Visuo-Tactile Image Generation Conditioned on Object Shape and Contact Pose
Machine Learning

The paper presents MultiDiffSense, a diffusion-based model for generating visuo-tactile images conditioned on object shape and contact po...

arXiv - AI · 3 min

[2602.19190] FUSAR-GPT: A Spatiotemporal Feature-Embedded and Two-Stage Decoupled Visual Language Model for SAR Imagery
LLMs

FUSAR-GPT is a novel visual language model designed for interpreting SAR imagery, enhancing performance through spatiotemporal feature em...

arXiv - AI · 4 min

[2602.19177] Next Reply Prediction X Dataset: Linguistic Discrepancies in Naively Generated Content
LLMs

The paper introduces the Next Reply Prediction X Dataset, addressing linguistic discrepancies in content generated by Large Language Mode...

arXiv - AI · 3 min

[2602.18715] A Data-Driven Method to Map the Functional Organisation of Human Brain White Matter
Machine Learning

This article presents a data-driven method to map the functional organization of human brain white matter, integrating diffusion and func...

arXiv - Machine Learning · 4 min

[2602.19166] CosyAccent: Duration-Controllable Accent Normalization Using Source-Synthesis Training Data
Machine Learning

The paper presents CosyAccent, a novel approach to accent normalization that utilizes source-synthesis training data, enhancing naturalne...

arXiv - AI · 3 min

[2602.19153] Constrained Diffusion for Accelerated Structure Relaxation of Inorganic Solids with Point Defects
Generative AI

This article presents a novel generative framework for simulating point defects in inorganic solids, enhancing structure relaxation proce...

arXiv - Machine Learning · 3 min

[2602.19115] How Do LLMs Encode Scientific Quality? An Empirical Study Using Monosemantic Features from Sparse Autoencoders
LLMs

This paper investigates how large language models (LLMs) encode scientific quality using monosemantic features from sparse autoencoders, ...

arXiv - AI · 4 min

[2602.19101] Value Entanglement: Conflation Between Different Kinds of Good In (Some) Large Language Models
LLMs

This paper investigates value entanglement in Large Language Models (LLMs), revealing how moral values influence grammatical and economic...

arXiv - AI · 3 min

[2602.18920] DeepInnovator: Triggering the Innovative Capabilities of LLMs
LLMs

DeepInnovator proposes a novel training framework to enhance the innovative capabilities of Large Language Models (LLMs) for scientific r...

arXiv - AI · 4 min

[2602.20126] Adaptation to Intrinsic Dependence in Diffusion Language Models
LLMs

This article presents a novel unmasking schedule for diffusion language models (DLMs) that adapts to the intrinsic dependence of data dis...

arXiv - Machine Learning · 4 min

[2602.20070] Training-Free Generative Modeling via Kernelized Stochastic Interpolants
Machine Learning

This paper presents a novel kernel method for generative modeling that eliminates the need for training neural networks, utilizing linear...

arXiv - Machine Learning · 3 min

[2602.18891] Orchestrating LLM Agents for Scientific Research: A Pilot Study of Multiple Choice Question (MCQ) Generation and Evaluation
LLMs

This pilot study explores the orchestration of LLM agents in scientific research, focusing on the generation and evaluation of multiple-c...

arXiv - AI · 4 min

[2602.18882] SceneTok: A Compressed, Diffusable Token Space for 3D Scenes
Computer Vision

SceneTok introduces a novel tokenizer that compresses 3D scene representations into a set of diffusable tokens, achieving superior compre...

arXiv - Machine Learning · 3 min

[2602.18880] FOCA: Frequency-Oriented Cross-Domain Forgery Detection, Localization and Explanation via Multi-Modal Large Language Model
LLMs

The paper presents FOCA, a novel framework for detecting and localizing image forgery using a multi-modal large language model that integ...

arXiv - AI · 3 min

[2602.18874] Structure-Level Disentangled Diffusion for Few-Shot Chinese Font Generation
Generative AI

This article presents the Structure-Level Disentangled Diffusion Model (SLD-Font) for few-shot Chinese font generation, enhancing style f...

arXiv - AI · 4 min