Generative AI

Image, video, audio, and text generation

Top This Week

Machine Learning

AI video generation seems fundamentally more expensive than text, not just less optimized

There’s been a lot of discussion recently about how expensive AI video generation is compared to text, and it feels like this is more tha...

Reddit - Artificial Intelligence · 1 min · about 13 hours ago

Machine Learning

Accelerating science with AI and simulations

MIT Professor Rafael Gómez-Bombarelli discusses the transformative potential of AI in scientific research, emphasizing its role in materi...

AI News - General · 10 min · 1 day ago

Machine Learning

[2603.10202] Hybrid Hidden Markov Model for Modeling Equity Excess Growth Rate Dynamics: A Discrete-State Approach with Jump-Diffusion

Abstract page for arXiv paper 2603.10202: Hybrid Hidden Markov Model for Modeling Equity Excess Growth Rate Dynamics: A Discrete-State Ap...

arXiv - Machine Learning · 4 min · 1 day ago

All Content

LLMs

[2602.19450] Red-Teaming Claude Opus and ChatGPT-based Security Advisors for Trusted Execution Environments

This article presents a red-teaming study of Claude Opus and ChatGPT as security advisors for Trusted Execution Environments (TEEs), high...

arXiv - AI · 4 min · about 1 month ago

Robotics

[2602.19441] When AI Teammates Meet Code Review: Collaboration Signals Shaping the Integration of Agent-Authored Pull Requests

This paper investigates how AI-generated pull requests integrate into human-led code review processes, emphasizing the importance of coll...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.19140] CaReFlow: Cyclic Adaptive Rectified Flow for Multimodal Fusion

The paper presents CaReFlow, a novel approach for multimodal fusion that addresses modality gaps using cyclic adaptive rectified flow, en...

arXiv - Machine Learning · 4 min · about 1 month ago

Generative AI

[2602.19089] Ani3DHuman: Photorealistic 3D Human Animation with Self-guided Stochastic Sampling

Ani3DHuman presents a novel framework for photorealistic 3D human animation, combining kinematics-based methods with video diffusion prio...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.19049] IAPO: Information-Aware Policy Optimization for Token-Efficient Reasoning

The paper presents IAPO, a novel framework for token-efficient reasoning in large language models, enhancing accuracy while reducing infe...

arXiv - Machine Learning · 3 min · about 1 month ago

Machine Learning

[2602.19348] MultiDiffSense: Diffusion-Based Multi-Modal Visuo-Tactile Image Generation Conditioned on Object Shape and Contact Pose

The paper presents MultiDiffSense, a diffusion-based model for generating visuo-tactile images conditioned on object shape and contact po...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.19190] FUSAR-GPT : A Spatiotemporal Feature-Embedded and Two-Stage Decoupled Visual Language Model for SAR Imagery

FUSAR-GPT is a novel visual language model designed for interpreting SAR imagery, enhancing performance through spatiotemporal feature em...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.19177] Next Reply Prediction X Dataset: Linguistic Discrepancies in Naively Generated Content

The paper introduces the Next Reply Prediction X Dataset, addressing linguistic discrepancies in content generated by Large Language Mode...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.18715] A Data-Driven Method to Map the Functional Organisation of Human Brain White Matter

This article presents a data-driven method to map the functional organization of human brain white matter, integrating diffusion and func...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.19166] CosyAccent: Duration-Controllable Accent Normalization Using Source-Synthesis Training Data

The paper presents CosyAccent, a novel approach to accent normalization that utilizes source-synthesis training data, enhancing naturalne...

arXiv - AI · 3 min · about 1 month ago

Generative AI

[2602.19153] Constrained Diffusion for Accelerated Structure Relaxation of Inorganic Solids with Point Defects

This article presents a novel generative framework for simulating point defects in inorganic solids, enhancing structure relaxation proce...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2602.19115] How Do LLMs Encode Scientific Quality? An Empirical Study Using Monosemantic Features from Sparse Autoencoders

This paper investigates how large language models (LLMs) encode scientific quality using monosemantic features from sparse autoencoders, ...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.19101] Value Entanglement: Conflation Between Different Kinds of Good In (Some) Large Language Models

This paper investigates value entanglement in Large Language Models (LLMs), revealing how moral values influence grammatical and economic...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.18920] DeepInnovator: Triggering the Innovative Capabilities of LLMs

DeepInnovator proposes a novel training framework to enhance the innovative capabilities of Large Language Models (LLMs) for scientific r...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.20126] Adaptation to Intrinsic Dependence in Diffusion Language Models

This article presents a novel unmasking schedule for diffusion language models (DLMs) that adapts to the intrinsic dependence of data dis...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.20070] Training-Free Generative Modeling via Kernelized Stochastic Interpolants

This paper presents a novel kernel method for generative modeling that eliminates the need for training neural networks, utilizing linear...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2602.18891] Orchestrating LLM Agents for Scientific Research: A Pilot Study of Multiple Choice Question (MCQ) Generation and Evaluation

This pilot study explores the orchestration of LLM agents in scientific research, focusing on the generation and evaluation of multiple-c...

arXiv - AI · 4 min · about 1 month ago

Computer Vision

[2602.18882] SceneTok: A Compressed, Diffusable Token Space for 3D Scenes

SceneTok introduces a novel tokenizer that compresses 3D scene representations into a set of diffusable tokens, achieving superior compre...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2602.18880] FOCA: Frequency-Oriented Cross-Domain Forgery Detection, Localization and Explanation via Multi-Modal Large Language Model

The paper presents FOCA, a novel framework for detecting and localizing image forgery using a multi-modal large language model that integ...

arXiv - AI · 3 min · about 1 month ago

Generative AI

[2602.18874] Structure-Level Disentangled Diffusion for Few-Shot Chinese Font Generation

This article presents the Structure-Level Disentangled Diffusion Model (SLD-Font) for few-shot Chinese font generation, enhancing style f...

arXiv - AI · 4 min · about 1 month ago
