[2510.26818] GACA-DiT: Diffusion-based Dance-to-Music Generation with Genre-Adaptive Rhythm and Context-Aware Alignment
Computer Science > Sound
arXiv:2510.26818 (cs)
[Submitted on 28 Oct 2025 (v1), last revised 2 Mar 2026 (this version, v2)]

Title: GACA-DiT: Diffusion-based Dance-to-Music Generation with Genre-Adaptive Rhythm and Context-Aware Alignment
Authors: Jinting Wang, Chenxing Li, Li Liu

Abstract: Dance-to-music (D2M) generation aims to automatically compose music that is rhythmically and temporally aligned with dance movements. Existing methods typically rely on coarse rhythm embeddings, such as global motion features or binarized joint-based rhythm values, which discard fine-grained motion cues and result in weak rhythmic alignment. Moreover, temporal mismatches introduced by feature downsampling further hinder precise synchronization between dance and music. To address these problems, we propose \textbf{GACA-DiT}, a diffusion transformer-based framework with two novel modules for rhythmically consistent and temporally aligned music generation. First, a \textbf{genre-adaptive rhythm extraction} module combines multi-scale temporal wavelet analysis and spatial phase histograms with adaptive joint weighting to capture fine-grained, genre-specific rhythm patterns. Second, a \textbf{context-aware temporal alignment} module resolves temporal mismatches using learnable context queries to ...
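To make the genre-adaptive rhythm extraction concrete, below is a minimal sketch of multi-scale temporal wavelet analysis over joint trajectories, with an energy-based softmax as a plausible stand-in for the paper's adaptive joint weighting. This is not the authors' implementation: the function names, the `db4` wavelet, the number of decomposition levels, and the weighting scheme are illustrative assumptions.

```python
# Sketch only: multi-scale wavelet rhythm features per joint, then an
# energy-based joint weighting (an assumption, not the paper's learned one).
import numpy as np
import pywt

def multiscale_rhythm_features(motion: np.ndarray, wavelet: str = "db4",
                               num_levels: int = 3) -> np.ndarray:
    """motion: (T, J, 3) joint trajectories -> (J, num_levels) scale energies."""
    speed = np.linalg.norm(np.diff(motion, axis=0), axis=-1)  # (T-1, J) per-joint speed
    feats = []
    for j in range(speed.shape[1]):
        # Multi-level discrete wavelet decomposition of each joint's speed curve.
        coeffs = pywt.wavedec(speed[:, j], wavelet, level=num_levels)
        # Energy of the detail coefficients at each temporal scale.
        feats.append([float(np.sum(c ** 2)) for c in coeffs[1:]])
    return np.asarray(feats)

def adaptive_joint_weighting(feats: np.ndarray) -> np.ndarray:
    """Softmax-weight joints by rhythmic energy and pool to one rhythm vector."""
    energy = feats.sum(axis=1)                 # (J,) total energy per joint
    w = np.exp(energy - energy.max())
    w /= w.sum()                               # softmax over joints
    return (w[:, None] * feats).sum(axis=0)    # (num_levels,) pooled rhythm vector
```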
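Similarly, one plausible reading of the context-aware temporal alignment module is cross-attention in which learnable context queries, one per music-latent time step, attend over the (possibly downsampled) dance features, resampling them onto the music time grid with full temporal context. The class name, the dimensions, and the use of `nn.MultiheadAttention` are assumptions for illustration, not the paper's exact design.

```python
# Sketch only: learnable context queries + cross-attention for temporal alignment.
import torch
import torch.nn as nn

class ContextAwareAligner(nn.Module):
    def __init__(self, dim: int = 256, num_queries: int = 128, num_heads: int = 4):
        super().__init__()
        # One learnable query per target music-latent time step (hypothetical).
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, dance_feats: torch.Tensor) -> torch.Tensor:
        # dance_feats: (B, T_dance, dim), temporally misaligned with the music grid.
        b = dance_feats.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)   # (B, T_music, dim)
        # Each query attends over the whole dance sequence, so the output is
        # aligned to the music time axis while keeping surrounding context.
        aligned, _ = self.attn(q, dance_feats, dance_feats)
        return aligned                                    # (B, T_music, dim)
```

Usage under these assumptions: `ContextAwareAligner()(torch.randn(2, 300, 256))` returns a `(2, 128, 256)` tensor, i.e. dance features resampled to 128 music-latent steps.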