[2602.18874] Structure-Level Disentangled Diffusion for Few-Shot Chinese Font Generation
Summary
This article presents the Structure-Level Disentangled Diffusion Model (SLD-Font) for few-shot Chinese font generation. The model separates content and style at the structure level rather than only at the feature level, with the goal of improving both style fidelity and content accuracy.
Why It Matters
The research addresses a significant challenge in generative AI, particularly in the context of few-shot learning for font generation. By improving the disentanglement of content and style, this model could lead to more accurate and visually appealing font synthesis, which is crucial for applications in design and typography, especially for languages with complex characters like Chinese.
Key Takeaways
- SLD-Font improves few-shot Chinese font generation by supplying content and style information through two separate channels.
- Style features, extracted by a CLIP model from target-style images, are integrated via cross-attention.
- A Background Noise Removal module, trained in pixel space, removes background noise in complex stroke regions.
- The proposed fine-tuning strategy allows for better adaptation to new styles without overfitting.
- Experimental results demonstrate superior style fidelity compared to existing methods.
Computer Science > Computer Vision and Pattern Recognition
arXiv:2602.18874 (cs)
[Submitted on 21 Feb 2026]
Title: Structure-Level Disentangled Diffusion for Few-Shot Chinese Font Generation
Authors: Jie Li, Suorong Yang, Jian Zhao, Furao Shen
Abstract: Few-shot Chinese font generation aims to synthesize new characters in a target style using only a handful of reference images. Achieving accurate content rendering and faithful style transfer requires effective disentanglement between content and style. However, existing approaches achieve only feature-level disentanglement, allowing the generator to re-entangle these features, leading to content distortion and degraded style fidelity. We propose the Structure-Level Disentangled Diffusion Model (SLD-Font), which receives content and style information from two separate channels. SimSun-style images are used as content templates and concatenated with noisy latent features as the input. Style features extracted by a CLIP model from target-style images are integrated via cross-attention. Additionally, we train a Background Noise Removal module in the pixel space to remove background noise in complex stroke regions. Based on theoretical validation of disentanglement effectiveness, we introduce a parameter-efficient fine-tuning strategy that updates only the style-rela...
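The two-channel conditioning described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the tensor shapes, the projection matrices `w_q`, `w_k`, `w_v`, and the helper names `concat_content` and `cross_attention` are all hypothetical, and random vectors stand in for both the encoded SimSun content template and the CLIP style features. The sketch only shows the general pattern, in which content enters by channel concatenation with the noisy latent while style enters through cross-attention.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def concat_content(noisy_latent, content_latent):
    # Content channel: stack the content template's latent onto the noisy
    # latent along the channel axis, (C, H, W) + (C, H, W) -> (2C, H, W).
    return np.concatenate([noisy_latent, content_latent], axis=0)


def cross_attention(queries, style_tokens, w_q, w_k, w_v):
    # Style channel: spatial positions (queries) attend over style tokens.
    q = queries @ w_q                        # (N, d_attn)
    k = style_tokens @ w_k                   # (M, d_attn)
    v = style_tokens @ w_v                   # (M, d_attn)
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (N, M)
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
C, H, W = 4, 8, 8                            # assumed latent shape
noisy = rng.standard_normal((C, H, W))       # stand-in noisy latent
content = rng.standard_normal((C, H, W))     # stand-in content latent
x = concat_content(noisy, content)

d_model, d_style, d_attn = x.shape[0], 16, 8  # assumed feature widths
queries = x.reshape(d_model, H * W).T         # one query per spatial position
style_tokens = rng.standard_normal((3, d_style))  # stand-in for 3 style features
w_q = rng.standard_normal((d_model, d_attn))
w_k = rng.standard_normal((d_style, d_attn))
w_v = rng.standard_normal((d_style, d_attn))

out, weights = cross_attention(queries, style_tokens, w_q, w_k, w_v)
print(x.shape, out.shape, weights.shape)  # (8, 8, 8) (64, 8) (64, 3)
```

In this arrangement the style tokens never pass through the concatenated input, and the content template never enters the attention keys or values, which is one way to read the abstract's claim that the two kinds of information arrive through separate channels.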