[2507.03043] K-Function: Joint Pronunciation Transcription and Feedback for Evaluating Kids Language Function
Summary
The K-Function framework evaluates children's language by pairing accurate sub-word (phoneme-level) transcription with LLM-driven scoring. Its K-WFST decoder reports a 1.39% phoneme error rate on MyST and 8.61% on Multitudes.
Why It Matters
Automatic speech recognizers struggle with young children's speech because of high-pitched voices, prolonged sounds, and limited data. More accurate phoneme recognition, paired with automated scoring, could make language screening scalable, which matters for early childhood development.
Key Takeaways
- K-Function combines sub-word transcription with LLM scoring for children's language evaluation.
- The Kids-Weighted Finite State Transducer (K-WFST) reaches a 1.39% phoneme error rate on MyST and 8.61% on Multitudes, absolute improvements of 10.47% and 7.06% over a greedy-search decoder.
- High-quality transcripts enable accurate grading of verbal skills and developmental milestones.
- The framework supports scalable language screening for children, enhancing early detection of language issues.
- Results align closely with human evaluators, validating the effectiveness of the approach.
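To make the phoneme error rate (PER) metric in the takeaways concrete, here is a minimal sketch of how PER is conventionally computed: the Levenshtein edit distance between a reference and a hypothesized phoneme sequence, divided by the reference length. This is the standard definition, not code from the paper. The function names and the ARPAbet-style example sequences are illustrative assumptions.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences (lists of labels)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference phonemes
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """PER = (substitutions + deletions + insertions) / reference length."""
    if not ref:
        raise ValueError("reference sequence must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: a child substitutes /w/ for /r/ in "rabbit".
reference = ["R", "AE", "B", "AH", "T"]
hypothesis = ["W", "AE", "B", "AH", "T"]
per = phoneme_error_rate(reference, hypothesis)
print(f"PER = {per:.2%}")
```

In this toy case there is one substitution across five reference phonemes. If the reported absolute improvements are read as percentage points, they imply greedy-search baselines of roughly 11.86% PER on MyST and 15.67% on Multitudes.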
Computer Science > Computation and Language
arXiv:2507.03043 (cs)
[Submitted on 3 Jul 2025 (v1), last revised 24 Feb 2026 (this version, v3)]
Title: K-Function: Joint Pronunciation Transcription and Feedback for Evaluating Kids Language Function
Authors: Shuhe Li, Chenxu Guo, Jiachen Lian, Cheol Jun Cho, Wenshuo Zhao, Xiner Xu, Ruiyu Jin, Xiaoyu Shi, Xuanru Zhou, Dingkun Zhou, Sam Wang, Grace Wang, Jingze Yang, Jingyi Xu, Ruohan Bao, Xingrui Chen, Elise Brenner, Brandon In, Francesca Pei, Maria Luisa Gorno-Tempini, Gopala Anumanchipalli
Abstract: Evaluating young children's language is challenging for automatic speech recognizers due to high-pitched voices, prolonged sounds, and limited data. We introduce K-Function, a framework that combines accurate sub-word transcription with objective, Large Language Model (LLM)-driven scoring. Its core, Kids-Weighted Finite State Transducer (K-WFST), merges an acoustic phoneme encoder with a phoneme-similarity model to capture child-specific speech errors while remaining fully interpretable. K-WFST achieves a 1.39% phoneme error rate on MyST and 8.61% on Multitudes, an absolute improvement of 10.47% and 7.06% over a greedy-search decoder. These high-quality transcripts are used by an LLM to grade verbal skills, developmental milestones, reading, a...