[2602.03022] STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models
Summary
The paper presents STAR, a novel framework for transferring capabilities from large language models to super-tiny function calling models, addressing issues like overfitting and training instability.
Why It Matters
Large language models excel at function calling but their scale makes them costly to deploy widely. By transferring their capabilities into super-tiny models, STAR enables small AI agents that can still perform complex tasks, making advanced agentic AI more accessible and practical across applications.
Key Takeaways
- STAR introduces Constrained Knowledge Distillation (CKD) for stable training of tiny models.
- Similarity-guided RL (Sim-RL) enhances policy optimization through fine-grained reward signals.
- The framework achieves state-of-the-art performance for models under 1B parameters.
- Extensive experiments validate STAR's effectiveness on challenging benchmarks.
- The approach paves the way for more efficient and accessible AI applications.
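The Sim-RL idea in the takeaways above replaces an all-or-nothing binary reward (which gives zero signal for partially correct calls) with a graded similarity score. The sketch below is purely illustrative, assuming a generic function-call representation (`name` plus `arguments`) and a simple string-similarity measure; the paper's actual reward definition may differ.

```python
import difflib
import json

def sim_reward(pred_call: dict, ref_call: dict) -> float:
    """Fine-grained reward for a predicted function call vs. a reference.

    Blends exact name matching with argument-level similarity, so a call
    with the right function but slightly wrong arguments still earns
    partial credit (a binary exact-match reward would give 0).
    Illustrative only; not the paper's exact Sim-RL formulation.
    """
    # Function name: exact match or no name credit.
    name_score = 1.0 if pred_call.get("name") == ref_call.get("name") else 0.0
    # Arguments: string similarity over canonicalised JSON encodings.
    pred_args = json.dumps(pred_call.get("arguments", {}), sort_keys=True)
    ref_args = json.dumps(ref_call.get("arguments", {}), sort_keys=True)
    arg_score = difflib.SequenceMatcher(None, pred_args, ref_args).ratio()
    # Weighted blend: name correctness and argument fidelity each count.
    return 0.5 * name_score + 0.5 * arg_score
```

Under this kind of reward, a call like `get_weather(city="Rome")` against a reference `get_weather(city="Paris")` scores between 0.5 and 1.0 rather than a flat 0, giving the policy a gradient toward the correct arguments.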
Computer Science > Artificial Intelligence
arXiv:2602.03022 (cs)
[Submitted on 3 Feb 2026 (v1), last revised 24 Feb 2026 (this version, v2)]
Title: STAR: Similarity-guided Teacher-Assisted Refinement for Super-Tiny Function Calling Models
Authors: Jiliang Ni, Jiachen Pu, Zhongyi Yang, Jingfeng Luo, Conggang Hu
Abstract: The proliferation of Large Language Models (LLMs) in function calling is pivotal for creating advanced AI agents, yet their large scale hinders widespread adoption, necessitating transferring their capabilities into smaller ones. However, existing paradigms are often plagued by overfitting, training instability, ineffective binary rewards for multi-solution tasks, and the difficulty of synergizing techniques. We introduce STAR: Similarity-guided Teacher-Assisted Refinement, a novel holistic framework that effectively transfers LLMs' capabilities to super-tiny models. STAR consists of two core technical innovations: (1) Constrained Knowledge Distillation (CKD), a training objective that augments top-k forward KL divergence to suppress confidently incorrect predictions, ensuring training stability while preserving exploration capacity for downstream RL. STAR holistically synergizes these strategies within a cohesive training curriculum, enabling super-tiny models to achieve excep...
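The abstract describes CKD as augmenting a top-k forward KL divergence so that confidently incorrect student predictions are suppressed. A minimal numpy sketch of what such a constrained objective could look like is below; the split into a top-k KL term plus an outside-top-k mass penalty, and the `penalty_weight` hyperparameter, are assumptions for illustration, not the paper's exact equation.

```python
import numpy as np

def ckd_loss(teacher_probs: np.ndarray,
             student_logprobs: np.ndarray,
             k: int = 5,
             penalty_weight: float = 1.0) -> float:
    """Sketch of a constrained KD objective (assumed form).

    Forward KL is computed only over the teacher's top-k tokens, and the
    student's probability mass on tokens the teacher ranks outside the
    top-k is penalised, discouraging confident predictions the teacher
    considers wrong instead of trying to match the full distribution.
    """
    topk = np.argsort(teacher_probs)[-k:]  # teacher's top-k token ids
    # Forward KL restricted to top-k: sum_t p_t * (log p_t - log q_t)
    p = teacher_probs[topk]
    kl = np.sum(p * (np.log(p) - student_logprobs[topk]))
    # Penalty: student mass placed outside the teacher's top-k.
    q = np.exp(student_logprobs)
    outside = np.setdiff1d(np.arange(len(q)), topk)
    penalty = np.sum(q[outside])
    return float(kl + penalty_weight * penalty)
```

With a student that exactly matches the teacher, the top-k KL term vanishes and only the residual outside-top-k mass contributes, while a student concentrating probability on a token the teacher ranks low incurs a much larger loss.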