[2603.15159] To See is Not to Master: Teaching LLMs to Use Private Libraries for Code Generation
Computer Science > Software Engineering
arXiv:2603.15159 (cs)
[Submitted on 16 Mar 2026 (v1), last revised 27 Mar 2026 (this version, v4)]

Title: To See is Not to Master: Teaching LLMs to Use Private Libraries for Code Generation
Authors: Yitong Zhang, Chengze Li, Ruize Chen, Guowei Yang, Xiaoran Jia, Yijie Ren, Jia Li

Abstract: Large Language Models (LLMs) have shown strong potential for code generation, yet they remain limited in private-library-oriented code generation, where the goal is to generate code using APIs from private libraries. Existing approaches mainly rely on retrieving private-library API documentation and injecting the relevant knowledge into the context at inference time. However, our study shows that this is insufficient: even when given accurate, complete knowledge, LLMs still struggle to invoke private-library APIs effectively. To address this limitation, we propose PriCoder, an approach that teaches LLMs to invoke private-library APIs through automatically synthesized training data. Specifically, PriCoder models private-library data synthesis as the construction of a graph and alternates between two graph operators: (1) Progressive Graph Evolution, which improves data diversity by progressively synthesizing more diverse training samples from basic ones, and (2) Multidimensional Graph Pruning,...
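The abstract describes an alternation between two graph operators over synthesized training samples. The following is a minimal, hypothetical sketch of that control flow, based only on the abstract's description: nodes are training samples, "evolution" derives more diverse samples from existing ones, and "pruning" discards low-value nodes. The `mutate` and `score` functions are illustrative placeholders, not the paper's actual mechanisms.

```python
# Hypothetical sketch of PriCoder-style data synthesis as graph construction.
# Nothing here is the paper's concrete algorithm; it only illustrates the
# evolve/prune alternation the abstract names.

def evolve(graph, mutate):
    """Progressive Graph Evolution: derive a new sample from each node."""
    new_nodes = {}
    for node_id, sample in list(graph.items()):
        child_id = f"{node_id}/v{len(graph) + len(new_nodes)}"
        new_nodes[child_id] = mutate(sample)
    graph.update(new_nodes)
    return graph

def prune(graph, score, keep_ratio=0.5):
    """Multidimensional Graph Pruning: keep only the highest-scoring nodes.
    (Here a single placeholder score stands in for the paper's dimensions.)"""
    ranked = sorted(graph.items(), key=lambda kv: score(kv[1]), reverse=True)
    kept = ranked[: max(1, int(len(ranked) * keep_ratio))]
    return dict(kept)

def synthesize(seed_samples, mutate, score, rounds=2):
    """Alternate evolution and pruning, as the abstract describes."""
    graph = dict(seed_samples)
    for _ in range(rounds):
        graph = evolve(graph, mutate)
        graph = prune(graph, score)
    return graph

# Toy usage with placeholder operators: each mutation appends a suffix,
# and longer samples score higher, so pruning keeps the evolved variants.
result = synthesize({"seed": "call_api()"}, mutate=lambda s: s + "_x", score=len)
```

In the real system, `mutate` would synthesize genuinely new private-library usage examples and `score` would combine several quality dimensions; this sketch only shows how alternating the two operators progressively grows and then filters the sample graph.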