[2603.22942] Optimizing Small Language Models for NL2SQL via Chain-of-Thought Fine-Tuning
Computer Science > Artificial Intelligence
arXiv:2603.22942 (cs) [Submitted on 24 Mar 2026]

Title: Optimizing Small Language Models for NL2SQL via Chain-of-Thought Fine-Tuning
Authors: Anshul Solanki, Sanchit Latawa, Koushik Chakraborty, Navneet Kamboj

Abstract: Translating natural language to SQL (NL2SQL) remains a critical bottleneck for the democratization of data in enterprises. Although Large Language Models (LLMs) such as Gemini 2.5 have demonstrated impressive zero-shot capabilities, their high inference costs limit deployment at scale. This paper explores the efficacy of fine-tuning both large and small language models on NL2SQL tasks. Our research reveals a counter-intuitive scaling phenomenon: fine-tuning large models (Gemini 2.5 Flash/Lite) on standard datasets yields negligible returns, often leading to overfitting on complex queries. Conversely, small models (Qwen) show significant gains. Fine-tuning improved the small-model baseline from 36% to 45%, and further enriching the dataset with explicit Chain-of-Thought (CoT) reasoning surged accuracy to 54.5% (Fig. 2). While this is still lower than the accuracy of large models like Gemini 2.5, it serves the business goals of significant cost reduction and lower inference latency while meeting business-critical performance accura...
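The CoT enrichment described above can be sketched as a simple transformation of a plain NL2SQL training record into a prompt/completion pair whose target walks through the reasoning before emitting the SQL. The schema, field names, prompt template, and `make_cot_record` helper below are illustrative assumptions, not the paper's actual data format:

```python
# Hypothetical sketch: enriching a standard NL2SQL fine-tuning record with
# explicit Chain-of-Thought (CoT) reasoning steps before the SQL target.
# All names and the template layout are assumptions for illustration.

def make_cot_record(schema: str, question: str,
                    reasoning_steps: list[str], sql: str) -> dict:
    """Build one (prompt, completion) pair where the completion derives
    the query step by step instead of stating the SQL directly."""
    prompt = (
        "### Schema\n" + schema + "\n"
        "### Question\n" + question + "\n"
        "### Answer\n"
    )
    # Numbered reasoning steps precede the SQL, so the small model is
    # trained to derive the query rather than memorize surface patterns.
    reasoning = "\n".join(f"Step {i + 1}: {s}"
                          for i, s in enumerate(reasoning_steps))
    completion = reasoning + "\nSQL: " + sql
    return {"prompt": prompt, "completion": completion}


record = make_cot_record(
    schema="employees(id, name, dept, salary)",
    question="Which department has the highest average salary?",
    reasoning_steps=[
        "The question asks for a department, so group rows by dept.",
        "Compute AVG(salary) per group and order descending.",
        "Return only the top department with LIMIT 1.",
    ],
    sql="SELECT dept FROM employees GROUP BY dept "
        "ORDER BY AVG(salary) DESC LIMIT 1;",
)
print(record["prompt"] + record["completion"])
```

A corpus of such records can then be fed to any supervised fine-tuning pipeline that accepts prompt/completion pairs; the enrichment itself is just this extra reasoning text in the target.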