SyGra: The One-Stop Framework for Building Data for LLMs and SLMs
A blog post by ServiceNow-AI on Hugging Face
Published September 22, 2025

Authors: Bidyapati Pradhan, Vipul Mittal, Amit Kumar Saha, Surajit Dasgupta (ServiceNow-AI)

When we think about building a model, whether a Large Language Model (LLM) or a Small Language Model (SLM), the first thing we need is data. While a vast amount of open data is available, it rarely comes in the exact format required to train or align models. In practice, the raw data often isn't enough: we need data that is more structured, domain-specific, complex, or aligned with the task at hand.

Let's look at some common situations:

- **Complex Scenarios Missing.** You start with a simple dataset, but the model fails on advanced reasoning tasks. How do you generate more complex datasets to strengthen performance?
- **Knowledge Base to Q&A.** You already have a knowledge base, but it's not in Q&A format. How can you transform it into a usable question-answering dataset?
- **From SFT to DPO.** You've prepared a supervised fine-tuning (SFT) dataset, but now you want to align your model using Direct Preference Optimization (DPO). How can you generate preference pairs?
- **Depth of Questions.** You have a Q&A dataset, but the questions are shallow. How can you create in-depth, multi-turn, or reasoning-heavy questions?
- **Domain-Specific Mid-Training.** You...
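To make the SFT-to-DPO scenario concrete, here is a minimal sketch of how SFT-style prompts with multiple candidate responses can be turned into preference pairs in the standard `{prompt, chosen, rejected}` format that DPO trainers expect. This is an illustrative example, not SyGra's API: the `build_dpo_pairs` helper and the toy length-based scorer are assumptions made here for demonstration; a real pipeline would rank candidates with a reward model or an LLM judge.

```python
def build_dpo_pairs(records, score):
    """Convert SFT-style records into DPO preference pairs.

    Each record is a dict: {"prompt": str, "candidates": [str, ...]}.
    `score` ranks the candidate responses; the highest-scoring one
    becomes "chosen" and the lowest-scoring one becomes "rejected".
    (Hypothetical helper for illustration, not part of SyGra.)
    """
    pairs = []
    for rec in records:
        ranked = sorted(rec["candidates"], key=score, reverse=True)
        # Skip prompts without at least two candidates or without a
        # clear preference between best and worst.
        if len(ranked) < 2 or score(ranked[0]) == score(ranked[-1]):
            continue
        pairs.append({
            "prompt": rec["prompt"],
            "chosen": ranked[0],
            "rejected": ranked[-1],
        })
    return pairs


# Toy scorer: prefer longer, more detailed answers. A real pipeline
# would substitute a reward model or LLM-as-judge score here.
records = [{
    "prompt": "Explain Direct Preference Optimization.",
    "candidates": [
        "DPO aligns a model directly on preference pairs, "
        "avoiding a separate reward-model RL loop.",
        "It's an alignment method.",
    ],
}]
print(build_dpo_pairs(records, len))
```

The resulting list of `{prompt, chosen, rejected}` dicts is the shape consumed by common preference-tuning trainers, so the same conversion step applies regardless of which model or scorer produced the candidates.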