[2602.17734] Five Fatal Assumptions: Why T-Shirt Sizing Systematically Fails for AI Projects
Summary
This paper critiques the T-shirt sizing estimation method as applied to AI projects, identifying five foundational assumptions that break down in AI contexts and proposing an alternative approach called Checkpoint Sizing.
Why It Matters
As AI projects become increasingly complex, traditional estimation methods like T-shirt sizing can mislead teams, resulting in project delays and inefficiencies. Understanding these pitfalls is crucial for engineering managers and product owners to improve planning and execution in AI initiatives.
Key Takeaways
- T-shirt sizing fails in AI due to non-linear performance jumps and complex interaction surfaces.
- Five assumptions of T-shirt sizing are often invalid in AI contexts.
- Checkpoint Sizing offers a more iterative and human-centric approach.
- Engineering teams should reassess scope and feasibility throughout development.
- Awareness of these pitfalls can lead to more successful AI project outcomes.
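The first takeaway above can be illustrated numerically. The sketch below is purely hypothetical: the effort numbers, the bucket-to-days mapping, and the interaction factor are all assumptions for illustration, not values from the paper. It contrasts the linear effort scaling that T-shirt sizing implicitly assumes with a toy non-linear model standing in for the coupled interaction effects the paper describes in AI systems.

```python
# Illustrative sketch only: the effort values and the non-linear model are
# assumed for demonstration, not drawn from the paper.

# A common (assumed) convention mapping T-shirt buckets to nominal
# effort in person-days.
TSHIRT_EFFORT = {"S": 3, "M": 8, "L": 20, "XL": 40}

def linear_estimate(sizes):
    """T-shirt sizing's implicit model: total effort is the sum of the parts."""
    return sum(TSHIRT_EFFORT[s] for s in sizes)

def coupled_estimate(sizes, interaction_factor=1.5):
    """A toy non-linear model: each additional task is inflated by an
    interaction factor, a stand-in for the tight coupling and complex
    interaction surfaces the paper attributes to AI systems."""
    effort = 0.0
    for i, size in enumerate(sizes):
        effort += TSHIRT_EFFORT[size] * (interaction_factor ** i)
    return effort

tasks = ["M", "M", "M"]
print(linear_estimate(tasks))   # 8 + 8 + 8  = 24
print(coupled_estimate(tasks))  # 8 + 12 + 18 = 38.0
```

Under the linear model, three medium tasks cost exactly three mediums; under the coupled model, the same three tasks cost noticeably more, and the gap widens with every task added, which is why per-task bucket estimates systematically undershoot.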
Computer Science > Software Engineering
arXiv:2602.17734 (cs) [Submitted on 18 Feb 2026]
Title: Five Fatal Assumptions: Why T-Shirt Sizing Systematically Fails for AI Projects
Authors: Raja Soundaramourty, Ozkan Kilic, Ramu Chenchaiah
Abstract: Agile estimation techniques, particularly T-shirt sizing, are widely used in software development for their simplicity and utility in scoping work. However, when we apply these methods to artificial intelligence initiatives -- especially those involving large language models (LLMs) and multi-agent systems -- the results can be systematically misleading. This paper shares an evidence-backed analysis of five foundational assumptions we often make during T-shirt sizing. While these assumptions usually hold true for traditional software, they tend to fail in AI contexts: (1) linear effort scaling, (2) repeatability from prior experience, (3) effort-duration fungibility, (4) task decomposability, and (5) deterministic completion criteria. Drawing on recent research into multi-agent system failures, scaling principles, and the inherent unreliability of multi-turn conversations, we show how AI development breaks these rules. We see this through non-linear performance jumps, complex interaction surfaces, and "tight coupling" where a small change in data cascades ...