[2602.21814] Prompt Architecture Determines Reasoning Quality: A Variable Isolation Study on the Car Wash Problem
Summary
This study investigates how different layers of a prompt architecture affect reasoning quality in large language models, using the "car wash problem" (a benchmark requiring implicit physical constraint inference) as a test case, and shows that structured reasoning frameworks drive the largest accuracy gains.
Why It Matters
Understanding how prompt architecture influences reasoning quality is crucial for enhancing the performance of AI systems in complex tasks. This research provides insights into effective strategies for improving AI reasoning capabilities, which can have broad implications for applications in natural language processing and beyond.
Key Takeaways
- The STAR (Situation-Task-Action-Result) reasoning framework alone raises accuracy from 0% to 85%.
- Incorporating user profile context adds a further 10 percentage points of accuracy.
- Structured reasoning is more critical than context injection for implicit constraint reasoning tasks.
- The study utilized a controlled experimental design with 120 trials across six conditions.
- The full-stack condition (STAR framework, user profile context, and RAG context combined) reaches 100% accuracy, highlighting the potential of structured approaches.
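The STAR framework's key mechanism, per the abstract, is forcing goal articulation before inference. A minimal sketch of how such a scaffold might be assembled as a prompt template; the section wording, the `build_star_prompt` helper, and the example question are all illustrative assumptions, not the authors' actual prompt or benchmark text:

```python
# Sketch of a STAR (Situation-Task-Action-Result) reasoning scaffold.
# The headings follow the framework named in the paper; the exact
# instruction wording and the sample question are illustrative only.

def build_star_prompt(question: str) -> str:
    """Wrap a question in a STAR scaffold that forces the model to
    articulate the goal and constraints before attempting inference."""
    return (
        "Answer the question using the STAR structure below.\n\n"
        "Situation: Restate the scenario, including any implicit "
        "physical constraints.\n"
        "Task: State explicitly what the question is asking for.\n"
        "Action: Reason step by step, checking each step against "
        "the constraints identified above.\n"
        "Result: Give the final answer.\n\n"
        f"Question: {question}"
    )

# A stand-in question (not necessarily the benchmark's exact wording).
prompt = build_star_prompt(
    "If one automatic machine takes 6 minutes to wash a car, how long "
    "do 3 machines take to wash 3 cars?"
)
print(prompt)
```

The point of the scaffold is ordering: the Situation and Task sections must be produced before any Action reasoning, which is what "forced goal articulation before inference" refers to.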
Computer Science > Artificial Intelligence
arXiv:2602.21814 (cs) [Submitted on 25 Feb 2026]
Title: Prompt Architecture Determines Reasoning Quality: A Variable Isolation Study on the Car Wash Problem
Authors: Heejin Jo
Abstract: Large language models consistently fail the "car wash problem," a viral reasoning benchmark requiring implicit physical constraint inference. We present a variable isolation study (n=20 per condition, 6 conditions, 120 total trials) examining which prompt architecture layers in a production system enable correct reasoning. Using Claude 3.5 Sonnet with controlled hyperparameters (temperature 0.7, top_p 1.0), we find that the STAR (Situation-Task-Action-Result) reasoning framework alone raises accuracy from 0% to 85% (p=0.001, Fisher's exact test, odds ratio 13.22). Adding user profile context via vector database retrieval provides a further 10 percentage point gain, while RAG context contributes an additional 5 percentage points, achieving 100% accuracy in the full-stack condition. These results suggest that structured reasoning scaffolds -- specifically, forced goal articulation before inference -- matter substantially more than context injection for implicit constraint reasoning tasks.
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
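The abstract reports a Fisher's exact test on per-condition success counts (n=20 per condition). A self-contained sketch of the two-sided test on a hypothetical 2x2 table of 17/20 correct (85%) versus 0/20 correct; the p-value from this idealized table need not match the paper's reported p=0.001, which may reflect different counts or a corrected odds-ratio estimate:

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].
    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one."""
    n = a + b + c + d
    row1 = a + b      # trials in condition 1
    col1 = a + c      # total successes across both conditions
    def p_table(x):   # P(condition 1 has exactly x successes)
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)
    p_obs = p_table(a)
    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    # Small tolerance guards against floating-point ties.
    return sum(p for x in range(lo, hi + 1)
               if (p := p_table(x)) <= p_obs * (1 + 1e-12))

# Hypothetical counts: STAR condition 17/20 correct, baseline 0/20.
p = fisher_exact_two_sided(17, 3, 0, 20)
print(f"p = {p:.2e}")
```

In production code, `scipy.stats.fisher_exact` would be the usual choice; the hand-rolled version above just makes the hypergeometric computation explicit.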