[R] Forced Depth Consideration Reduces Type II Errors in LLM Self-Classification: Evidence from an Exploration Prompting Ablation Study - (200 trap prompts, 4 models, 8 Step-0 variants) [R]
LLM-Based task classifier tend to misroute prompts that look simple at first glance, but require deeper understanding - I call it "Type I...