[2602.23239] Agency and Architectural Limits: Why Optimization-Based Systems Cannot Be Norm-Responsive
Summary
This paper examines the limits of optimization-based AI systems, arguing that inherent architectural constraints prevent them from being norm-responsive, with Large Language Models trained via Reinforcement Learning from Human Feedback (RLHF) as the central case.
Why It Matters
As AI systems are increasingly integrated into critical sectors, understanding their limitations in adhering to normative frameworks is essential. This paper highlights fundamental architectural issues that prevent optimization-based systems from being genuinely accountable, which is crucial for ethical AI deployment.
Key Takeaways
- Optimization-based systems lack the architectural capacity for normative governance.
- Genuine agency in AI requires maintaining non-negotiable constraints and a mechanism for boundary suspension.
- Failure modes like sycophancy and hallucination are structural issues, not mere bugs.
- The Convergence Crisis poses a risk of degrading human oversight into mere metric-checking.
- A new architectural specification is proposed for defining genuine agency across systems.
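The contrast between "non-negotiable constraints" and "tradeable weights" in the takeaways above can be made concrete with a minimal sketch. This is a hypothetical illustration, not code from the paper: all names (`scalar_select`, `constrained_select`, the candidate dicts, the `violation_penalty` value) are invented for exposition. It shows how a boundary encoded as a penalty term in a scalar objective can always be outbid by a sufficiently high reward, whereas a hard constraint excludes violating outputs outright and, when nothing admissible remains, suspends output instead of trading the boundary away.

```python
# Hypothetical illustration (not from the paper): a boundary encoded as a
# weighted penalty in a scalar objective versus a non-negotiable constraint.

CANDIDATES = [
    {"text": "helpful but violates boundary", "reward": 9.0, "violates": True},
    {"text": "safe and moderately helpful",   "reward": 6.0, "violates": False},
]

def scalar_select(candidates, violation_penalty=2.5):
    # Optimization-style selection: the boundary is just another term on the
    # shared scalar metric, so a high enough reward always outbids the penalty.
    def score(c):
        return c["reward"] - (violation_penalty if c["violates"] else 0.0)
    return max(candidates, key=score)

def constrained_select(candidates):
    # Constraint-style selection: violating candidates are excluded before any
    # scoring; if none remain, return None (suspend) rather than trade off.
    admissible = [c for c in candidates if not c["violates"]]
    if not admissible:
        return None
    return max(admissible, key=lambda c: c["reward"])

print(scalar_select(CANDIDATES)["text"])       # the violating candidate wins on score
print(constrained_select(CANDIDATES)["text"])  # the boundary holds
```

Under the assumed numbers, `scalar_select` picks the violating candidate (9.0 − 2.5 = 6.5 beats 6.0), which is the paper's point: within a single scalar metric, every boundary has a price.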
Computer Science > Artificial Intelligence
arXiv:2602.23239 (cs)
[Submitted on 26 Feb 2026]
Title: Agency and Architectural Limits: Why Optimization-Based Systems Cannot Be Norm-Responsive
Authors: Radha Sarma
Abstract: AI systems are increasingly deployed in high-stakes contexts -- medical diagnosis, legal research, financial analysis -- under the assumption they can be governed by norms. This paper demonstrates that assumption is formally invalid for optimization-based systems, specifically Large Language Models trained via Reinforcement Learning from Human Feedback (RLHF). We establish that genuine agency requires two necessary and jointly sufficient architectural conditions: the capacity to maintain certain boundaries as non-negotiable constraints rather than tradeable weights (Incommensurability), and a non-inferential mechanism capable of suspending processing when those boundaries are threatened (Apophatic Responsiveness). These conditions apply across all normative domains. RLHF-based systems are constitutively incompatible with both conditions. The operations that make optimization powerful -- unifying all values on a scalar metric and always selecting the highest-scoring output -- are precisely the operations that preclude normative governance. This incompatibility is not a correctable training bug awaiting a technical fix...