[2602.15983] ReLoop: Structured Modeling and Behavioral Verification for Reliable LLM-Based Optimization
Summary
ReLoop improves the reliability of LLM-generated optimization code by combining structured generation with behavioral verification, targeting silent failures in which code runs and returns solver-feasible solutions that encode the wrong model.
Why It Matters
As large language models (LLMs) are increasingly used to produce optimization code, ensuring that the generated formulations are semantically correct is crucial. ReLoop's dual approach mitigates the risk of silent failures, making LLM-based modeling more dependable in complex problem-solving scenarios.
Key Takeaways
- ReLoop enhances LLM-generated optimization code reliability.
- Structured generation decomposes code production into four stages.
- Behavioral verification tests formulations against solver perturbations.
- Correctness rates improved from 22.6% to 31.1% with ReLoop.
- RetailOpt-190 dataset released for testing LLM performance on compositional problems.
Computer Science > Software Engineering

arXiv:2602.15983 (cs) [Submitted on 17 Feb 2026]

Title: ReLoop: Structured Modeling and Behavioral Verification for Reliable LLM-Based Optimization

Authors: Junbo Jacob Lian, Yujun Sun, Huiling Chen, Chaoyu Zhang, Chung-Piaw Teo

Abstract: Large language models (LLMs) can translate natural language into optimization code, but silent failures pose a critical risk: code that executes and returns solver-feasible solutions may encode semantically incorrect formulations, creating a feasibility-correctness gap of up to 90 percentage points on compositional problems. We introduce ReLoop, addressing silent failures from two complementary directions. Structured generation decomposes code production into a four-stage reasoning chain (understand, formalize, synthesize, verify) that mirrors expert modeling practice, with explicit variable-type reasoning and self-verification to prevent formulation errors at their source. Behavioral verification detects errors that survive generation by testing whether the formulation responds correctly to solver-based parameter perturbation, without requiring ground truth -- an external semantic signal that bypasses the self-consistency problem inherent in LLM-based code review. The two mechanisms are complementary: structured...
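The behavioral-verification idea described in the abstract, checking whether a formulation responds sensibly to parameter perturbations without any ground-truth answer, can be sketched in miniature. The toy model, function names, and monotonicity check below are illustrative assumptions, not the paper's implementation:

```python
# Hedged sketch of perturbation-based behavioral verification.
# Stand-in for LLM-generated optimization code: maximize 3x
# subject to 2x <= capacity, x >= 0, solved in closed form.
def solve_model(capacity: float) -> float:
    x = capacity / 2.0          # binding constraint: 2x = capacity
    return 3.0 * x              # optimal objective value

def behavioral_check(solve, base_capacity: float = 10.0,
                     delta: float = 1.0) -> bool:
    """Semantic probe: relaxing a `<=` capacity constraint should never
    decrease the optimum of a maximization, and tightening it should
    never increase it. No reference solution is needed."""
    base = solve(base_capacity)
    relaxed = solve(base_capacity + delta)
    tightened = solve(base_capacity - delta)
    return relaxed >= base >= tightened

print(behavioral_check(solve_model))  # prints True
```

A silently wrong formulation (say, one whose objective falls as capacity grows, such as `lambda c: -1.5 * c`) fails the same probe, which is the kind of external semantic signal the abstract contrasts with LLM self-review.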