[2602.16720] APEX-SQL: Talking to the data via Agentic Exploration for Text-to-SQL
Summary
The paper presents APEX-SQL, a novel framework for Text-to-SQL that enhances interaction with complex databases through agentic exploration, improving accuracy and efficiency in SQL generation.
Why It Matters
Existing Text-to-SQL systems rely on static schema representations, which leave semantic ambiguity unresolved and scale poorly to large, complex databases. APEX-SQL addresses this with a hypothesis-verification loop and logical planning that ground the model's reasoning in real data, making it particularly relevant for enterprises working with large and complex datasets.
Key Takeaways
- APEX-SQL improves SQL generation accuracy through agentic exploration.
- The framework reduces semantic ambiguity in complex database environments.
- Experiments show APEX-SQL outperforms existing models in execution accuracy, reaching 70.65% on BIRD.
- Ablation studies confirm the importance of each component in the framework.
- During SQL generation, a deterministic mechanism retrieves exploration directives that guide the agent in exploring data distributions and refining hypotheses.
arXiv:2602.16720 (cs), Computer Science > Databases
Submitted on 11 Feb 2026
Title: APEX-SQL: Talking to the data via Agentic Exploration for Text-to-SQL
Authors: Bowen Cao, Weibin Liao, Yushi Sun, Dong Fang, Haitao Li, Wai Lam
Abstract
Text-to-SQL systems powered by Large Language Models have excelled on academic benchmarks but struggle in complex enterprise environments. The primary limitation lies in their reliance on static schema representations, which fails to resolve semantic ambiguity and scale effectively to large, complex databases. To address this, we propose APEX-SQL, an Agentic Text-to-SQL Framework that shifts the paradigm from passive translation to agentic exploration. Our framework employs a hypothesis-verification loop to ground model reasoning in real data. In the schema linking phase, we use logical planning to verbalize hypotheses, dual-pathway pruning to reduce the search space, and parallel data profiling to validate column roles against real data, followed by global synthesis to ensure topological connectivity. For SQL generation, we introduce a deterministic mechanism to retrieve exploration directives, allowing the agent to effectively explore data distributions, refine hypotheses, and generate semantically accurate SQLs. Experiments on BIRD (70.65% execution accuracy) and Spider 2.0-Sno...
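The idea of validating a hypothesis against real data, as the abstract describes for data profiling, can be illustrated with a small sketch. This is not the authors' implementation: the helper names (`profile_column`, `verify_literal`, `count_shipped`), the `orders` table, and its contents are all hypothetical, and the "agent" is reduced to a fixed list of candidate columns. It only shows the general pattern of checking a guessed column role and filter value against stored values before emitting SQL.

```python
import sqlite3


def profile_column(conn, table, column, limit=20):
    """Return up to `limit` distinct non-null values stored in a column."""
    # Identifiers are interpolated directly; this sketch assumes they come
    # from trusted schema metadata, not from user input.
    sql = (
        f'SELECT DISTINCT "{column}" FROM "{table}" '
        f'WHERE "{column}" IS NOT NULL LIMIT ?'
    )
    return [row[0] for row in conn.execute(sql, (limit,))]


def verify_literal(conn, table, column, guess):
    """Check a hypothesized filter value against the profiled data.

    Returns the stored value that matches `guess` case-insensitively,
    or None if the hypothesis is not supported by the data.
    """
    for value in profile_column(conn, table, column):
        if isinstance(value, str) and value.lower() == guess.lower():
            return value
    return None


def build_demo_db():
    """Create a tiny in-memory database (hypothetical example data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, state TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders (status, state) VALUES (?, ?)",
        [("SHIPPED", "CA"), ("PENDING", "NY"), ("SHIPPED", "TX")],
    )
    return conn


def count_shipped(conn):
    """Answer 'how many orders were shipped?' by testing column hypotheses."""
    # Two ambiguous candidates: "shipped" could be stored in either column.
    for column in ("state", "status"):
        grounded = verify_literal(conn, "orders", column, "shipped")
        if grounded is not None:
            # Hypothesis verified: use the value exactly as stored.
            sql = f'SELECT COUNT(*) FROM orders WHERE "{column}" = ?'
            count = conn.execute(sql, (grounded,)).fetchone()[0]
            return column, grounded, count
    return None  # no hypothesis survived verification


if __name__ == "__main__":
    print(count_shipped(build_demo_db()))
```

In this toy setting the `state` hypothesis is rejected because none of its stored values match, while `status` is accepted and the filter literal is corrected to the stored spelling `SHIPPED`. A static-schema approach that only saw the column names would have no way to make either decision.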