
[2602.20213] CodeHacker: Automated Test Case Generation for Detecting Vulnerabilities in Competitive Programming Solutions

arXiv - AI, February 25, 2026

Summary

CodeHacker is an automated agent framework that generates targeted adversarial test cases to expose latent bugs in competitive programming solutions, strengthening the test suites used to evaluate code generation models.

Why It Matters

Evaluating LLMs on code generation depends on the quality and robustness of the test cases. Existing benchmarks often miss subtle corner cases, so incorrect solutions can pass. CodeHacker targets that gap: its adversarial cases help filter out such solutions and improve the data used to train code models.

Key Takeaways

CodeHacker generates targeted adversarial test cases to expose vulnerabilities.
It employs a multi-strategy approach that includes stress testing, anti-hash attacks, and logic-specific targeting.
In the Calibration Phase, the agent iteratively refines its own Validator and Checker using self-generated adversarial probes.
CodeHacker improves the True Negative Rate (TNR) of existing datasets.
Generated adversarial cases enhance the performance of RL-trained models.
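
The stress-testing strategy named in the takeaways can be illustrated with a minimal Python sketch. It assumes the common competitive programming recipe of comparing a submission against a slow but trusted brute-force solution on small random inputs. The source does not describe CodeHacker's actual implementation, and the problem (maximum subarray sum), the function names, and the buggy submission are all hypothetical.

```python
import random


def brute_force_max_subarray(a):
    """Trusted O(n^2) reference: maximum sum over all non-empty subarrays."""
    best = a[0]
    for i in range(len(a)):
        total = 0
        for j in range(i, len(a)):
            total += a[j]
            best = max(best, total)
    return best


def buggy_max_subarray(a):
    """Hypothetical flawed submission: a Kadane variant that silently
    allows the empty subarray, so it returns 0 on all-negative input."""
    best = 0
    current = 0
    for x in a:
        current = max(0, current + x)
        best = max(best, current)
    return best


def stress_test(candidate, reference, trials=1000, seed=0):
    """Return the first small random input on which candidate and
    reference disagree, or None if no disagreement is found."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 6)
        a = [rng.randint(-5, 5) for _ in range(n)]
        if candidate(a) != reference(a):
            return a
    return None


hack = stress_test(buggy_max_subarray, brute_force_max_subarray)
print("hacking input:", hack)
```

Small inputs are a deliberate choice in this sketch: they keep the brute-force reference fast and make the failing case easy to read, which is typically what a hack in a contest setting needs.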

Computer Science > Software Engineering

arXiv:2602.20213 (cs)

[Submitted on 23 Feb 2026]

Title: CodeHacker: Automated Test Case Generation for Detecting Vulnerabilities in Competitive Programming Solutions

Authors: Jingwei Shi, Xinxiang Yin, Jing Huang, Jinman Zhao, Shengyu Tao

Abstract: The evaluation of Large Language Models (LLMs) for code generation relies heavily on the quality and robustness of test cases. However, existing benchmarks often lack coverage for subtle corner cases, allowing incorrect solutions to pass. To bridge this gap, we propose CodeHacker, an automated agent framework dedicated to generating targeted adversarial test cases that expose latent vulnerabilities in program submissions. Mimicking the hack mechanism in competitive programming, CodeHacker employs a multi-strategy approach, including stress testing, anti-hash attacks, and logic-specific targeting, to break specific code submissions. To ensure the validity and reliability of these attacks, we introduce a Calibration Phase, where the agent iteratively refines its own Validator and Checker via self-generated adversarial probes before evaluating contestant code. Experiments demonstrate that CodeHacker significantly improves the True Negative Rate (TNR) of existing datasets, effectively filtering out ...
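
To make the True Negative Rate concrete, here is a small Python sketch. It assumes the standard definition, TNR = TN / (TN + FP), where each "negative" is a known-incorrect solution, a true negative is one the test suite rejects, and a false positive is one that slips through. The excerpt does not spell out the paper's exact formula, and the toy problem (return the maximum element), the wrong solutions, and the test inputs are all hypothetical.

```python
def reference(a):
    """Ground truth for the toy problem: the maximum element."""
    return max(a)


def wrong_first(a):
    """Hypothetical incorrect solution: assumes the maximum comes first."""
    return a[0]


def wrong_abs(a):
    """Hypothetical incorrect solution: compares absolute values."""
    return max(abs(x) for x in a)


def rejects(tests, solution):
    """A test suite rejects a solution if any test exposes a wrong answer."""
    return any(solution(t) != reference(t) for t in tests)


def true_negative_rate(tests, incorrect_solutions):
    """Fraction of known-incorrect solutions that the test suite rejects."""
    if not incorrect_solutions:
        raise ValueError("need at least one incorrect solution")
    rejected = sum(rejects(tests, s) for s in incorrect_solutions)
    return rejected / len(incorrect_solutions)


incorrect = [wrong_first, wrong_abs]
weak_tests = [[3, 1, 2], [5, 5]]            # no corner cases: both bugs pass
better_tests = weak_tests + [[1, 4]]        # catches wrong_first only
strong_tests = better_tests + [[-3, 1]]     # negative value catches wrong_abs

print(true_negative_rate(weak_tests, incorrect))    # 0.0
print(true_negative_rate(better_tests, incorrect))  # 0.5
print(true_negative_rate(strong_tests, incorrect))  # 1.0
```

In this toy setup, each added adversarial input raises the TNR because it turns a previously passing incorrect solution into a rejected one, which is the effect the abstract attributes to CodeHacker's generated cases.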
