[2602.20213] CodeHacker: Automated Test Case Generation for Detecting Vulnerabilities in Competitive Programming Solutions

arXiv · AI · 3 min read

Summary

CodeHacker is an automated framework designed to generate test cases that identify vulnerabilities in competitive programming solutions, enhancing the evaluation of code generation models.

Why It Matters

As competitive programming increasingly relies on automated systems, ensuring the robustness of code submissions is critical. CodeHacker addresses gaps in existing benchmarks, improving the detection of vulnerabilities and enhancing the training data for AI models, which is essential for advancing software engineering practices.

Key Takeaways

  • CodeHacker generates targeted adversarial test cases to expose vulnerabilities.
  • It employs a multi-strategy approach, including stress testing and logic-specific targeting.
  • The Calibration Phase refines the agent's Validator and Checker for better accuracy.
  • CodeHacker improves the True Negative Rate (TNR) of existing datasets.
  • Generated adversarial cases enhance the performance of RL-trained models.
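The stress-testing strategy listed above is a staple of competitive-programming hacks: run many small random inputs through both a candidate solution and a slow but trusted brute-force reference, and flag any input where the two disagree. The sketch below is illustrative only, not the paper's implementation; `candidate` (a buggy Kadane variant that mishandles all-negative arrays) and `reference` are hypothetical examples.

```python
import random

def candidate(xs):
    # Buggy maximum-subarray (Kadane): clips at 0, so it wrongly
    # returns 0 when every element is negative.
    best = cur = 0
    for x in xs:
        cur = max(0, cur + x)
        best = max(best, cur)
    return best

def reference(xs):
    # Brute force over all non-empty subarrays: slow but obviously correct.
    return max(sum(xs[i:j])
               for i in range(len(xs))
               for j in range(i + 1, len(xs) + 1))

def stress(trials=1000, seed=0):
    # Try many small random inputs; return the first counterexample found.
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-10, 10) for _ in range(rng.randint(1, 6))]
        if candidate(xs) != reference(xs):
            return xs
    return None

counterexample = stress()
```

Small input domains matter here: with short arrays and a narrow value range, disagreements (any all-negative array, in this toy case) are hit quickly, and the failing input is small enough to debug by hand.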

Computer Science > Software Engineering · arXiv:2602.20213 (cs) · Submitted on 23 Feb 2026

Title: CodeHacker: Automated Test Case Generation for Detecting Vulnerabilities in Competitive Programming Solutions

Authors: Jingwei Shi, Xinxiang Yin, Jing Huang, Jinman Zhao, Shengyu Tao

Abstract: The evaluation of Large Language Models (LLMs) for code generation relies heavily on the quality and robustness of test cases. However, existing benchmarks often lack coverage of subtle corner cases, allowing incorrect solutions to pass. To bridge this gap, we propose CodeHacker, an automated agent framework dedicated to generating targeted adversarial test cases that expose latent vulnerabilities in program submissions. Mimicking the hack mechanism in competitive programming, CodeHacker employs a multi-strategy approach, including stress testing, anti-hash attacks, and logic-specific targeting, to break specific code submissions. To ensure the validity and reliability of these attacks, we introduce a Calibration Phase, in which the agent iteratively refines its own Validator and Checker via self-generated adversarial probes before evaluating contestant solutions. Experiments demonstrate that CodeHacker significantly improves the True Negative Rate (TNR) of existing datasets, effectively filtering out ...
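The TNR cited in the abstract can be read as the fraction of known-incorrect solutions that the test suite actually rejects; this is the standard definition, and the paper's exact evaluation protocol may differ. A minimal illustration:

```python
def true_negative_rate(is_correct, passed):
    """TNR = (# incorrect solutions that fail the tests) / (# incorrect solutions).

    is_correct[i]: ground-truth label for solution i
    passed[i]:     whether solution i passed the test suite
    """
    negatives = [p for c, p in zip(is_correct, passed) if not c]
    return sum(1 for p in negatives if not p) / len(negatives)

# 3 known-incorrect solutions; the suite rejects 2 of them -> TNR = 2/3
tnr = true_negative_rate([True, False, False, False],
                         [True, False, False, True])
```

A weak test suite lets incorrect solutions pass (low TNR); adding adversarial test cases that break those solutions is exactly what raises it.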
