[2602.13851] Evaluating LLM-Generated ACSL Annotations for Formal Verification

arXiv - AI · 3 min read

Summary

This paper evaluates the effectiveness of LLM-generated ACSL annotations for formal verification in C programs, comparing multiple generation systems and their performance in automated settings.

Why It Matters

As software systems become increasingly complex, ensuring their reliability through formal specifications is vital. This study sheds light on the capabilities and limitations of automated annotation generation, which can streamline verification processes and improve software dependability.

Key Takeaways

  • The study compares five systems for generating ACSL annotations: a rule-based Python script, Frama-C's RTE plugin, and three large language models.
  • Automated generation of formal specifications can enhance software verification processes.
  • Results highlight both the strengths and weaknesses of LLMs in generating accurate annotations.

Computer Science > Software Engineering
arXiv:2602.13851 (cs) · Submitted on 14 Feb 2026

Title: Evaluating LLM-Generated ACSL Annotations for Formal Verification
Authors: Arshad Beg, Diarmuid O'Donoghue, Rosemary Monahan

Abstract: Formal specifications are crucial for building verifiable and dependable software systems, yet generating accurate and verifiable specifications for real-world C programs remains challenging. This paper empirically evaluates the extent to which formal-analysis tools can automatically generate and verify ACSL specifications without human or learning-based assistance. We conduct a controlled study on a recently released dataset of 506 C programs, repurposing it from interactive, developer-driven workflows to an automated evaluation setting. Five ACSL generation systems are compared: a rule-based Python script, Frama-C's RTE plugin, and three large language models--DeepSeek-V3.2, GPT-5.2, and OLMo 3.1 32B Instruct. All generated specifications are verified under identical conditions using the Frama-C WP plugin powered by multiple SMT solvers, allowing a direct comparison of annotation quality, solver sensitivity, and proof stability. Our results provide new empirical evidence on the capabilities and limitations of automated ACSL generation, complementing prior survey-ba...

Related Articles


Anthropic Teams Up With Its Rivals to Keep AI From Hacking Everything | WIRED

The AI lab's Project Glasswing will bring together Apple, Google, and more than 45 other organizations. They'll use the new Claude Mythos...

Wired - AI · 7 min · Llms

The public needs to control AI-run infrastructure, labor, education, and governance— NOT private actors

A lot of discussion around AI is becoming siloed, and I think that is dangerous. People in AI-focused spaces often talk as if the only qu...

Reddit - Artificial Intelligence · 1 min · Llms

Agents that write their own code at runtime and vote on capabilities, no human in the loop

hollowOS just hit v4.4 and I added something that I haven’t seen anyone else do. Previous versions gave you an OS for agents: structured ...

Reddit - Artificial Intelligence · 1 min · Llms

Google Maps can now write captions for your photos using AI | TechCrunch

Gemini can now create captions when users are looking to share a photo or video.

TechCrunch - AI · 4 min · Llms

