[2602.20502] ActionEngine: From Reactive to Programmatic GUI Agents via State Machine Memory

arXiv - Machine Learning

Summary

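The paper presents ActionEngine, a training-free framework that moves GUI agents from reactive, step-by-step execution to programmatic planning. It builds a state-machine memory of the interface through offline exploration and uses that memory to synthesize complete, executable programs, with the aim of lowering cost and latency while improving accuracy.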

Why It Matters

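Current GUI agents call a vision language model at every step, so cost and latency grow with the number of reasoning steps, and accuracy is limited because the agent keeps no persistent memory of pages it has already visited. ActionEngine addresses both problems with a two-agent architecture: one agent builds the memory offline, the other plans whole tasks against it. The authors report higher task success at lower cost, which makes the work relevant to developers and researchers building GUI automation.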

Key Takeaways

  • ActionEngine improves GUI agent efficiency by using state machine memory.
  • The framework achieves a 95% task success rate with reduced costs and latency.
  • It employs a Crawling Agent for memory construction and an Execution Agent for task execution.
  • To stay robust against interface changes, execution failures trigger a vision-based re-grounding fallback that repairs the failed action and updates the memory.
  • The approach combines programmatic planning with localized action validation.
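The state machine memory mentioned in the takeaways can be pictured as a graph in which pages are states and UI actions are labelled transitions. The following is a minimal sketch under that assumption, not the paper's implementation: the class `StateMachineMemory`, its methods, and the page and action names are all hypothetical. It shows how a plan for a whole task could be read off the memory in one pass instead of reasoning one screenshot at a time.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StateMachineMemory:
    """Hypothetical memory: GUI pages as states, UI actions as labelled edges."""

    # transitions[state][action] -> next state
    transitions: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def add_transition(self, state: str, action: str, next_state: str) -> None:
        """Record that performing `action` on `state` leads to `next_state`."""
        self.transitions.setdefault(state, {})[action] = next_state
        self.transitions.setdefault(next_state, {})

    def plan(self, start: str, goal: str) -> Optional[List[str]]:
        """Breadth-first search for the shortest action sequence from start to goal."""
        queue = deque([(start, [])])
        visited = {start}
        while queue:
            state, actions = queue.popleft()
            if state == goal:
                return actions
            for action, nxt in self.transitions.get(state, {}).items():
                if nxt not in visited:
                    visited.add(nxt)
                    queue.append((nxt, actions + [action]))
        return None  # goal is not reachable from start in the recorded memory


# Illustrative usage with made-up page and action names.
memory = StateMachineMemory()
memory.add_transition("home", "click_subreddit", "subreddit")
memory.add_transition("subreddit", "click_new_post", "post_form")
memory.add_transition("home", "click_profile", "profile")
print(memory.plan("home", "post_form"))
```

Breadth-first search is used here only because it is the simplest way to get a shortest action path; the paper's planner synthesizes Python programs and may work quite differently.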

Computer Science > Artificial Intelligence

arXiv:2602.20502 (cs)

[Submitted on 24 Feb 2026]

Title: ActionEngine: From Reactive to Programmatic GUI Agents via State Machine Memory

Authors: Hongbin Zhong, Fazle Faisal, Luis França, Tanakorn Leesatapornwongsa, Adriana Szekeres, Kexin Rong, Suman Nath

Abstract: Existing Graphical User Interface (GUI) agents operate through step-by-step calls to vision language models--taking a screenshot, reasoning about the next action, executing it, then repeating on the new page--resulting in high costs and latency that scale with the number of reasoning steps, and limited accuracy due to no persistent memory of previously visited pages. We propose ActionEngine, a training-free framework that transitions from reactive execution to programmatic planning through a novel two-agent architecture: a Crawling Agent that constructs an updatable state-machine memory of the GUIs through offline exploration, and an Execution Agent that leverages this memory to synthesize complete, executable Python programs for online task execution. To ensure robustness against evolving interfaces, execution failures trigger a vision-based re-grounding fallback that repairs the failed action and updates the memory. This design drastically improves both efficiency and accuracy: on Reddit tasks fr...
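The re-grounding fallback described in the abstract can be illustrated with a short sketch. Everything here is an assumption made for illustration: `run_plan`, `ActionFailed`, the `perform` and `reground` callables, and the idea of storing one selector per action are hypothetical stand-ins, and the vision model is replaced by a plain callback. The sketch only shows the control flow: run the planned actions, and when one fails, repair it, retry once, and write the repair back into memory.

```python
from typing import Callable, Dict, List


class ActionFailed(Exception):
    """Raised by `perform` when a remembered selector no longer works."""


def run_plan(
    plan: List[str],
    selectors: Dict[str, str],
    perform: Callable[[str], None],
    reground: Callable[[str], str],
) -> List[str]:
    """Execute each action in `plan`; repair failures and update `selectors`.

    `selectors` plays the role of the memory (action name -> UI selector).
    `reground` stands in for the vision-based fallback and returns a new selector.
    Returns the names of the actions that had to be repaired.
    """
    repaired = []
    for action in plan:
        try:
            perform(selectors[action])
        except ActionFailed:
            new_selector = reground(action)
            perform(new_selector)  # retry once; a second failure propagates
            selectors[action] = new_selector  # persist the repair in memory
            repaired.append(action)
    return repaired
```

Because the repair is written back into `selectors`, a later run of the same plan would use the updated selector directly and skip the fallback, which mirrors the abstract's point that the fallback both repairs the action and updates the memory.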
