The year 2025 marks a major turning point for generative artificial intelligence and automation: the advent of AI agents with varying degrees of autonomy. These LLM-based agents operate largely on their own, capable of making and executing decisions in real time, often with a human "in the loop."
However, placing these text-generating LLMs at the heart of agents that are meant to be truly reliable and predictable proves to be a journey fraught with challenges. Faced with this difficulty of mastering LLM behavior for specific tasks, several approaches have been proposed, notably "scaffolding" and the use of finite state machines (FSMs).
The "scaffolding" technique, proposed by Flavien Chervet, aims to guide the LLM through a series of detailed instructions, somewhat like a script handed to an actor (who nevertheless retains some room for improvisation).
In parallel, the ProseCode for GPTs project, described by the author in an article on Hackster.io, uses a finite state machine (FSM) to define the different phases of interaction for the educational chatbot Geppetto_Duino.
These two methods fundamentally rely on the idea that the behavior of an LLM can be significantly better determined and controlled by precise, structured instructions, written with syntax inspired by programming languages (a kind of pseudo-code, or #ProseCode).
These initiatives have laid the groundwork for better control, but they are not without limitations. Agents based on OpenAI's GPT assistants, for example, are constrained by a maximum configuration prompt size of 8,000 characters. This restriction inevitably limits the complexity of the FSM descriptions or scaffolding that can be integrated. Additionally, an overly long configuration prompt tends to reduce the LLM's "attention capacity," increasing its propensity to deviate from the planned scenario, which undermines the trust that can be placed in the agent.
It should be emphasized that this tendency toward "improvisation" is inherent to the very principle of the algorithms at the core of LLMs. Functioning as statistical language models, their behavior is intrinsically non-deterministic. Despite clear instructions, these models can stop mid-task, omit information, or make unexpected decisions. The Scaffolding and FSM approaches, while offering notable improvement, remain vulnerable to these divergences related to the model's "temperature" and possible loss of attention within the context window.
Faced with these fundamental limitations, a more robust and deterministic approach is necessary. I propose here the use of a finite state machine (FSM) rigorously defined in a YAML format file: by externalizing the automaton logic in a structured format readable by the machine, we free ourselves from prompt size constraints and establish an external and immutable source of truth for the agent's behavior. Moreover, the YAML format is easy for humans to understand and write.
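To give an idea of the approach, here is a minimal sketch of what such a YAML FSM definition might look like; the state names and fields below are hypothetical illustrations, not the prototype's actual file:

```yaml
# Hypothetical FSM definition for a sequential email-drafting mission
initial_state: define_recipient
states:
  define_recipient:
    instructions: "Ask the user for the recipient's name and email address."
    next: write_subject
  write_subject:
    instructions: "Propose several subject lines based on the stated purpose."
    next: compose_body
  compose_body:
    instructions: "Draft the body of the email in the requested tone."
    next: done
  done:
    instructions: "Present the finished email to the user for validation."
    next: null
```

Because the full automaton lives in this external file rather than in the configuration prompt, its size is bounded only by what the sandbox can read, and each state description can be as detailed as needed.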
This method ensures that the automaton logic no longer relies on the potentially fluctuating "contextual memory" of the LLM, but rather on the systematic rereading of the current FSM state description via a Python script executed by the GPT within its "sandbox" (the "Advanced Data Analysis" function).
The explicit consultation of the YAML file at each evolution of the FSM state minimizes the influence of the LLM's temperature as well as its potential loss of attention. This considerably strengthens the predictability of the AI agent's behavior and thus its fidelity to the defined scenario. Another advantage is that it allows far exceeding the 8,000-character limit that previously restricted the description of an FSM in the configuration prompt.
An example: for the prototype described below, the FSM script in YAML was generated by Claude.ai from my instructions given in natural language. One can easily extrapolate that this type of "AI agent + FSM automaton" will eventually be able to write its own YAML script to solve a given problem. The good old principle of "bootstrapping"...
To concretely illustrate and demonstrate the concept (POC) of this approach, I have developed a prototype that performs a simple sequential example mission: drafting an email.
This prototype is based on a finite state machine (FSM) rigorously defined in a YAML file. Each state of the FSM represents a specific step in the email creation process (for example, defining the recipient, writing the subject, composing the body of the message). The logic of transition between these states is also specified in the YAML file, thus establishing a predictable course for the task.
This GPT can be tested here: https://chatgpt.com/g/g-67f5170435bc8191ab7b5cc6840b7cd4-fsm-based-ai-agent
Operation: The execution of the FSM automaton is driven by Python scripts. At each state transition of the FSM, the Python script performs a systematic rereading of the description of the current state in the YAML file. This explicit consultation ensures that the agent's logic does not rely on the LLM's context memory.
The script then interacts with the LLM, providing it with instructions specific to the current state, based on information extracted from the YAML. For example, in the "write subject" state, the script might ask the LLM to generate several subject proposals based on the defined context.
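This driver loop can be sketched in a few lines of Python. The sketch below is a hypothetical illustration, not the prototype's actual script: the FSM is given as an already-parsed dict (in the GPT sandbox it would come from `yaml.safe_load` on the uploaded YAML file), and `ask_llm` stands in for the call that hands the current state's instructions to the LLM.

```python
# Hypothetical sketch of the FSM driver. In the real sandbox the dict
# below would be loaded with: fsm = yaml.safe_load(open("fsm.yaml"))
FSM = {
    "initial_state": "define_recipient",
    "states": {
        "define_recipient": {"instructions": "Ask for the recipient.", "next": "write_subject"},
        "write_subject":    {"instructions": "Propose subject lines.", "next": "compose_body"},
        "compose_body":     {"instructions": "Draft the email body.",  "next": "done"},
        "done":             {"instructions": "Present the result.",    "next": None},
    },
}

def run_fsm(fsm, ask_llm):
    """Walk the FSM from its initial state to the terminal state."""
    visited = []
    state = fsm["initial_state"]
    while state is not None:
        # Systematic rereading: the instructions always come from the
        # FSM definition, never from the LLM's context memory.
        desc = fsm["states"][state]
        ask_llm(desc["instructions"])
        visited.append(state)
        state = desc["next"]
    return visited

# Example run with a dummy stand-in for the LLM call:
print(run_fsm(FSM, ask_llm=lambda msg: None))
# → ['define_recipient', 'write_subject', 'compose_body', 'done']
```

The key design point is that the loop, not the LLM, decides which state comes next: the model only ever sees the instructions of the current state, reread from the external definition at each step.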
This simple prototype aims to prove the concept of a more deterministic control technique for LLMs on sequential tasks, paving the way for more predictable and reliable AI agents.
See the prototype in action in this video:
Sources:
- Flavien Chervet, "HACKER CHATGPT POUR EN FAIRE UN AGENT AUTONOME" ("Hacking ChatGPT to turn it into an autonomous agent")
- Jean Noël Lefebvre, "ProseCode for GPTs"
- Jean Noël Lefebvre, "How I Created Geppetto_Duino: A GPTs for Arduino learning"
- Jean Noël Lefebvre, "le LLM : un programme informatique… qui n'en fait qu'à sa tête ?!" ("The LLM: a computer program... that does as it pleases?!")
- Le 5ème Jour, "Scaffolding : la clé pour transformer ChatGPT en Agent IA autonome" ("Scaffolding: the key to turning ChatGPT into an autonomous AI agent")