Hugin: A State Machine Framework for Agentic Reasoning
Documentation based on a four-part blog series on state machines for multi-agent systems. For more details, see the Hugin website and the Hugin GitHub repo.
Most agent frameworks treat LLMs as agents. Hugin treats them as oracles — one component in a larger reasoning system. The framework is built around a simple insight: if you want agents that reason well over long horizons, you need explicit structure around how they reason, not just what they reason about.
The result is a state machine architecture where every interaction is an immutable entry on a stack. This makes branching, debugging, replay, and multi-agent coordination natural rather than bolted on.
The problem with current agent frameworks
The dominant approach to building agents is to wrap an LLM in a loop: prompt, get response, execute tools, repeat. This works for short tasks but breaks down for longer-running, creative reasoning:
- No memory of process — the agent has no structured record of what it tried and why
- No backtracking — if a reasoning path fails, the only option is to start over or rely on the LLM to self-correct
- Opaque debugging — when something goes wrong three steps deep in a multi-agent chain, good luck finding out why
- Brittle orchestration — coordinating multiple agents requires ad-hoc message passing with no formal guarantees
The state machine
Hugin models every agent as a state machine with a well-defined lifecycle. An agent moves through states — receiving a task, consulting the oracle (LLM), executing tools, calling sub-agents, asking a human — and every transition produces an immutable interaction pushed onto a stack.
Each interaction has a type that captures what happened:
- TaskDefinition / TaskResult — the start and end of a task
- AskOracle / OracleResponse — consulting the LLM and receiving its response
- ToolCall / ToolResult — executing a tool and capturing what it returned
- AskHuman / HumanResponse — requesting and receiving human input
- AgentCall / AgentResult — spawning a sub-agent and receiving its output
- TaskChain — transitioning to a new task in a pipeline
The stack is the complete, ordered history of everything the agent has done. Because it's immutable, every state is preserved — you can inspect any point, replay from any point, or branch from any point.
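A minimal sketch of what an immutable stack entry might look like. The class and field names here are illustrative assumptions based on the interaction types above, not Hugin's actual data model:

```python
from dataclasses import dataclass

# Hypothetical sketch — an immutable interaction entry. The names are
# illustrative, not Hugin's actual API.
@dataclass(frozen=True)
class Interaction:
    kind: str      # e.g. "TaskDefinition", "AskOracle", "ToolResult"
    payload: str   # what happened at this step

# The stack as an immutable tuple: appending builds a new tuple,
# existing entries are never mutated.
stack: tuple = ()
stack += (Interaction("TaskDefinition", "Summarise the report"),)
stack += (Interaction("AskOracle", "rendered context..."),)

# Immutability makes inspection, replay, and branching cheap:
fork = stack[:1]   # a branch sharing history up to the first entry
```

Because nothing is ever overwritten, "branch from any point" reduces to copying a prefix of the stack.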
Three building blocks
An agent in Hugin combines three elements:
Configuration
How the agent behaves. Tools, model, system prompt, templates — all dynamically rendered at each step, so the agent's capabilities can change mid-task. Configurations can define state machines with transitions, giving agents different tools and prompts in different phases.
Task
What the agent does. The initial prompt and parameters. Tasks can chain into pipelines: one task's result becomes the next task's input, with different configurations for each phase.
Stack
What the agent has done. The immutable history of interactions. The context window is re-rendered from the stack at every LLM call — the agent never manages its own memory.
This separation is key. The LLM never manages its own state — the framework does. The LLM is consulted as an oracle: given the current stack (rendered as context), what should we do next? This makes the system far more predictable and debuggable than approaches where the LLM is expected to self-manage.
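The separation can be sketched in a few lines. The data shapes below are illustrative assumptions, not Hugin's real schema; the point is that the context window is a pure function of the stack and configuration:

```python
# Hypothetical sketch of the config / task / stack separation.
def render_context(stack, config):
    """Re-render the oracle's context window from the full stack."""
    lines = [config["system_prompt"]]
    lines += [f"{kind}: {payload}" for kind, payload in stack]
    return "\n".join(lines)

config = {"system_prompt": "You are a careful analyst."}   # how it behaves
task = {"prompt": "Assess the stability of the model."}    # what it does
stack = [("TaskDefinition", task["prompt"])]               # what it has done

context = render_context(stack, config)  # rebuilt at every oracle call
```

Since the context is recomputed from the stack on every call, the LLM never carries hidden state between steps.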
Tools
Agents interact with the world through tools. The state machine treats tool calls as first-class transitions — the agent requests a tool call, the framework executes it, and the result is pushed onto the stack.
Every tool receives the full stack as its first parameter, giving it access to the agent's history, environment, shared state, and storage. This means tools can be deeply context-aware without the agent needing to pass information explicitly.
Hugin ships with built-in tools that cover the core capabilities:
- finish — complete the current task with a result
- save_insight / query_artifacts / get_artifact_content — long-term memory (more on this below)
- ask_human — pause and request human input
- create_branch — fork the stack for parallel exploration
- call_agent — spawn a sub-agent for a specialised subtask
Custom tools are straightforward: a Python function plus a YAML definition. The function receives the stack and any parameters; the YAML describes the tool's name, description, and parameter schema. Tools can also chain deterministically via a next_tool mechanism — useful for pipelines where one tool's output always feeds into another.
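A sketch of what a custom tool might look like. The exact function signature and YAML schema are assumptions based on the description above, not Hugin's documented API:

```python
# Hypothetical custom tool — the stack-first signature and the YAML
# schema below are assumptions, not Hugin's documented interface.
def word_count(stack, text: str) -> dict:
    """Count words in a text; the stack gives access to history."""
    return {"words": len(text.split()), "prior_steps": len(stack)}

# Illustrative YAML definition describing name, description, and
# parameter schema (shape assumed):
TOOL_YAML = """\
name: word_count
description: Count the words in a piece of text.
parameters:
  text:
    type: string
    description: The text to count words in.
"""

result = word_count([], "state machines for agents")
```

Because every tool receives the stack first, this tool could just as easily inspect prior interactions instead of only its explicit parameters.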
Steering agents
The state machine gives us precise control over how agents reason, without constraining what they reason about.
An agent's configuration can itself be a state machine. A planning phase might use one set of tools and a reasoning-optimised model; execution switches to a different tool set and a faster model. Transitions between states can be triggered by the agent or by the framework.
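A toy sketch of per-state configuration switching. The state names, model names, and dictionary keys are all illustrative assumptions, not Hugin's configuration schema:

```python
# Hypothetical phase-switching sketch — each state pairs a model with a
# tool set; keys and names are assumptions, not Hugin's schema.
states = {
    "planning":  {"model": "reasoning-large",
                  "tools": ["save_insight", "create_branch"]},
    "execution": {"model": "fast-small",
                  "tools": ["call_agent", "finish"]},
}

current = "planning"
config = states[current]          # planning tools, reasoning model

# A transition (agent- or framework-triggered) swaps the active config:
current = "execution"
config = states[current]          # execution tools, faster model
```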
Tasks compose into pipelines. Extract → analyse → summarise, where each step's result flows to the next via pass_result_as. Each step can use a different configuration, so a research task chains into a writing task with different tools and prompts.
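The chaining above can be sketched as a simple loop. `pass_result_as` is the mechanism named in the text; the runner shape around it is an illustrative assumption:

```python
# Hypothetical pipeline sketch — pass_result_as comes from the text;
# the step format and runner are illustrative assumptions.
pipeline = [
    {"task": "extract",   "pass_result_as": "document"},
    {"task": "analyse",   "pass_result_as": "findings"},
    {"task": "summarise", "pass_result_as": None},
]

def run_step(name, params):
    # Stand-in for running an agent with its own configuration.
    return f"{name}({params})"

params, result = {}, None
for step in pipeline:
    result = run_step(step["task"], dict(params))
    if step["pass_result_as"]:
        # The result becomes a named input to the next task.
        params = {step["pass_result_as"]: result}
```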
After completing a task, an agent can review its own stack — a complete record of its reasoning — and produce a critique. Sub-agent reflection takes this further: a second agent reviews the first agent's work with fresh context and different instructions.
The ask_human tool pauses the agent and requests input. The human response is pushed onto the stack as a HumanResponse interaction — part of the permanent record. This supports approval gates, clarification requests, and collaborative reasoning without breaking the state machine model.
Multi-agent coordination
A session in Hugin can host multiple agents, each with its own stack, configuration, and task. Agents coordinate through two mechanisms:
Sub-agents. An agent can spawn a sub-agent via call_agent. The parent pauses, the sub-agent runs to completion with its own stack and configuration, and the result is returned as an AgentResult interaction on the parent's stack. The sub-agent is fully isolated — its own context window, its own tools, its own model.
Shared state. Agents in the same session can communicate through namespaces — key-value stores with fine-grained access control. A producer agent writes findings to a namespace; a consumer agent reads them. This decouples agents without requiring direct message passing, and the access control prevents unintended interference.
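A toy model of a namespace with access control. The class and method names are illustrative assumptions; only the producer/consumer pattern and fine-grained access control come from the text:

```python
# Hypothetical namespace sketch — a key-value store with per-agent
# read/write permissions; the class is illustrative, not Hugin's.
class Namespace:
    def __init__(self, writers, readers):
        self._data = {}
        self.writers, self.readers = writers, readers

    def write(self, agent, key, value):
        if agent not in self.writers:
            raise PermissionError(f"{agent} may not write here")
        self._data[key] = value

    def read(self, agent, key):
        if agent not in self.readers:
            raise PermissionError(f"{agent} may not read here")
        return self._data[key]

findings = Namespace(writers={"producer"}, readers={"consumer"})
findings.write("producer", "eigenvalues", [-0.3, -1.2])
value = findings.read("consumer", "eigenvalues")
```

The access-control checks are what prevent one agent from clobbering another's working state.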
Branching: parallel exploration
The most powerful consequence of the stack architecture is branching. At any point in an agent's reasoning, you can create a branch — a copy of the stack that diverges from that point. The branch shares all history up to the fork, then explores independently.
This is not just convenience — it's a fundamentally different approach to reasoning. Instead of committing to a single path and hoping the LLM self-corrects, you can explore multiple hypotheses in parallel, evaluate each, and select the best. The architecture supports this natively because branching is just stack manipulation.
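Since branching is just stack manipulation, parallel hypothesis exploration can be sketched with plain tuples. The entries and hypothesis names are illustrative:

```python
# Hypothetical branching sketch — each branch copies the shared prefix
# and diverges; entry strings are illustrative placeholders.
base = ("TaskDefinition", "AskOracle", "OracleResponse")

hypotheses = ["linearise", "lyapunov", "simulate"]
branches = {h: base + (f"ToolCall:{h}",) for h in hypotheses}

# The base stack is untouched; every branch shares history up to the
# fork point and then explores independently.
```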
Memory
For long-running tasks, the context window is not enough. Hugin provides two memory mechanisms:
Dynamic context — the stack, re-rendered at each LLM call. This is short-term memory: what happened in this task. It's automatic and complete, but bounded by the context window.
Artifacts — a persistent store the agent writes to and queries via three built-in tools: save_insight stores a finding with metadata; query_artifacts retrieves relevant artifacts by semantic search; get_artifact_content fetches a specific artifact in full. Artifacts include quality ratings and feedback, so the agent can assess the reliability of what it recalls.
The separation matters. Dynamic context captures everything but forgets at the context boundary. Artifacts are selective — the agent must decide what's worth remembering — but persist indefinitely.
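A toy version of the artifact workflow. The tool names `save_insight` and `query_artifacts` come from the text; this in-memory store and tag matching (a stand-in for semantic search) are assumptions:

```python
# Hypothetical artifact store — save_insight / query_artifacts are the
# built-in tool names from the text; this toy store is an assumption.
artifacts = []

def save_insight(text, tags, quality=None):
    """Persist a finding with metadata and an optional quality rating."""
    artifacts.append({"text": text, "tags": set(tags), "quality": quality})

def query_artifacts(tag):
    # Stand-in for semantic search: match on tags instead of embeddings.
    return [a for a in artifacts if tag in a["tags"]]

save_insight("System is stable for k < 2.", {"stability"}, quality=0.9)
save_insight("The dataset spans three regimes.", {"data"})

hits = query_artifacts("stability")
```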
Improving reasoning at test time
Once you have this architecture, how do you make agents reason better?
Local vs global reasoning. Most agent frameworks optimise locally — making each individual step as good as possible. But the quality of the final output depends on the sequence of steps: the global reasoning trajectory. A locally optimal step can lead to a globally poor outcome.
Evaluators. The framework supports pluggable evaluators — heuristic-based, LLM-as-a-judge, or learned — that score agent outputs. These evaluators enable:
- Rejection sampling — generate multiple outputs, keep the best
- Monte Carlo rollouts — branch from the current state, roll out several completions, evaluate each, and pick the most promising path to continue
- Comparative ranking — present pairs of outputs to a judge and build a preference ordering
This is structurally analogous to how AlphaGo combines a policy network (the LLM choosing actions) with a value network (the evaluator scoring positions) and tree search (branching and rollout). The stack architecture makes this natural: each rollout is just a branch, and branches are cheap.
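The rollout-and-select loop can be sketched in a few lines. The evaluator heuristic and rollout function here are toy stand-ins, not Hugin's evaluator interface:

```python
import random

# Hypothetical rollout sketch: branch, roll out candidates, score each
# with a pluggable evaluator, keep the best. All names are illustrative.
def evaluator(output: str) -> float:
    # Toy heuristic evaluator: prefer shorter outputs.
    return -len(output)

def rollout(branch_id: int) -> str:
    random.seed(branch_id)          # deterministic per branch
    return "step " * random.randint(1, 5)

# Each rollout is just a branch from the current state.
candidates = {b: rollout(b) for b in range(4)}
best_branch = max(candidates, key=lambda b: evaluator(candidates[b]))
```

Rejection sampling is the degenerate case of this loop: generate several completions, score them, keep one.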
Debugging and monitoring
Because every interaction is on the stack, debugging is qualitatively different from traditional agent frameworks. Hugin provides three interfaces:
- Web monitor (hugin run --monitor) — a dashboard showing the stack in real time, with the ability to inspect any interaction
- Terminal UI (hugin run -i) — an interactive TUI for watching and intervening in agent runs
- Replay — rewind to any point in the stack and re-run from there, with different configuration if needed
This means when something goes wrong, you can see exactly what the agent saw, what it decided, and why — then replay from before the mistake with adjusted parameters.
Why this matters for Gimle
Hugin was built because reasoning about dynamical systems requires exactly the capabilities this architecture provides. Analysing a system's stability is not a single-step task — it requires exploring multiple approaches, backtracking from dead ends, and synthesising results across different analyses. The state machine makes this structured rather than ad-hoc.
Combined with Asgard (the symbolic-numeric computing layer) and Mimir (the foundation model for dynamical systems), Hugin provides the agentic reasoning layer of the Gimle stack — the part that decides what to do with the formal tools available.