Based on a four-part blog series on state machines for multi-agent systems. For more details on Hugin see the Hugin website and the Hugin GitHub repo.
Most agent frameworks treat LLMs as agents. Hugin treats them as oracles — one component in a larger reasoning system. The framework is built around a simple insight: if you want agents that reason well over long horizons, you need explicit structure around how they reason, not just what they reason about.
The result is a state machine architecture where every interaction is an immutable entry on a stack. This makes branching, debugging, replay, and multi-agent coordination natural rather than bolted on.
The problem with current agent frameworks
The dominant approach to building agents is to wrap an LLM in a loop: prompt, get response, execute tools, repeat. This works for short tasks but breaks down for longer-running, creative reasoning tasks:
- No memory of process — the agent has no structured record of what it tried and why
- No backtracking — if a reasoning path fails, the only option is to start over or rely on the LLM to self-correct
- Opaque debugging — when something goes wrong three steps deep in a multi-agent chain, good luck finding out why
- Brittle orchestration — coordinating multiple agents requires ad-hoc message passing with no formal guarantees
The state machine
Hugin models every agent as a state machine with a well-defined lifecycle. An agent moves through states — receiving a task, consulting the oracle (LLM), executing tools, calling sub-agents — and every transition produces an immutable interaction pushed onto a stack.
The stack is the complete, ordered history of everything the agent has done. Because it's immutable, every state is preserved — you can inspect any point, replay from any point, or branch from any point.
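The immutability guarantee can be sketched in a few lines. This is an illustrative model, not Hugin's actual API; the `Interaction` and `Stack` names are assumptions:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Interaction:
    kind: str      # e.g. "task", "oracle", "tool_result"
    content: str

@dataclass(frozen=True)
class Stack:
    entries: Tuple[Interaction, ...] = ()

    def push(self, item: Interaction) -> "Stack":
        # Pushing never mutates: it returns a new Stack that shares
        # the existing history and appends one entry.
        return Stack(self.entries + (item,))

s0 = Stack()
s1 = s0.push(Interaction("task", "analyse stability"))
s2 = s1.push(Interaction("oracle", "try a Lyapunov approach"))
assert s0.entries == ()        # earlier states are preserved
assert len(s2.entries) == 2    # the new state holds the full history
```

Because every intermediate `Stack` value survives, inspecting, replaying, or branching from any point is just holding a reference to it.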
Three building blocks
Configuration
How the agent behaves. Tools, model, system prompt, templates — all dynamically rendered at each step, so the agent's capabilities can change mid-task.
Task
What the agent does. The initial prompt and parameters. Tasks can chain: one task's result becomes the next task's input.
Stack
What the agent has done. The immutable history of interactions. The context window is re-rendered from the stack at every LLM call.
This separation is key. The LLM never manages its own state — the framework does. The LLM is consulted as an oracle: given the current stack (rendered as context), what should we do next? This makes the system far more predictable and debuggable than approaches where the LLM is expected to self-manage.
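The oracle pattern reduces to a small loop. A minimal sketch, assuming hypothetical `render_context` and `consult_oracle` helpers (not Hugin's real function names):

```python
def render_context(stack):
    # The context window is rebuilt from the stack at every call;
    # the LLM never carries state of its own.
    return "\n".join(f"[{kind}] {content}" for kind, content in stack)

def consult_oracle(llm, stack):
    # Given the current stack rendered as context, ask: what next?
    return llm(render_context(stack))

stack = [("task", "summarise report"), ("tool_result", "report text...")]
fake_llm = lambda ctx: f"next action given {ctx.count('[')} entries"
assert consult_oracle(fake_llm, stack) == "next action given 2 entries"
```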
Steering agents
The state machine gives us precise control over how agents reason, without constraining what they reason about.
An agent can spawn sub-agents for specialised subtasks. The parent waits, receives the result, and continues. The sub-agent gets its own stack, its own configuration, its own context window.
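The parent/child relationship can be sketched like this (function and field names are illustrative assumptions, not the framework's API):

```python
def spawn_subagent(parent_stack, task, run):
    # The child gets its own fresh stack; the parent's stack only
    # records the child's final result, not its internal reasoning.
    child_stack = [("task", task)]
    result = run(task)
    child_stack.append(("result", result))
    return parent_stack + [("subagent_result", result)], child_stack

parent = [("task", "write report")]
parent2, child = spawn_subagent(parent, "summarise data",
                                run=lambda t: "3 key findings")
assert parent2[-1] == ("subagent_result", "3 key findings")
assert child[0] == ("task", "summarise data")
```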
An agent's tools, prompts, and even model can change at runtime. A planning phase might use one set of tools; execution uses another. The configuration is re-evaluated at each step.
After completing a task, an agent can review its own stack — a complete record of its reasoning — and produce a critique. Sub-agent reflection takes this further: a second agent reviews the first agent's work.
Tasks can be composed into pipelines. Extract → analyse → summarise, where each step's output flows into the next. Different configurations for different phases of the same workflow.
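Task chaining is simple function composition over results. A toy sketch (the `run_pipeline` helper is hypothetical; real steps would be full agent tasks with their own configurations):

```python
def run_pipeline(steps, initial_input):
    # Each step's result becomes the next step's input.
    result = initial_input
    for step in steps:
        result = step(result)
    return result

extract = lambda text: text.split()
analyse = lambda words: {"word_count": len(words)}
summarise = lambda stats: f"{stats['word_count']} words"

assert run_pipeline([extract, analyse, summarise], "one two three") == "3 words"
```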
Branching: parallel exploration
The most powerful consequence of the stack architecture is branching. At any point in an agent's reasoning, you can create a branch — a copy of the stack that diverges from that point. The branch shares all history up to the fork, then explores independently.
This is not just convenience — it's a fundamentally different approach to reasoning. Instead of committing to a single path and hoping the LLM self-corrects, you can explore multiple hypotheses in parallel, evaluate each, and select the best. The architecture supports this natively because branching is just stack manipulation.
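In a sketch (the `branch` signature is an assumption), forking really is just copying a prefix of the stack, which immutability makes safe:

```python
def branch(stack, at):
    # Entries are immutable, so the shared prefix needs no deep copy
    # in a real implementation; a slice suffices here.
    return list(stack[:at])

history = ["task", "hypothesis A", "dead end"]
alt = branch(history, at=2)    # fork from just before the dead end
alt.append("hypothesis B")     # explore independently

assert history == ["task", "hypothesis A", "dead end"]  # original intact
assert alt == ["task", "hypothesis A", "hypothesis B"]
```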
Improving reasoning at test time
The blog series' final instalment addresses the question: once you have this architecture, how do you make agents reason better?
Local vs global reasoning. Most agent frameworks optimise locally — making each individual step as good as possible. But the quality of the final output depends on the sequence of steps: the global reasoning trajectory. A locally optimal step can lead to a globally poor outcome.
Evaluators. The framework supports pluggable evaluators — heuristic-based, LLM-as-a-judge, or learned — that score agent outputs. These evaluators enable:
- Rejection sampling — generate multiple outputs, keep the best
- Monte Carlo rollouts — branch from the current state, roll out several completions, evaluate each, and pick the most promising path to continue
- Comparative ranking — present pairs of outputs to a judge and build a preference ordering
This is structurally analogous to how AlphaGo combines a policy network (the LLM choosing actions) with a value network (the evaluator scoring positions) and tree search (branching and rollout). The stack architecture makes this natural: each rollout is just a branch, and branches are cheap.
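The branch-and-evaluate pattern behind Monte Carlo rollouts can be sketched as follows. Everything here is illustrative: `mc_rollout`, the toy `step`, and the hard-coded evaluator are stand-ins, not Hugin's evaluator interface:

```python
def mc_rollout(state, actions, step, evaluate, keep_best=True):
    # Branch from the current state once per candidate action,
    # roll each branch forward, score it, and pick the best action.
    best_action, best_score = None, float("-inf")
    for action in actions:
        rollout = list(state)          # cheap: branching is stack copying
        rollout.append(step(action))   # roll the branch forward one step
        score = evaluate(rollout)
        if score > best_score:
            best_action, best_score = action, score
    return best_action

state = ["start"]
step = lambda a: f"after-{a}"
evaluate = lambda branch: {"after-a": 1, "after-b": 5, "after-c": 2}[branch[-1]]
assert mc_rollout(state, ["a", "b", "c"], step, evaluate) == "b"
```

A real rollout would extend each branch for several steps before scoring; the structure stays the same.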
Artifacts and memory
For long-running tasks, the context window is not enough. Hugin provides two memory mechanisms:
Dynamic context — the stack, re-rendered at each LLM call. This is short-term memory: what happened in this task.
Artifacts — a persistent store the agent can write to and query. Insights, intermediate results, and findings that should survive beyond the current context window. Agents save artifacts explicitly via tools, and query them by semantic search. This is long-term memory: what the agent has learned.
The separation matters. Dynamic context is automatic and complete but bounded by the context window. Artifacts are selective and persistent but require the agent to decide what's worth remembering.
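An artifact store might look like the following sketch. The class and method names are assumptions, and plain keyword matching stands in for the semantic search the text describes:

```python
class ArtifactStore:
    """Long-term memory: findings the agent chose to keep."""

    def __init__(self):
        self._items = []

    def save(self, text):
        # Saving is explicit: the agent decides what is worth keeping.
        self._items.append(text)

    def query(self, term):
        # Placeholder for semantic search: simple substring matching.
        return [t for t in self._items if term.lower() in t.lower()]

store = ArtifactStore()
store.save("System is stable for k < 2")
store.save("Numeric solver diverged at t=10")
assert store.query("stable") == ["System is stable for k < 2"]
```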
Tool execution
Agents interact with the world through tools. The state machine treats tool calls as first-class transitions — the agent requests a tool call, the framework executes it, and the result is pushed onto the stack.
Tools receive the full stack as context, giving them access to the agent's history, environment, and shared state. This means tools can be context-aware without the agent having to explicitly pass information.
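A sketch of the transition, with assumed signatures (the real framework's tool protocol may differ): the framework executes the tool, hands it the full stack, and appends the result immutably:

```python
def execute_tool(tools, name, args, stack):
    # The tool receives the full stack, so it can be context-aware.
    result = tools[name](args, stack)
    # The result is a first-class transition: a new stack entry.
    return stack + [("tool_result", result)]

def word_count(args, stack):
    # A trivial context-aware tool; it could also inspect `stack`.
    return str(len(args["text"].split()))

stack = [("task", "count words")]
stack2 = execute_tool({"word_count": word_count},
                      "word_count", {"text": "a b c"}, stack)
assert stack2[-1] == ("tool_result", "3")
assert len(stack) == 1   # the original stack is untouched
```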
Why this matters for Gimle
Hugin was built because reasoning about dynamical systems requires exactly the capabilities this architecture provides. Analysing a system's stability is not a single-step task — it requires exploring multiple approaches, backtracking from dead ends, and synthesising results across different analyses. The state machine makes this structured rather than ad-hoc.
Combined with Asgard (the symbolic-numeric computing layer) and Mimir (the foundation model for dynamical systems), Hugin provides the agentic reasoning layer of the Gimle stack — the part that decides what to do with the formal tools available.