Gimle Papers Working Paper

Learning Explicit Structure of Dynamical Systems

The dominant paradigm for learning dynamics is implicit: neural networks approximate $\dot{x} = f_\theta(x)$ as black boxes. We take the opposite approach — training a foundation model to discover explicit, typed, compositional circuit representations from observed trajectories alone.

Read the short paper (PDF) — full version with extended results coming soon.

The Problem: Implicit Learning Sacrifices Structure

Neural ODEs are flexible but opaque — classical methods are explicit but brittle

Neural ODEs, PINNs, and related architectures learn black-box approximations of dynamics. While flexible, this sacrifices the structural properties that make classical system representations useful: interpretability (black-box dynamics cannot be inspected or validated), composability (learned components cannot be combined with analytical models), verifiability (standard tools for stability analysis do not apply), and generalisation (implicit representations fail to transfer across operating regimes).

Classical methods (SINDy, symbolic regression) discover explicit structure but require substantial domain expertise, scale poorly to high-dimensional systems, and struggle with hybrid or stochastic dynamics.

What if we could learn explicit system representations?

Structured, typed, compositional — using the same scalable techniques that power modern foundation models.

A Typed Circuit Language for Dynamics

Circuits as the representation space — typed, compositional, executable

We represent dynamical systems as circuits in a traced monoidal category. Each circuit is a directed graph of typed computational elements composed via three combinators.

Every circuit has a degree type $(m \to n)$ specifying input and output wire counts. Composition requires matching degrees — the type system enforces well-formedness at parse time.

Equations written in standard notation (e.g., $\dot{x} = -cx + u$) compile automatically to circuits. The same circuit executes under three interchangeable calculi: stream calculus (deterministic ODEs), stochastic calculus (SDEs), and finite difference calculus (sequences).
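For intuition, here is one way the decay circuit above might be read under the three calculi. The precise semantics are those defined in the Asgard paper; the forward-Euler discretisation and the diffusion term $\sigma\,\mathrm{d}W_t$ below are illustrative assumptions, not the defined interpretations.

$$
\begin{aligned}
\text{stream (ODE):}\qquad & \dot{x}(t) = -c\,x(t) + u(t),\\
\text{stochastic (SDE):}\qquad & \mathrm{d}X_t = \bigl(-c\,X_t + u_t\bigr)\,\mathrm{d}t + \sigma\,\mathrm{d}W_t,\\
\text{finite difference:}\qquad & x_{k+1} = x_k + \Delta t\,\bigl(-c\,x_k + u_k\bigr).
\end{aligned}
$$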

Three combinators

Composition (sequential): $f \circ g$ — output of $f$ feeds input of $g$.

Monoidal product (parallel): $f \otimes g$ — independent, side-by-side execution.

Trace (feedback): $\mathrm{Tr}(f)$ — routes an output back to an input, enabling recurrence and differential equations.
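A minimal Python sketch of these ideas, with degrees checked when circuits are combined. It follows the text's reading of $f \circ g$ (outputs of $f$ feed inputs of $g$); the one-step-delay treatment of trace and the decay example are our own simplifications, not the paper's implementation.

```python
# Minimal sketch of typed circuits with degree types (m -> n) and the three
# combinators: sequential composition, monoidal product, and trace (feedback).
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Circuit:
    m: int                                      # number of input wires
    n: int                                      # number of output wires
    run: Callable[[List[float]], List[float]]   # one evaluation step

def compose(f: Circuit, g: Circuit) -> Circuit:
    """Sequential composition: outputs of f feed the inputs of g."""
    assert f.n == g.m, "degree mismatch: f's outputs must match g's inputs"
    return Circuit(f.m, g.n, lambda xs: g.run(f.run(xs)))

def tensor(f: Circuit, g: Circuit) -> Circuit:
    """Monoidal product: independent, side-by-side execution."""
    def run(xs):
        return f.run(xs[:f.m]) + g.run(xs[f.m:])
    return Circuit(f.m + g.m, f.n + g.n, run)

def trace(f: Circuit) -> Circuit:
    """Trace Tr(f): route f's last output back to its last input. Here the
    feedback passes through a one-step delay (a register) so the sketch stays
    well defined; the resulting circuit has degree (m-1 -> n-1)."""
    state = [0.0]                               # fed-back value, initialised to zero
    def run(xs):
        ys = f.run(list(xs) + state)            # supply the fed-back value as last input
        state[0] = ys[-1]                       # latch the last output for the next step
        return ys[:-1]
    return Circuit(f.m - 1, f.n - 1, run)

# Example: dx/dt = -c*x + u under a forward-Euler reading, as Tr of a (2 -> 2) block.
dt, c = 0.1, 0.5
def decay_body(xs):
    u, x = xs
    x_new = x + dt * (-c * x + u)               # Euler step of dx/dt = -c*x + u
    return [x_new, x_new]                       # wire 0: observed output, wire 1: fed back
decay = trace(Circuit(2, 2, decay_body))        # degree (1 -> 1): u in, x out
print([round(decay.run([1.0])[0], 3) for _ in range(5)])
```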

Why circuits, not equations?

Circuits are compositional, typed, and executable. A controller and plant compose sequentially; independent subsystems combine in parallel. The type system catches malformed systems at parse time. And the same circuit runs under different mathematical interpretations without modification. You can read more about the representation in the Asgard paper.

A Foundation Model for Circuit Discovery

From observed trajectories to explicit circuit structure

Given observed trajectories from an unknown system, the model discovers the circuit that generated them. Unlike symbolic regression over equation strings, it operates directly in the space of typed circuit structures.

A transformer autoregressively generates tokenised circuit representations, conditioned on behavioural embeddings of the target trajectories. At each decoding step, an interactive LALR parser restricts the output to syntactically valid tokens — guaranteeing 100% valid circuits with no post-hoc filtering.

Grammar-constrained decoding

The key architectural choice: generation is grammar-constrained. An interactive parser tracks the parse state at each token and masks invalid continuations. Every generated sequence is a well-typed circuit by construction.
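A sketch of what that decoding loop could look like; `model` (returning next-token logits) and `parser.allowed` (returning the set of syntactically valid continuations) are hypothetical interfaces standing in for the transformer and the interactive LALR parser.

```python
# Sketch of grammar-constrained autoregressive decoding: at each step, tokens
# that would make the partial circuit invalid are masked out of the logits.
import math

def constrained_decode(model, parser, bos_id, eos_id, max_len=128):
    tokens = [bos_id]
    while len(tokens) < max_len:
        logits = model(tokens)                        # next-token scores (assumed API)
        allowed = parser.allowed(tokens)              # parser-valid continuations (assumed API)
        masked = [l if i in allowed else -math.inf    # forbid invalid tokens
                  for i, l in enumerate(logits)]
        next_id = max(range(len(masked)), key=masked.__getitem__)  # greedy pick
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens                                     # well-formed by construction
```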

Structure ≠ parameters

Before scoring, the parameters of each predicted circuit are optimised via gradient descent against the target trajectory — made possible by the full differentiability of the simulator. This decouples structure search from parameter estimation.
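As an illustration of this decoupling, the sketch below fits the single rate constant of a predicted decay circuit by backpropagating through a forward-Euler rollout. PyTorch autograd stands in for the paper's differentiable simulator; the circuit, step size, and optimiser settings are assumptions.

```python
# Fit the parameters of a predicted circuit structure by gradient descent
# through a differentiable rollout (here, dx/dt = -c*x with unknown c).
import torch

def rollout(c, x0, dt, steps):
    xs, x = [], x0
    for _ in range(steps):
        x = x + dt * (-c * x)            # Euler step keeps the graph differentiable
        xs.append(x)
    return torch.stack(xs)

target = rollout(torch.tensor(0.7), torch.tensor(1.0), 0.1, 50)   # "observed" trajectory
c = torch.tensor(0.1, requires_grad=True)                         # imprecise initial parameter
opt = torch.optim.Adam([c], lr=0.05)
for _ in range(50):                       # a fixed budget of gradient steps
    opt.zero_grad()
    loss = torch.mean((rollout(c, torch.tensor(1.0), 0.1, 50) - target) ** 2)
    loss.backward()
    opt.step()
print(float(c), float(loss))              # c moves toward 0.7; MSE drops
```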

Training Pipeline

Three phases — from imitation to self-play

1. Curriculum learning

Supervised next-token prediction on synthetic (circuit, trajectory) pairs, progressing from simple atomics through compositions to full dynamical systems with trace and register. A REINFORCE-style behavioural loss is blended in as complexity increases. Replay buffers prevent catastrophic forgetting across stages.
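One way the blended objective might look, where `blend` ramps up with stage complexity; the exact weighting schedule and reward definition are not given here and are assumptions.

```python
# Sketch of the blended curriculum loss: supervised cross-entropy on the
# ground-truth circuit tokens, plus a REINFORCE-style term weighted by how
# well a sampled circuit reproduces the target trajectory.
def curriculum_loss(ce_loss, sampled_logprob, behavioural_reward, blend):
    """blend ramps from 0 (pure imitation) toward 1 as stages get harder."""
    reinforce = -behavioural_reward * sampled_logprob   # policy-gradient surrogate
    return (1.0 - blend) * ce_loss + blend * reinforce
```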

2. Reward-ranked fine-tuning (RAFT)

For each input, candidate circuits are sampled and scored by simulation fidelity. The best candidate becomes a supervised training target, iteratively sharpening the model toward high-reward circuits.
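In pseudocode-like Python, one RAFT round could look as follows; `model.sample`, `simulator.fidelity`, and `model.supervised_update` are hypothetical helpers, not the paper's API.

```python
# One reward-ranked fine-tuning round: sample candidates, score by simulation
# fidelity against the target trajectory, imitate the best-scoring circuit.
def raft_round(model, simulator, trajectory, k=16):
    candidates = [model.sample(trajectory) for _ in range(k)]         # assumed API
    scores = [simulator.fidelity(c, trajectory) for c in candidates]  # assumed API
    best = candidates[max(range(k), key=scores.__getitem__)]
    model.supervised_update(trajectory, best)                         # train toward the winner
    return best
```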

3. Self-play search

AlphaZero-style MCTS over token sequences, guided by learned policy and value heads. The model discovers circuits beyond what single-shot generation achieves.
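A compact sketch of one search simulation, omitting details such as exploration noise and terminal rewards; `net` (returning policy priors and a value estimate) and `parser.allowed` are assumed interfaces.

```python
# One MCTS simulation over token sequences: select children by PUCT, expand
# only over parser-valid tokens, back up the value-head estimate.
import math

class Node:
    def __init__(self, prior):
        self.prior, self.visits, self.value_sum, self.children = prior, 0, 0.0, {}
    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def puct(parent, child, c_puct=1.5):
    # Exploration bonus grows with the policy prior and the parent's visit count.
    return child.value() + c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)

def simulate(root_tokens, root, net, parser, max_depth=32):
    tokens, node, path = list(root_tokens), root, [root]
    while node.children and len(path) < max_depth:          # selection
        tok, node = max(node.children.items(), key=lambda kv: puct(node, kv[1]))
        tokens.append(tok)
        path.append(node)
    priors, value = net(tokens)                             # policy priors + value estimate
    if not node.children:                                   # expansion (valid tokens only)
        for tok in parser.allowed(tokens):
            node.children[tok] = Node(priors[tok])
    for n in path:                                          # backup
        n.visits += 1
        n.value_sum += value
    return tokens
```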

Unlimited training data

Because the simulator can execute any well-typed circuit, we generate unlimited synthetic training data. The model proposes circuits, the simulator scores them, and the signal drives the next round of improvement.

Preliminary Results

A 23M-parameter model trained on a single laptop

We evaluate on a benchmark of circuit recovery tasks spanning three difficulty levels: algebraic (static compositions), first-order dynamics (exponential decay, growth, single register), and second-order dynamics (harmonic oscillators, spring systems, chained registers).

The model uses ~390k synthetic examples across 4 curriculum stages, 20k RAFT examples, and 12.5k self-play games. For scale: GPT-1 used ~120M parameters on ~1B tokens — this model operates at roughly 1/5 the parameters and 1/100 the data.

Circuit recovery rates by training phase and difficulty level:

| Phase       | Algebraic | 1st-order | 2nd-order |
|-------------|-----------|-----------|-----------|
| Curriculum  | 99%       | 60%       | 17%       |
| + RAFT      | 100%      | 74%       | 20%       |
| + Self-play | 100%      | 79%       | 19%       |

Structure vs. parameters

After the model predicts a circuit, 50 steps of gradient descent optimise its parameters. The low optimised MSE (0.05–0.09) confirms the model finds structurally sound circuits even when initial parameters are imprecise. This separates what the model must learn (topology) from what the simulator handles (parameter fitting).

What Comes Next

This is the beginning — the full paper will extend in three directions

Scaling

The current 23M-parameter model is roughly a fifth the size of GPT-1. Emergent capabilities in language models appeared at 100–1000× this size. We are in the infancy of scale.

Richer dynamics

The circuit language already supports stochastic and hybrid systems via interchangeable calculi. Extending training to include SDEs, coupled systems, hybrid automata, and more transcendentals is a natural next step.

Formal verification

Explicit circuit representations enable stability analysis and formal verification. Verifiable properties can be incorporated as training objectives and at inference time to generate systems with guaranteed properties.

Explicit, not implicit.

Given oscillator trajectories, the model recovers $\ddot{x} + c\dot{x} + kx = 0$ rather than an opaque $\ddot{x} = \mathrm{NN}(x, \dot{x})$. The result is an interpretable equation that can be composed, verified, and understood.
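For concreteness, the recovered equation is equivalent to the standard first-order system below, the form that two chained registers with feedback naturally express (the exact recovered circuit layout is not shown here):

$$
\ddot{x} + c\dot{x} + kx = 0
\quad\Longleftrightarrow\quad
\begin{cases}
\dot{x}_1 = x_2,\\
\dot{x}_2 = -k\,x_1 - c\,x_2,
\end{cases}
\qquad x_1 = x,\; x_2 = \dot{x}.
$$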