Gimle Papers Working Paper
Learning Explicit Structure of Dynamical Systems
The dominant paradigm for learning dynamics is implicit: neural networks approximate $\dot{x} = f_\theta(x)$ as black boxes. We take the opposite approach — training a foundation model to discover explicit, typed, compositional circuit representations from observed trajectories alone.
Read the short paper (PDF) — full version with extended results coming soon.
The Problem: Implicit Learning Sacrifices Structure
Neural ODEs are flexible but opaque — classical methods are explicit but brittle
Neural ODEs, PINNs, and related architectures learn black-box approximations of dynamics. While flexible, this sacrifices the structural properties that make classical system representations useful:
Interpretability: black-box dynamics cannot be inspected or validated.
Composability: learned components cannot be combined with analytical models.
Verifiability: standard tools for stability analysis do not apply.
Generalisation: implicit representations fail to transfer across operating regimes.
Classical methods (SINDy, symbolic regression) discover explicit structure but require substantial domain expertise, scale poorly to high-dimensional systems, and struggle with hybrid or stochastic dynamics.
What if we could learn explicit system representations?
Structured, typed, compositional — using the same scalable techniques that power modern foundation models.
A Typed Circuit Language for Dynamics
Circuits as the representation space — typed, compositional, executable
We represent dynamical systems as circuits in a traced monoidal category. Each circuit is a directed graph of typed computational elements composed via three combinators.
Every circuit has a degree type $(m \to n)$ specifying input and output wire counts. Composition requires matching degrees — the type system enforces well-formedness at parse time.
Equations written in standard notation (e.g., $\dot{x} = -cx + u$) compile automatically to circuits. The same circuit executes under three interchangeable calculi: stream calculus (deterministic ODEs), stochastic calculus (SDEs), and finite difference calculus (sequences).
Three combinators
Composition (sequential): $f \circ g$ — output of $f$ feeds input of $g$.
Monoidal product (parallel): $f \otimes g$ — independent, side-by-side execution.
Trace (feedback): $\mathrm{Tr}(f)$ — routes an output back to an input, enabling recurrence and differential equations.
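The degree discipline behind these combinators can be captured in a few lines of Python. This is an illustrative sketch only: the `Circuit`, `compose`, `tensor`, and `trace` names are assumptions, not the actual implementation described in the Asgard paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Circuit:
    """A circuit element with degree type (n_in -> n_out)."""
    name: str
    n_in: int
    n_out: int

def compose(f, g):
    """Sequential composition: output wires of f feed the input wires of g."""
    if f.n_out != g.n_in:
        raise TypeError(f"degree mismatch: ({f.n_in}->{f.n_out}) then ({g.n_in}->{g.n_out})")
    return Circuit(f"({f.name} ; {g.name})", f.n_in, g.n_out)

def tensor(f, g):
    """Monoidal product: independent, side-by-side execution."""
    return Circuit(f"({f.name} ⊗ {g.name})", f.n_in + g.n_in, f.n_out + g.n_out)

def trace(f):
    """Feedback: route one output wire back to one input wire."""
    if f.n_in < 1 or f.n_out < 1:
        raise TypeError("trace needs at least one input and one output wire")
    return Circuit(f"Tr({f.name})", f.n_in - 1, f.n_out - 1)

# A feedback loop around a (2 -> 2) block yields a (1 -> 1) system,
# the shape of a first-order ODE such as x' = -c*x + u.
loop = trace(Circuit("f", 2, 2))
```

Ill-typed compositions are rejected the moment they are built, which is exactly the "well-formedness at parse time" guarantee.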
Why circuits, not equations?
Circuits are compositional, typed, and executable. A controller and plant compose sequentially; independent subsystems combine in parallel. The type system catches malformed systems at parse time. And the same circuit runs under different mathematical interpretations without modification. You can read more about the representation in the Asgard paper.
A Foundation Model for Circuit Discovery
From observed trajectories to explicit circuit structure
Given observed trajectories from an unknown system, the model discovers the circuit that generated them. Unlike symbolic regression over equation strings, it operates directly in the space of typed circuit structures.
A transformer autoregressively generates tokenised circuit representations, conditioned on behavioural embeddings of the target trajectories. At each decoding step, an interactive LALR parser restricts the output to syntactically valid tokens — guaranteeing 100% valid circuits with no post-hoc filtering.
Grammar-constrained decoding
The key architectural choice: generation is grammar-constrained. An interactive parser tracks the parse state at each token and masks invalid continuations. Every generated sequence is a well-typed circuit by construction.
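A minimal illustration of the mechanism, with a toy expression grammar standing in for the real LALR parser over the circuit language (the `allowed_tokens` interface and the tiny vocabulary are assumptions for the sketch):

```python
import random

# Toy stand-in for the interactive parser: a tiny expression grammar over
# circuit terms (atoms, grouping, sequential ';' and parallel '⊗' operators).
# The real system tracks full LALR parse state; the interface is the same:
# given the tokens emitted so far, return the set of valid next tokens.
VOCAB = ["atom", "(", ")", ";", "⊗", "<end>"]

def allowed_tokens(prefix):
    depth, expect_operand = 0, True
    for t in prefix:
        if t == "atom":
            expect_operand = False
        elif t == "(":
            depth += 1
            expect_operand = True
        elif t == ")":
            depth -= 1
            expect_operand = False
        elif t in (";", "⊗"):
            expect_operand = True
    if expect_operand:
        return {"atom", "("}
    valid = {";", "⊗"}
    valid.add(")" if depth > 0 else "<end>")
    return valid

def constrained_decode(logits_fn, max_len=50):
    """Mask invalid continuations to -inf and take the argmax at each step,
    so every emitted sequence is well-formed by construction."""
    seq = []
    while len(seq) < max_len:
        valid = allowed_tokens(seq)
        logits = logits_fn(seq)
        scores = [s if t in valid else float("-inf") for t, s in zip(VOCAB, logits)]
        tok = VOCAB[max(range(len(VOCAB)), key=scores.__getitem__)]
        if tok == "<end>":
            break
        seq.append(tok)
    return seq
```

Because the mask is applied before sampling, no probability mass is ever placed on syntactically invalid tokens, which is what removes the need for post-hoc filtering.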
Structure ≠ parameters
Before scoring, the parameters of each predicted circuit are optimised via gradient descent against the target trajectory — made possible by the full differentiability of the simulator. This decouples structure search from parameter estimation.
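The idea can be sketched on the simplest structure, $\dot{x} = -cx$: Euler-integrate while carrying $\partial x / \partial c$ forward, then descend the trajectory MSE in $c$. This is illustrative code under assumed names; the actual simulator and optimiser are more general.

```python
def simulate(c, x0, dt, n):
    """Euler-integrate x' = -c*x, carrying dx/dc alongside x (forward mode):
    a differentiable simulator gives parameter gradients for free."""
    xs, grads = [x0], [0.0]
    x, dxdc = x0, 0.0
    for _ in range(n):
        dxdc = dxdc + dt * (-x - c * dxdc)  # derivative of the Euler step w.r.t. c
        x = x + dt * (-c * x)
        xs.append(x)
        grads.append(dxdc)
    return xs, grads

def fit_parameter(target, x0, dt, steps=50, lr=0.5, c0=0.1):
    """Gradient descent on trajectory MSE, holding the circuit structure fixed."""
    c = c0
    for _ in range(steps):
        xs, grads = simulate(c, x0, dt, len(target) - 1)
        g = sum(2 * (x - y) * d for x, y, d in zip(xs, target, grads)) / len(xs)
        c -= lr * g
    return c
```

The search procedure only has to get the topology right; imprecise initial parameters are recovered by this inner optimisation loop.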
Training Pipeline
Three phases — from imitation to self-play
Curriculum learning
Supervised next-token prediction on synthetic (circuit, trajectory) pairs, progressing from simple atomics through compositions to full dynamical systems with trace and register. A REINFORCE-style behavioural loss is blended in as complexity increases. Replay buffers prevent catastrophic forgetting across stages.
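A hypothetical sketch of the replay mechanism (the capacity, eviction policy, and replay fraction here are illustrative choices, not the paper's values):

```python
import random

class ReplayBuffer:
    """Fixed-capacity store of (circuit, trajectory) pairs from earlier stages."""
    def __init__(self, capacity=10_000):
        self.capacity = capacity
        self.items = []

    def add(self, item):
        self.items.append(item)
        if len(self.items) > self.capacity:
            self.items.pop(random.randrange(len(self.items)))  # evict at random

    def sample(self, k):
        return random.sample(self.items, min(k, len(self.items)))

def make_batch(stage_data, buffer, batch_size=32, replay_frac=0.25):
    """Mix fresh examples from the current stage with replayed earlier ones,
    so later stages keep rehearsing what earlier stages taught."""
    n_replay = int(batch_size * replay_frac) if buffer.items else 0
    batch = random.sample(stage_data, batch_size - n_replay)
    batch += buffer.sample(n_replay)
    return batch
```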
Reward-ranked fine-tuning (RAFT)
For each input, candidate circuits are sampled and scored by simulation fidelity. The best candidate becomes a supervised training target, iteratively sharpening the model toward high-reward circuits.
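The loop reduces to a few lines; in this sketch `sample` and `score` are assumed stand-ins for model sampling and simulation fidelity:

```python
def raft_round(inputs, sample, score, k=16):
    """One reward-ranked fine-tuning round (sketch): for each input, draw k
    candidate circuits, score each by simulation fidelity, and keep only the
    best-scoring candidate as a new supervised target."""
    targets = []
    for x in inputs:
        candidates = [sample(x) for _ in range(k)]
        best = max(candidates, key=lambda c: score(c, x))
        targets.append((x, best))
    return targets  # fed back into supervised fine-tuning
```

Each round shifts the supervised distribution toward the model's own high-reward outputs, which is what "iteratively sharpening" means in practice.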
Self-play search
AlphaZero-style MCTS over token sequences, guided by learned policy and value heads. The model discovers circuits beyond what single-shot generation achieves.
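For intuition, a plain UCT search over token sequences looks like the sketch below; random rollouts stand in for the learned policy and value heads of the full AlphaZero-style setup, and `legal`/`reward` are assumed interfaces.

```python
import math, random

class Node:
    def __init__(self):
        self.children = {}  # token -> Node
        self.visits = 0
        self.value = 0.0    # running mean reward

def mcts_first_move(legal, reward, n_sims=300, max_len=4, c_uct=1.4):
    """Plain UCT over token sequences (sketch). `legal(seq)` returns valid
    next tokens; `reward(seq)` scores a finished sequence."""
    root = Node()
    for _ in range(n_sims):
        node, seq, path = root, [], [root]
        # selection / expansion: descend by UCB1, stop at the first new node
        while True:
            moves = legal(seq)
            if not moves or len(seq) >= max_len:
                break
            for t in moves:
                node.children.setdefault(t, Node())
            def ucb(t):
                ch = node.children[t]
                if ch.visits == 0:
                    return float("inf")
                return ch.value + c_uct * math.sqrt(math.log(node.visits) / ch.visits)
            t = max(moves, key=ucb)
            node = node.children[t]
            path.append(node)
            seq.append(t)
            if node.visits == 0:
                break
        # rollout to a full sequence, then score it
        while legal(seq) and len(seq) < max_len:
            seq.append(random.choice(legal(seq)))
        r = reward(seq)
        # backpropagation: update mean reward along the visited path
        for n in path:
            n.visits += 1
            n.value += (r - n.value) / n.visits
    # commit to the most-visited first token
    return max(root.children, key=lambda t: root.children[t].visits)
```

In the real system the grammar mask from decoding plays the role of `legal`, and the learned value head replaces the random rollout.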
Unlimited training data
Because the simulator can execute any well-typed circuit, we generate unlimited synthetic training data. The model proposes circuits, the simulator scores them, and the signal drives the next round of improvement.
Preliminary Results
A 23M-parameter model trained on a single laptop
We evaluate on a benchmark of circuit recovery tasks spanning three difficulty levels: algebraic (static compositions), first-order dynamics (exponential decay, growth, single register), and second-order dynamics (harmonic oscillators, spring systems, chained registers).
The model uses ~390k synthetic examples across 4 curriculum stages, 20k RAFT examples, and 12.5k self-play games. For scale: GPT-1 used ~120M parameters on ~1B tokens — this model operates at roughly 1/5 the parameters and 1/100 the data.
| Phase | Algebraic | 1st-order | 2nd-order |
|---|---|---|---|
| Curriculum | 99% | 60% | 17% |
| + RAFT | 100% | 74% | 20% |
| + Self-play | 100% | 79% | 19% |
Structure vs. parameters
After the model predicts a circuit, 50 steps of gradient descent optimise its parameters. The low optimised MSE (0.05–0.09) confirms the model finds structurally sound circuits even when initial parameters are imprecise. This separates what the model must learn (topology) from what the simulator handles (parameter fitting).
What Comes Next
This is the beginning — the full paper will extend in three directions
Scaling
The current 23M-parameter model is small even by GPT-1 standards. Emergent capabilities in language models appeared at 100–1000× GPT-1's size. We are in the infancy of scale.
Richer dynamics
The circuit language already supports stochastic and hybrid systems via interchangeable calculi. Extending training to include SDEs, coupled systems, hybrid automata, and more transcendentals is a natural next step.
Formal verification
Explicit circuit representations enable stability analysis and formal verification. Verifiable properties can be incorporated as training objectives and at inference time to generate systems with guaranteed properties.
Explicit, not implicit.
Given oscillator trajectories, the model recovers $\ddot{x} + c\dot{x} + kx = 0$ rather than an opaque $\ddot{x} = \mathrm{NN}(x, \dot{x})$. The result is an interpretable equation that can be composed, verified, and understood.