An ontology-grounded LLM trading program
Several language models, trading the same crypto market — each with its own capital, all under one risk gate that never blinks.
An ontology-grounded research program in which language models trade crypto futures under a deterministic risk gate and are scored like a fund. Each model gets the same causal, point-in-time view of the market and reasons in a shared vocabulary; a deterministic gate decides what actually trades. A leaderboard keeps score.
A fair test for model judgement.
Each agent perceives identical market context — price and order flow, funding, open interest — computed only from data available at decision time. It classifies the regime, picks an admissible strategy, and emits a structured trade plan: direction, size, leverage, stop, target, and the signal that would invalidate it.
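As a rough illustration of what such a structured trade plan could look like, here is a minimal Python sketch. The class name, field names and validation rules are assumptions made for this example, not the project's actual schema; the fields simply mirror the list above.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    LONG = "long"
    SHORT = "short"


@dataclass(frozen=True)
class TradePlan:
    """Hypothetical schema; names are illustrative, not the project's own."""
    regime: str          # regime label the model assigned
    strategy: str        # strategy picked for that regime
    direction: Direction
    size: float          # requested position size, in contracts
    leverage: float
    stop: float          # stop-loss price
    target: float        # take-profit price
    invalidation: str    # signal that would invalidate the plan

    def __post_init__(self):
        if self.size <= 0 or self.leverage <= 0:
            raise ValueError("size and leverage must be positive")
        # A long needs its stop below its target; a short needs the reverse.
        if self.direction is Direction.LONG and not self.stop < self.target:
            raise ValueError("long plan needs stop below target")
        if self.direction is Direction.SHORT and not self.stop > self.target:
            raise ValueError("short plan needs stop above target")
```

Rejecting malformed plans at construction time keeps everything downstream of the model working with well-formed inputs only.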
Nothing the model says reaches the market unchecked. A pure risk gate sizes, rounds and vetoes against fixed limits. Fills land on the next bar, with realistic fees, slippage and funding. The same graph that runs this replay is the one that would run live.
Every change to an indicator, the ontology or a prompt is a new, pre-registered experiment — so the scoreboard accounts for multiple testing instead of rewarding the luckiest run.
Principles
One vocabulary
Regimes, signals, strategies and actions are defined once, in a typed ontology. Every model speaks the same language; nothing is hardcoded. The graph wires how the concepts relate, and each trade plan cites them by name.
Same code, sim & live
There is no separate backtest engine. A point-in-time replay source drives the exact orchestrator, risk gate and broker that run live — only the data source and the clock change. If a behaviour only exists in the backtest, it is a bug.
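One way to picture this is a single loop that only knows about a bar source. The sketch below is an assumption-laden illustration: `BarSource`, `ReplaySource`, `run` and the `ts` key are hypothetical names, and a live source would implement the same `bars()` method against an exchange feed.

```python
from typing import Callable, Iterable, Iterator, List, Protocol


class BarSource(Protocol):
    """Anything that yields bars in time order (hypothetical interface)."""
    def bars(self) -> Iterator[dict]: ...


class ReplaySource:
    """Yields historical bars one at a time, ordered by timestamp."""
    def __init__(self, history: Iterable[dict]):
        self._history = sorted(history, key=lambda b: b["ts"])

    def bars(self) -> Iterator[dict]:
        yield from self._history


def run(source: BarSource, decide: Callable[[List[dict]], object]) -> list:
    """Same loop for replay and live; only `source` differs."""
    seen: List[dict] = []
    decisions = []
    for bar in source.bars():
        seen.append(bar)
        # The agent receives a copy of the bars up to now, never later ones.
        decisions.append(decide(list(seen)))
    return decisions
```

Because `run` never touches the history directly, a decision function cannot look ahead: it only ever receives the bars that have already been emitted.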
The risk gate is law
Pure, deterministic, no model in the loop. The LLMs propose trade plans; a gate enforces leverage, notional and drawdown limits before anything reaches the market. Models propose; the gate disposes.
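A gate of this kind can be written as a pure function. The following is a minimal sketch under stated assumptions: the `Limits` fields, the `gate` signature and the round-down rule are invented for illustration and are not the project's actual limits or logic.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limits:
    max_leverage: float
    max_notional: float   # in quote currency
    max_drawdown: float   # fraction of peak equity, e.g. 0.2
    lot_size: float       # smallest tradable increment


def gate(size: float, leverage: float, price: float,
         equity: float, peak_equity: float, limits: Limits) -> Optional[float]:
    """Return the approved size, or None for a veto. Pure: no I/O, no state."""
    drawdown = 1.0 - equity / peak_equity
    if drawdown >= limits.max_drawdown:
        return None  # veto: drawdown limit breached
    if leverage > limits.max_leverage:
        return None  # veto: requested leverage above the cap
    # Cap notional by the fixed limit and by what equity * leverage can carry.
    max_notional = min(limits.max_notional, equity * leverage)
    capped = min(size, max_notional / price)
    # Round down to whole lots so the gate never increases exposure.
    lots = math.floor(capped / limits.lot_size + 1e-9)
    approved = lots * limits.lot_size
    return approved if approved > 0 else None
```

Given the same inputs the function always returns the same answer, which is what makes every veto reproducible after the fact.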
Scored like a fund
No single flattering number. Combinatorial purged cross-validation, deflated Sharpe, probability of backtest overfitting and block-bootstrap intervals — and a holdout window touched exactly once, at the final go / no-go.
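To make one of these pieces concrete, here is a minimal sketch of a moving-block bootstrap interval for a per-period Sharpe ratio, using only the standard library. The function names, the block length and the percentile method are assumptions chosen for illustration; the project's own estimator may differ.

```python
import random
import statistics


def sharpe(returns):
    """Per-period Sharpe ratio (mean over sample standard deviation)."""
    sd = statistics.stdev(returns)
    return statistics.mean(returns) / sd if sd > 0 else 0.0


def block_bootstrap_sharpe(returns, block=5, n_boot=1000, alpha=0.05, seed=0):
    """Percentile interval for the Sharpe ratio from a moving-block bootstrap."""
    n = len(returns)
    if n < block or n < 2:
        raise ValueError("need at least `block` returns (and at least two)")
    rng = random.Random(seed)
    starts = range(n - block + 1)
    stats = []
    for _ in range(n_boot):
        sample = []
        # Resample contiguous blocks so short-range dependence is preserved.
        while len(sample) < n:
            s = rng.choice(starts)
            sample.extend(returns[s:s + block])
        stats.append(sharpe(sample[:n]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

Resampling blocks rather than single returns is the point of the technique: it avoids treating autocorrelated returns as independent, which would make the interval look narrower than it should.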
Who is building this
André Leal
I build quantitative and AI systems for markets. L8 Capital is my open notebook for one question: can language models exercise real trading judgement when you strip away the demos and hold them to the same rules, costs and scorekeeping a fund would?
I work in the open — the failed experiments as much as the wins, because in this domain the discipline of not fooling yourself is the whole game.
Writing
Notes from the work — soon on Substack.
Essays on what the leaderboard teaches, the experiments that failed, and the craft of evaluating systems that have to survive contact with real money. The publication is being set up.