
How Active Inference Trading Works

A neuroscience-inspired approach to algorithmic trading using probabilistic reasoning and belief updating.

Active Inference Overview

What is Active Inference?

Active Inference is a theoretical framework from neuroscience, derived from Karl Friston's Free Energy Principle (FEP). It proposes that biological agents (brains included) act to minimize "surprise" - the gap between what they expect and what they observe.

// The Free Energy Principle
Minimize: F = Complexity - Accuracy
// Where:
// - Complexity = deviation from prior beliefs
// - Accuracy = how well predictions match observations
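The complexity/accuracy split can be made concrete with discrete distributions. This is an illustrative sketch (the prior, posterior, and likelihood values are hypothetical, not from the codebase): complexity is the KL divergence of the posterior from the prior, and accuracy is the expected log-likelihood of the observation under the posterior.

```python
import math

def kl_divergence(q, p):
    """KL(q || p) for discrete distributions -- the 'complexity' term."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def expected_log_likelihood(q, log_lik):
    """E_q[ln p(o|s)] -- the 'accuracy' term."""
    return sum(qi * ll for qi, ll in zip(q, log_lik))

# Hypothetical prior, posterior, and per-state log-likelihood of today's observation
prior     = [0.25, 0.25, 0.25, 0.25]   # belief before seeing data
posterior = [0.70, 0.15, 0.10, 0.05]   # belief after seeing data
log_lik   = [math.log(0.60), math.log(0.20), math.log(0.15), math.log(0.05)]

complexity  = kl_divergence(posterior, prior)          # cost of moving beliefs
accuracy    = expected_log_likelihood(posterior, log_lik)
free_energy = complexity - accuracy                     # F to be minimized
```

Moving beliefs far from the prior costs complexity; explaining the data well earns accuracy. Minimizing F trades the two off.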

How It Applies to Trading

In trading, the agent (our model) tries to predict future market states and takes actions (trades) that both:

  1. Exploit - take profitable positions based on confident beliefs
  2. Explore - gather information to reduce uncertainty (epistemic value)

The model balances profit-seeking with information-gathering through Expected Free Energy (EFE) minimization, weighted by an epistemic parameter (EPISTEMIC_WEIGHT). This is fundamentally different from reinforcement learning, which optimizes only for reward.
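A minimal sketch of the exploit/explore trade-off, under one common sign convention (lower EFE = better policy; the numbers and the simplified info-gain term are illustrative, not the engine's actual EFE computation):

```python
import math

EPISTEMIC_WEIGHT = 1.0   # hypothetical stand-in for EPISTEMIC_WEIGHT

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def expected_free_energy(expected_utility, prior_belief, expected_posterior):
    """G = -(pragmatic value) - weight * (epistemic value)."""
    # Epistemic value: expected reduction in uncertainty about hidden states
    info_gain = entropy(prior_belief) - entropy(expected_posterior)
    return -expected_utility - EPISTEMIC_WEIGHT * info_gain

prior = [0.25, 0.25, 0.25, 0.25]

# "Exploit" policy: high expected utility, but beliefs stay flat (learns nothing)
g_exploit = expected_free_energy(1.0, prior, [0.25, 0.25, 0.25, 0.25])
# "Explore" policy: no immediate utility, but sharply reduces uncertainty
g_explore = expected_free_energy(0.0, prior, [0.85, 0.05, 0.05, 0.05])

# The agent selects the policy with the LOWER expected free energy
```

A pure reward-maximizer would never consider the explore policy; EFE gives it standing whenever uncertainty is high enough.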

Why Different from Traditional Quant Methods?

Traditional ML/RL

  • - Maximizes reward/return only
  • - Prone to overfitting
  • - Black-box predictions
  • - Confidence often poorly calibrated

Active Inference

  • + Balances exploitation AND exploration
  • + Principled uncertainty quantification
  • + Interpretable belief states
  • + Knows when it doesn't know

Technical Specification

Concrete implementation details for technical due diligence. All values correspond to parameters in engine.py (2,393 lines) using pymdp v0.0.7.1.

Model Architecture

  • Framework: pymdp v0.0.7.1 (POMDP)
  • Modalities: 5 (Return, P&L, Momentum, Volatility, Volume Shock)
  • Hidden states: 5 (bearish to bullish discrete bins)
  • Actions: 5 (strong sell, sell, flat, buy, strong buy)
  • Active models: 285 models across 69 tickers in 9 behavioral finance categories

Learning Parameters

  • A[0] concentration: 20x (learnable via Dirichlet updates)
  • A[1-4] concentration: 1000x (frozen domain knowledge)
  • Forgetting rate (omega): 0.992 per step
  • Learning rate (eta): 0.08 default
  • Policy precision (gamma): 5.5 default

Trade Execution

  • Dual confidence gate (probability + entropy)
  • Half-Kelly position sizing (KELLY_FRACTION=0.5)
  • Loss aversion lambda: 2.25x (Prospect Theory)
  • Transaction cost: 10 bps per trade

Verification

  • A-matrix snapshots stored daily in D1
  • Model files HMAC-SHA256 signed
  • Matrix evolution visible on model detail pages
  • Sharpe RF rate: 0.000191/day (5% annual / 252 days)

See It In Action

Visit any model's AI Learning tab to see real A-matrix evolution data updated daily. Each model independently learns from market observations — you can verify this by comparing matrix snapshots across dates.

The POMDP Framework

Partially Observable Markov Decision Process

A POMDP is a mathematical framework for decision-making under uncertainty. Unlike MDPs where the state is fully known, in POMDPs the agent must infer the hidden state from noisy observations - perfect for financial markets where the true "regime" is never directly visible.
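The "infer the hidden state from noisy observations" step is just a Bayes update over regimes. A minimal sketch with hypothetical numbers (the regime labels match the diagram below; the likelihood column is illustrative):

```python
def bayes_update(belief, likelihood_column):
    """Posterior over hidden regimes given one noisy observation.

    belief[i]            : prior P(regime i)
    likelihood_column[i] : P(observation | regime i), one column of the A-matrix
    """
    unnorm = [b * l for b, l in zip(belief, likelihood_column)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Hypothetical 4-regime belief: Bullish, Neutral-Up, Neutral-Down, Bearish
belief = [0.25, 0.25, 0.25, 0.25]
# Hypothetical likelihood of observing a large positive return in each regime
p_obs_given_regime = [0.50, 0.25, 0.15, 0.10]

posterior = bayes_update(belief, p_obs_given_regime)
# Belief shifts toward Bullish without ever observing the regime directly
```

The true regime is never seen; only its fingerprint in the observations is.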


       POMDP Structure for Trading
       ============================

    ┌─────────────────────────────────────────────────────────┐
    │                    HIDDEN STATES                         │
    │         (Market Regimes x Position States)               │
    │                                                          │
    │   ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐ │
    │   │ Bullish │   │Neutral-U│   │Neutral-D│   │ Bearish │ │
    │   └────┬────┘   └────┬────┘   └────┬────┘   └────┬────┘ │
    └────────┼─────────────┼─────────────┼─────────────┼──────┘
             │             │             │             │
             ▼             ▼             ▼             ▼
    ┌─────────────────────────────────────────────────────────┐
    │                    OBSERVATIONS                          │
    │              (What the model "sees")                     │
    │                                                          │
    │  ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ │
    │  │ Return │ │  P&L   │ │Momentum│ │  Vol   │ │VolShock│ │
    │  │ (4bin) │ │ (4bin) │ │ (3bin) │ │ (3bin) │ │ (2bin) │ │
    │  └────────┘ └────────┘ └────────┘ └────────┘ └────────┘ │
    │                                                          │
    │      Total: 4 x 4 x 3 x 3 x 2 = 288 observation combos   │
    └─────────────────────────────────────────────────────────┘
             │
             ▼
    ┌─────────────────────────────────────────────────────────┐
    │                     ACTIONS                              │
    │              (Position Changes)                          │
    │                                                          │
    │          ┌──────┐  ┌──────┐  ┌──────┐                   │
    │          │ LONG │  │ FLAT │  │ SHORT│                   │
    │          └──────┘  └──────┘  └──────┘                   │
    └─────────────────────────────────────────────────────────┘

The 5 Core Matrices (A, B, C, D, E)

The POMDP is defined by five matrices that encode the model's beliefs about the world:

A: Observation Model (Likelihood)

Maps hidden states to observations. "If the market is bullish, what returns do I expect to see?"

Shape: (5 modalities) x (observations) x (states) x (positions)
Learnable: Only modality 0 (returns) updates online - others frozen at high concentration

B: Transition Model (Dynamics)

How states evolve given actions. "If I go long in a bullish market, what happens next?"

Shape: (factors) x (states) x (states) x (actions)
Contains market persistence (~90%) and position execution uncertainty

C: Preference Model (Utility)

Encodes what outcomes the agent prefers. "Profits are good, losses are bad (especially big ones)."

Shape: (modalities) x (observations)
Loss aversion (Lambda ~2.25x) makes losses hurt more than equivalent gains

D: Initial State Prior

Prior belief about starting states. "At the start, I believe the market is probably neutral."

Shape: (factors) x (states)
Flat position bias (~80%) - model prefers starting flat

E: Policy Prior (Habits)

Prior preference over policies. "All else equal, prefer staying in current position."

Shape: (num_policies)
Policy switch penalty (~0.70) discourages excessive trading
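The shapes above can be sketched with placeholder arrays. This is illustrative scaffolding only (uniform/identity fillers, dimensions taken from the diagram: 4 regimes x 3 positions, observation bins 4/4/3/3/2, 3 position actions), not the engine's trained values:

```python
import numpy as np

n_regimes, n_positions, n_actions = 4, 3, 3
obs_bins = [4, 4, 3, 3, 2]   # Return, P&L, Momentum, Vol, VolShock

# A: one likelihood array per modality, P(obs | regime, position)
A = [np.full((bins, n_regimes, n_positions), 1.0 / bins) for bins in obs_bins]

# B: transition dynamics per factor, P(next | current, action);
# identity placeholders stand in for persistence/execution dynamics
B_regime   = np.stack([np.eye(n_regimes)] * n_actions, axis=-1)
B_position = np.stack([np.eye(n_positions)] * n_actions, axis=-1)

# C: log-preferences over observations per modality (zeros = indifferent)
C = [np.zeros(bins) for bins in obs_bins]

# D: initial state priors per factor (~80% flat-position bias, as above)
D = [np.full(n_regimes, 1.0 / n_regimes), np.array([0.1, 0.8, 0.1])]

# E: prior over policies (here simplified to one policy per action)
E = np.full(n_actions, 1.0 / n_actions)
```

Every column of each A-matrix is a probability distribution over observations, which is what makes the Bayes update and Dirichlet learning well defined.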

Market States (4)

  • Bullish - strong uptrend
  • Neutral-Up - weak upward bias
  • Neutral-Down - weak downward bias
  • Bearish - strong downtrend

Position States (3)

  • +1 Long - betting price rises
  •  0 Flat - no position
  • -1 Short - betting price falls

Total hidden state space: 4 x 3 = 12 states
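The joint state space can be enumerated directly:

```python
from itertools import product

regimes   = ["Bullish", "Neutral-Up", "Neutral-Down", "Bearish"]
positions = ["Long", "Flat", "Short"]

# Cartesian product: every (regime, position) pair is one hidden state
joint_states = list(product(regimes, positions))
# 4 regimes x 3 positions = 12 joint hidden states, e.g. ("Bullish", "Long")
```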

How Trades Happen

Confidence Gating: The Dual Threshold System

The model only executes trades when it passes two independent confidence gates. This prevents trading on weak signals.

Probability Gate (CONF_P = 0.28)

The highest-probability action must exceed 28%. "Am I confident in my best choice?"

Entropy Gate (CONF_H = 0.92 x max)

Policy entropy must be below 92% of its maximum (0 = certain, max = uniformly random). "How spread out is my uncertainty?"

// Trade execution logic
if max(policy_probs) >= 0.28                    // probability gate
   && entropy(policy) <= 0.92 * max_entropy     // entropy gate
then
    EXECUTE_TRADE()
else
    STAY_FLAT()                                 // indecision = no action
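The same dual gate as runnable Python (a minimal sketch; the threshold constants match the values above, the example distributions are hypothetical):

```python
import math

CONF_P = 0.28   # probability gate threshold
CONF_H = 0.92   # entropy gate, as a fraction of maximum entropy

def entropy(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def passes_gates(policy_probs):
    """True only if BOTH confidence gates pass."""
    max_entropy  = math.log(len(policy_probs))   # entropy of a uniform distribution
    prob_gate    = max(policy_probs) >= CONF_P
    entropy_gate = entropy(policy_probs) <= CONF_H * max_entropy
    return prob_gate and entropy_gate

# Confident, peaked distribution -> trade
assert passes_gates([0.60, 0.15, 0.10, 0.10, 0.05])
# Near-uniform distribution -> stay flat
assert not passes_gates([0.22, 0.20, 0.20, 0.19, 0.19])
```

Note the gates are independent: a distribution can have one confident peak yet still carry too much residual entropy, and vice versa.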

Position Sizing: Kelly Criterion

Once a trade signal passes both gates, the model uses the Kelly Criterion to determine position size - the mathematically optimal fraction of capital to bet.

// Kelly Criterion formula
kelly_fraction = (win_rate * avg_win - (1 - win_rate) * avg_loss) / avg_win
// Half-Kelly for safety (reduces volatility)
position_size = kelly_fraction * 0.5

We use Half-Kelly which reduces position sizes by 50% compared to theoretical optimum. This sacrifices some expected return for significantly reduced volatility and drawdown risk.
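A minimal sketch of the sizing rule above (the trade statistics are hypothetical; the negative-edge clamp is an added safety detail, labeled as such):

```python
def half_kelly_size(win_rate, avg_win, avg_loss, kelly_cap=0.5):
    """Kelly fraction from trade statistics, scaled to Half-Kelly.

    avg_win / avg_loss: average gain and loss per unit staked (both positive).
    """
    kelly = (win_rate * avg_win - (1 - win_rate) * avg_loss) / avg_win
    # Clamp: never size into a negative edge (illustrative safeguard)
    return max(0.0, kelly) * kelly_cap

# Hypothetical stats: 55% win rate, wins average +2%, losses average -1.5%
size = half_kelly_size(win_rate=0.55, avg_win=0.02, avg_loss=0.015)
# Full Kelly here is 21.25% of capital; Half-Kelly sizes ~10.6%
```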

Loss Aversion: Behavioral Finance Integration

Drawing from Kahneman and Tversky's Prospect Theory, the model weights losses more heavily than gains. This reflects how real traders and investors actually behave - and helps preserve capital.

  • Gain of +$100 → utility of +100 utils
  • Loss of -$100 → disutility of -225 utils (Lambda = 2.25x)

With LOSS_AVERSION = 2.25, a $100 loss feels 2.25x worse than a $100 gain feels good. This asymmetry makes the model naturally risk-averse and helps avoid catastrophic drawdowns.
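The asymmetry reduces to a piecewise value function. A minimal sketch (Kahneman and Tversky's full value function also curves gains and losses; this keeps only the loss-aversion asymmetry described above):

```python
LOSS_AVERSION = 2.25   # Prospect Theory lambda

def prospect_utility(pnl):
    """Piecewise-linear value function: losses weighted by lambda."""
    return pnl if pnl >= 0 else LOSS_AVERSION * pnl

u_gain = prospect_utility(100)    # +100 utils
u_loss = prospect_utility(-100)   # -225 utils
```

Folded into the C-matrix preferences, this makes policies with loss exposure look systematically worse than their symmetric expected value would suggest.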

Model Training

Training the Generative Model

Unlike neural networks that learn through backpropagation, Active Inference models learn through Bayesian belief updating. Each model maintains Dirichlet concentration parameters over its 5-modality observation matrix (A), trained on Jan 2015 - Dec 2025 historical data. The training loop processes each day sequentially: observe market features, infer hidden state, evaluate policies via Expected Free Energy, select action, then update A[0] concentrations.


    Training Pipeline
    =================

    Historical Data (2015-2025)
           │
           ▼
    ┌──────────────────┐
    │  Feature Engine  │ ──► Returns, Momentum, Vol, VolumeShock
    └────────┬─────────┘
             │
             ▼
    ┌──────────────────┐
    │  Discretization  │ ──► Convert continuous → discrete bins
    └────────┬─────────┘
             │
             ▼
    ┌──────────────────┐     ┌─────────────────────────────────┐
    │   POMDP Agent    │ ◄───│  Dirichlet Concentration Prior  │
    │                  │     │  A[0]: 20x (learnable)          │
    │  For each day:   │     │  A[1-4]: 1000x (frozen domain)  │
    │  1. Observe      │     └─────────────────────────────────┘
    │  2. Infer state  │
    │  3. Select policy│
    │  4. Update A     │
    └────────┬─────────┘
             │
             ▼
    ┌──────────────────┐
    │  Trained Model   │ ──► HMAC-signed, versioned, stored
    └──────────────────┘

Online Learning: Dirichlet Updates

After training, each model continues to learn daily using Dirichlet concentration updates. Only modality A[0] (returns observation model, concentration=20x) is updated; modalities A[1-4] (P&L, momentum, volatility, volume shock) remain frozen at 1000x concentration to preserve domain knowledge.

// Dirichlet update rule
A_new[observation, state] = A_old[observation, state] + eta * posterior(state)
// Forgetting factor (prevents stale patterns)
A_concentration = A_concentration * omega_forget // omega ~ 0.992

Dual Concentration Strategy

  • A[0] - Returns: 20x concentration (learnable)
  • A[1-4] - Others: 1000x concentration (frozen)

Returns adapt to asset-specific patterns; other modalities encode domain knowledge that shouldn't drift.

Why Freeze Most Modalities?

P&L, momentum, volatility, and volume relationships are universal financial concepts. We don't want the model to "unlearn" that losses are bad.
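A minimal sketch of one online step on the learnable A[0] concentrations, following the update and forgetting rules above (the matrix size, observation index, and posterior are hypothetical; the real engine updates a larger conditional array):

```python
ETA   = 0.08    # learning rate (eta)
OMEGA = 0.992   # forgetting factor (omega)

def dirichlet_update(A_conc, obs_idx, posterior):
    """One online step on a (num_obs x num_states) concentration matrix:
    decay all concentrations slightly, then add evidence for the observed row."""
    # 1. Forget: shrink every concentration toward zero
    A_conc = [[c * OMEGA for c in row] for row in A_conc]
    # 2. Learn: credit the observed outcome in proportion to state beliefs
    for s, q in enumerate(posterior):
        A_conc[obs_idx][s] += ETA * q
    return A_conc

# 4 return bins x 4 regimes, uniform concentration of 20 (the learnable A[0])
A0 = [[20.0] * 4 for _ in range(4)]
A0 = dirichlet_update(A0, obs_idx=0, posterior=[0.7, 0.2, 0.05, 0.05])
```

Because forgetting multiplies while learning adds, concentrations settle at an equilibrium rather than growing without bound, which keeps the model responsive to regime changes.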

Model Versioning & A/B Testing

Every trained model is cryptographically signed with HMAC-SHA256 using a secret key. This ensures model integrity and enables reproducible research.

Model Hash:   a7b3c9d2e1f4...
Created:      2024-01-15 08:30:00 UTC
Train Period: Jan 2015 - Dec 2025
Modalities:   5 (Return, P&L, Mom, Vol, VolShock)

Multiple model versions can run simultaneously (e.g., SPY_conservative, SPY_aggressive) to A/B test different parameter configurations in live paper trading.
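Sign-and-verify with HMAC-SHA256 is a few lines of stdlib Python. A minimal sketch (the key and payload are placeholders, not the production scheme):

```python
import hashlib
import hmac
import json

SECRET_KEY = b"replace-with-real-secret"   # placeholder; never hardcode in production

def sign_model(model_bytes: bytes) -> str:
    """HMAC-SHA256 tag over the serialized model file."""
    return hmac.new(SECRET_KEY, model_bytes, hashlib.sha256).hexdigest()

def verify_model(model_bytes: bytes, tag: str) -> bool:
    """Constant-time comparison to resist timing attacks."""
    return hmac.compare_digest(sign_model(model_bytes), tag)

payload = json.dumps({"ticker": "SPY", "version": "conservative"}).encode()
tag = sign_model(payload)

assert verify_model(payload, tag)
assert not verify_model(payload + b"tampered", tag)
```

Unlike a bare hash, the HMAC construction requires the secret key, so a tampered model file cannot simply be re-hashed to pass verification.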

Understanding the Metrics

Sharpe Ratio

Risk-adjusted return: how much return per unit of volatility.

Sharpe = (Return - Risk-Free Rate) / Volatility * sqrt(252)
> 2.0 Excellent | > 1.0 Good | < 0.5 Poor
Sortino Ratio

Like Sharpe, but only penalizes downside volatility (losses).

Sortino = (Return - Target) / Downside Deviation * sqrt(252)

More forgiving than Sharpe - upside volatility isn't penalized.

Max Drawdown

Largest peak-to-trough decline. Shows worst historical loss.

MDD = (Peak - Trough) / Peak

Lower is better. A 50% drawdown requires 100% gain to recover.

Calmar Ratio

Annual return divided by max drawdown. Risk-adjusted CAGR.

Calmar = CAGR / Max Drawdown
> 3.0 Excellent | > 1.0 Acceptable
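Sharpe and max drawdown from a daily return series can be sketched as below (the return series is hypothetical; the daily risk-free rate matches the 0.000191/day figure above, and Sortino/Calmar follow the same pattern):

```python
import math

def sharpe(returns, rf_daily=0.000191):
    """Annualized Sharpe from daily returns (sample std, sqrt(252) scaling)."""
    excess = [r - rf_daily for r in returns]
    mean = sum(excess) / len(excess)
    var = sum((r - mean) ** 2 for r in excess) / (len(excess) - 1)
    return mean / math.sqrt(var) * math.sqrt(252)

def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve."""
    equity, peak, mdd = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1 + r
        peak = max(peak, equity)
        mdd = max(mdd, (peak - equity) / peak)
    return mdd

# Hypothetical daily returns over eight sessions
rets = [0.01, -0.005, 0.007, -0.012, 0.004, 0.009, -0.003, 0.006]
s   = sharpe(rets)
mdd = max_drawdown(rets)
```

Note drawdown is computed on the compounded equity curve, not on raw returns: a single -1.2% day from a fresh peak is exactly a 1.2% drawdown.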

Why Metrics Show "--"

You'll see dashes in metrics when there's insufficient data to calculate them reliably:

No Trades Yet

Win rate, avg trade size, and other trade-dependent metrics need at least 1 completed trade.

Insufficient History

Sharpe, Sortino, VaR, and CVaR each need roughly 20+ data points for statistical validity.

No Drawdown

Calmar ratio is undefined when max drawdown is 0 (model hasn't lost money yet).

Paper Trading vs Backtesting vs Live

BT: Backtesting

Historical simulation. Model is trained and tested on past data. Useful for parameter tuning but susceptible to overfitting. Shown with dashed lines on charts.

PT: Paper Trading (Current)

Real market data, simulated execution. No actual money at risk. This is where we validate that backtest results translate to forward performance. Solid lines after go-live date.

LT: Live Trading (Future)

Real money, real execution. Will be enabled after paper trading validates the strategy with sufficient confidence (30+ days, positive Sharpe on forward test).

Ready to Explore?

Watch our models trade in real-time on the dashboard, or dive into the Research page to experiment with parameters and run your own backtests.

Fintropic Active Inference Trading System v2.0.0 | pymdp v0.0.7.1

5-Modality POMDP | 285 Models | 69 Tickers | Multi-Preset Models | Dirichlet A-matrix Learning | Daily Cron Updates