The Overfitting Trap
Overfitting happens when you tune parameters (EMA periods, RSI thresholds, ATR multipliers) on your entire dataset and then report the resulting performance as if it were real. The strategy has "memorised" the historical data instead of learning a genuine edge.
Classic warning signs:
- Sharpe > 3 on in-sample data that collapses to ~0.3 on new data.
- The strategy requires very specific parameter values (EMA 17 works; EMA 16 and 18 don't).
- Very few trades, but all of them winners.
- An equity curve that looks like a straight line upward.
Walk-Forward Optimisation (WFO)
WFO splits your data into rolling windows of In-Sample (IS) training and Out-of-Sample (OOS) testing periods. You only trust performance on OOS data.
An IS:OOS ratio of 3:1 or 4:1 is standard.
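The rolling split itself is simple enough to sketch on its own. The helper below (`wfo_windows` is a hypothetical name, not part of the optimiser class) yields index boundaries for each IS/OOS pair using the same 63/21-day defaults used later:

```python
def wfo_windows(n: int, is_periods: int = 63, oos_periods: int = 21):
    """Yield (is_start, is_end, oos_end) row-index triples for rolling splits.

    The IS window covers rows [is_start, is_end); the OOS window covers
    rows [is_end, oos_end). Each window rolls forward by one OOS step.
    """
    for oos_start in range(is_periods, n - oos_periods + 1, oos_periods):
        yield oos_start - is_periods, oos_start, min(oos_start + oos_periods, n)

# 126 rows with the 63/21 defaults produces three overlapping-IS windows:
for w in wfo_windows(126):
    print(w)   # (0, 63, 84), (21, 84, 105), (42, 105, 126)
```

Note that consecutive IS windows overlap: only the OOS segments are non-overlapping, which is what makes stitching them together into one OOS equity curve legitimate.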
WalkForwardOptimiser Class
```python
import pandas as pd
import numpy as np
from itertools import product

from backtest_engine import BacktestEngine
from performance import PerformanceReport


class WalkForwardOptimiser:
    def __init__(
        self,
        df: pd.DataFrame,
        signal_fn,                      # callable(df, **params) -> Series
        param_grid: dict,               # {"fast": [5, 9, 13], "slow": [21, 34]}
        is_periods: int = 63,           # 63 trading days = 3 months IS
        oos_periods: int = 21,          # 21 trading days = 1 month OOS
        score_metric: str = "sharpe",
    ):
        self.df = df
        self.signal_fn = signal_fn
        self.param_grid = param_grid
        self.is_periods = is_periods
        self.oos_periods = oos_periods
        self.score_metric = score_metric
        self.engine = BacktestEngine()

    def _param_combinations(self) -> list[dict]:
        keys = list(self.param_grid.keys())
        values = list(self.param_grid.values())
        return [dict(zip(keys, v)) for v in product(*values)]

    def _score(self, result, metric: str) -> float:
        rpt = PerformanceReport(result.equity, result.trades)
        return rpt.summary().get(metric, 0.0)

    def run(self) -> pd.DataFrame:
        """Run WFO and return OOS performance for each window."""
        n = len(self.df)
        step = self.oos_periods
        windows = []
        for oos_start in range(self.is_periods, n - step + 1, step):
            is_start = oos_start - self.is_periods
            is_end = oos_start
            oos_end = min(oos_start + self.oos_periods, n)
            df_is = self.df.iloc[is_start:is_end]
            df_oos = self.df.iloc[is_end:oos_end]

            # Optimise on IS data
            best_score, best_params = -np.inf, {}
            for params in self._param_combinations():
                fn = lambda d, p=params: self.signal_fn(d, **p)
                r = self.engine.run(df_is, fn, params)
                sc = self._score(r, self.score_metric)
                if sc > best_score:
                    best_score, best_params = sc, params

            # Test on OOS with the best IS params
            fn_oos = lambda d, p=best_params: self.signal_fn(d, **p)
            r_oos = self.engine.run(df_oos, fn_oos, best_params)
            rpt = PerformanceReport(r_oos.equity, r_oos.trades)
            windows.append({
                "oos_start":   df_oos.index[0],
                "oos_end":     df_oos.index[-1],
                "best_params": best_params,
                "is_score":    round(best_score, 3),
                **{f"oos_{k}": v for k, v in rpt.summary().items()},
            })
        return pd.DataFrame(windows)
```
Interpreting WFO Results
```python
wfo = WalkForwardOptimiser(
    df          = df_nifty_daily,
    signal_fn   = ema_cross_signal,
    param_grid  = {"fast": [5, 9, 13], "slow": [21, 34, 50]},
    is_periods  = 252,   # 1 year IS
    oos_periods = 63,    # 3 months OOS
)
results = wfo.run()

# ── Key questions ─────────────────────────────────────────
# 1. Is OOS Sharpe consistently positive?
print(results["oos_sharpe"].describe())

# 2. Do IS best params stay stable across windows?
#    (instability = overfitting signal)
#    Dicts aren't hashable, so stringify before counting.
print(results["best_params"].astype(str).value_counts().head(5))

# 3. Efficiency ratio = avg OOS Sharpe / avg IS Sharpe
#    Should be > 0.5 for a robust strategy
efficiency = results["oos_sharpe"].mean() / results["is_score"].mean()
print(f"WFO Efficiency: {efficiency:.2f}")
```
| WFO Efficiency | Interpretation |
|---|---|
| > 0.7 | Robust strategy — parameters generalise well |
| 0.5 – 0.7 | Acceptable — monitor closely in live trading |
| 0.3 – 0.5 | Marginal — consider simplifying the strategy |
| < 0.3 | Overfit — do not trade live |
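The bands above are easy to encode directly, which is handy if you re-run WFO on a schedule and want an automated verdict. A minimal sketch (`wfo_verdict` is a hypothetical helper, not part of the class above):

```python
def wfo_verdict(efficiency: float) -> str:
    """Map a WFO efficiency ratio to the interpretation bands in the table."""
    if efficiency > 0.7:
        return "robust"       # parameters generalise well
    if efficiency >= 0.5:
        return "acceptable"   # monitor closely in live trading
    if efficiency >= 0.3:
        return "marginal"     # consider simplifying the strategy
    return "overfit"          # do not trade live
```

In a monitoring script this lets you alert the moment a strategy's rolling efficiency drifts below the acceptable band.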
A genuinely robust strategy should perform similarly across a range of parameter values (e.g., EMA 8–12 for fast, EMA 18–26 for slow). If performance collapses when you move one parameter by ±1, the edge is not real.
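One way to test this is a neighbourhood scan: score each parameter value around the optimum and check whether performance sits on a plateau or a spike. A minimal sketch, assuming you have already computed a Sharpe per parameter value (the function name `plateau_check` and the 30% tolerance are illustrative choices, not a standard):

```python
import numpy as np

def plateau_check(scores: dict[int, float], tolerance: float = 0.3) -> bool:
    """Return True if performance is stable across neighbouring parameter values.

    `scores` maps a parameter value (e.g. fast-EMA period) to its Sharpe.
    The edge is suspect if the spread across the neighbourhood is large
    relative to the best score.
    """
    vals = np.array(list(scores.values()), dtype=float)
    best = vals.max()
    if best <= 0:
        return False  # no edge anywhere in the neighbourhood
    return bool((best - vals.min()) / abs(best) <= tolerance)

# A flat plateau (robust) vs. a single spike (fragile):
plateau_check({8: 1.4, 9: 1.5, 10: 1.5, 11: 1.4, 12: 1.3})   # True
plateau_check({16: 0.2, 17: 1.8, 18: 0.3})                    # False
```

The second case is exactly the "EMA 17 works, EMA 16 and 18 don't" symptom from the start of this section.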
