The Overfitting Trap
Overfitting happens when you tune parameters (EMA periods, RSI thresholds, ATR multipliers) on your entire dataset and then report the resulting performance as if it were real. The strategy has "memorised" the historical data instead of learning a genuine edge.
Classic warning signs:
- Sharpe > 3 on in-sample data that collapses to ~0.3 on new data.
- The strategy requires very specific parameter values (EMA 17 works; EMA 16 and 18 don't).
- Very few trades, but all of them winners.
- An equity curve that looks like a straight line upward.
Walk-Forward Optimisation (WFO)
WFO splits your data into rolling windows of In-Sample (IS) training and Out-of-Sample (OOS) testing periods. You only trust performance on OOS data.
An IS:OOS ratio of 3:1 or 4:1 is standard.
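The rolling split itself is simple enough to sketch on its own. The helper below (`wfo_windows` is a hypothetical name, not part of the optimiser class) yields index boundaries for each IS/OOS pair using the same 63/21-day defaults used later:

```python
def wfo_windows(n: int, is_periods: int = 63, oos_periods: int = 21):
    """Yield (is_start, is_end, oos_end) row-index triples for rolling splits.

    The IS window covers rows [is_start, is_end); the OOS window covers
    rows [is_end, oos_end). Each window rolls forward by one OOS step.
    """
    for oos_start in range(is_periods, n - oos_periods + 1, oos_periods):
        yield oos_start - is_periods, oos_start, min(oos_start + oos_periods, n)

# 126 rows with the 63/21 defaults produces three overlapping-IS windows:
for w in wfo_windows(126):
    print(w)   # (0, 63, 84), (21, 84, 105), (42, 105, 126)
```

Note that consecutive IS windows overlap: only the OOS segments are non-overlapping, which is what makes stitching them together into one OOS equity curve legitimate.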
WalkForwardOptimiser Class
```python
import pandas as pd
import numpy as np
from itertools import product

from backtest_engine import BacktestEngine
from performance import PerformanceReport


class WalkForwardOptimiser:
    def __init__(
        self,
        df: pd.DataFrame,
        signal_fn,                      # callable(df, **params) -> Series
        param_grid: dict,               # {"fast": [5, 9, 13], "slow": [21, 34]}
        is_periods: int = 63,           # 63 trading days = 3 months IS
        oos_periods: int = 21,          # 21 trading days = 1 month OOS
        score_metric: str = "sharpe",
    ):
        self.df = df
        self.signal_fn = signal_fn
        self.param_grid = param_grid
        self.is_periods = is_periods
        self.oos_periods = oos_periods
        self.score_metric = score_metric
        self.engine = BacktestEngine()

    def _param_combinations(self) -> list[dict]:
        keys = list(self.param_grid.keys())
        values = list(self.param_grid.values())
        return [dict(zip(keys, v)) for v in product(*values)]

    def _score(self, result, metric: str) -> float:
        rpt = PerformanceReport(result.equity, result.trades)
        return rpt.summary().get(metric, 0.0)

    def run(self) -> pd.DataFrame:
        """Run WFO and return OOS performance for each window."""
        n = len(self.df)
        step = self.oos_periods
        windows = []
        for oos_start in range(self.is_periods, n - step + 1, step):
            is_start = oos_start - self.is_periods
            is_end = oos_start
            oos_end = min(oos_start + self.oos_periods, n)
            df_is = self.df.iloc[is_start:is_end]
            df_oos = self.df.iloc[is_end:oos_end]

            # Optimise on IS data
            best_score, best_params = -np.inf, {}
            for params in self._param_combinations():
                fn = lambda d, p=params: self.signal_fn(d, **p)
                r = self.engine.run(df_is, fn, params)
                sc = self._score(r, self.score_metric)
                if sc > best_score:
                    best_score, best_params = sc, params

            # Test on OOS with the best IS params
            fn_oos = lambda d, p=best_params: self.signal_fn(d, **p)
            r_oos = self.engine.run(df_oos, fn_oos, best_params)
            rpt = PerformanceReport(r_oos.equity, r_oos.trades)
            windows.append({
                "oos_start":   df_oos.index[0],
                "oos_end":     df_oos.index[-1],
                "best_params": best_params,
                "is_score":    round(best_score, 3),
                **{f"oos_{k}": v for k, v in rpt.summary().items()},
            })
        return pd.DataFrame(windows)
```
Interpreting WFO Results
```python
wfo = WalkForwardOptimiser(
    df          = df_nifty_daily,
    signal_fn   = ema_cross_signal,
    param_grid  = {"fast": [5, 9, 13], "slow": [21, 34, 50]},
    is_periods  = 252,   # 1 year IS
    oos_periods = 63,    # 3 months OOS
)
results = wfo.run()

# ── Key questions ─────────────────────────────────────────
# 1. Is OOS Sharpe consistently positive?
print(results["oos_sharpe"].describe())

# 2. Do IS best params stay stable across windows?
#    (instability = overfitting signal)
#    Dicts aren't hashable, so stringify before counting.
print(results["best_params"].astype(str).value_counts().head(5))

# 3. Efficiency ratio = avg OOS Sharpe / avg IS Sharpe
#    Should be > 0.5 for a robust strategy
efficiency = results["oos_sharpe"].mean() / results["is_score"].mean()
print(f"WFO Efficiency: {efficiency:.2f}")
```
| WFO Efficiency | Interpretation |
|---|---|
| > 0.7 | Robust strategy — parameters generalise well |
| 0.5 – 0.7 | Acceptable — monitor closely in live trading |
| 0.3 – 0.5 | Marginal — consider simplifying the strategy |
| < 0.3 | Overfit — do not trade live |
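The bands above are easy to encode directly, which is handy if you re-run WFO on a schedule and want an automated verdict. A minimal sketch (`wfo_verdict` is a hypothetical helper, not part of the class above):

```python
def wfo_verdict(efficiency: float) -> str:
    """Map a WFO efficiency ratio to the interpretation bands in the table."""
    if efficiency > 0.7:
        return "robust"       # parameters generalise well
    if efficiency >= 0.5:
        return "acceptable"   # monitor closely in live trading
    if efficiency >= 0.3:
        return "marginal"     # consider simplifying the strategy
    return "overfit"          # do not trade live
```

In a monitoring script this lets you alert the moment a strategy's rolling efficiency drifts below the acceptable band.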
A genuinely robust strategy should perform similarly across a range of parameter values (e.g., EMA 8–12 for fast, EMA 18–26 for slow). If performance collapses when you move one parameter by ±1, the edge is not real.
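One way to test this is a neighbourhood scan: score each parameter value around the optimum and check whether performance sits on a plateau or a spike. A minimal sketch, assuming you have already computed a Sharpe per parameter value (the function name `plateau_check` and the 30% tolerance are illustrative choices, not a standard):

```python
import numpy as np

def plateau_check(scores: dict[int, float], tolerance: float = 0.3) -> bool:
    """Return True if performance is stable across neighbouring parameter values.

    `scores` maps a parameter value (e.g. fast-EMA period) to its Sharpe.
    The edge is suspect if the spread across the neighbourhood is large
    relative to the best score.
    """
    vals = np.array(list(scores.values()), dtype=float)
    best = vals.max()
    if best <= 0:
        return False  # no edge anywhere in the neighbourhood
    return bool((best - vals.min()) / abs(best) <= tolerance)

# A flat plateau (robust) vs. a single spike (fragile):
plateau_check({8: 1.4, 9: 1.5, 10: 1.5, 11: 1.4, 12: 1.3})   # True
plateau_check({16: 0.2, 17: 1.8, 18: 0.3})                    # False
```

The second case is exactly the "EMA 17 works, EMA 16 and 18 don't" symptom from the start of this section.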
