**Introduction**

In this blog post, I'll be discussing three concepts in predictive modeling:

- Monte Carlo methods
- Model calibration
- Bootstrapping

The second of these, model calibration, is the mindset we'll need to test our results and decide whether they have any merit.

**I: The problem at hand**

The simple example driving this blog post is this: can we predict the index value of the Dow Jones Industrial Average (DJIA) in advance? For the sake of argument, we'll look at prediction horizons of 1, 5, 10, and 40 years in advance.

For the further sake of argument, we'll use the following simple model: we'll look at monthly data from the previous 36 months and use those values to calculate an expected monthly return and an expected monthly standard deviation. We'll then use those parameters in a Geometric Brownian Motion (GBM) model and forecast out as many months in the future as we need.

(I'll be showing only the minimum amount of MATLAB code needed for this post; if you want the full code for this example, you can download it from MATLAB Central at http://www.mathworks.co.uk/matlabcentral/fileexchange/44899-predictive-model-calibration.)

Let's show a quick example of this:

```matlab
% Monthly data for the DJIA, downloaded from Yahoo! Finance
load DJIA_Monthly_Data

% The row index of the end of 1949 (an arbitrarily selected date,
% just to explain this method):
idx1949 = 255;

% The monthly returns of the 36 months leading in to 1950:
sampledReturns = returns(idx1949-35 : idx1949);

% The 1949 year-end closing price:
price0 = prices(idx1949);

% The expected return and standard deviation:
eReturn = mean(sampledReturns);
eSigma = std(sampledReturns);
```

**II: Monte Carlo methods**

Now, for this very simple case (a single time series assuming a GBM model), there do exist closed-form solutions for the forecasted index levels at arbitrary future time horizons. Just go to Wikipedia for "GBM" or look up "Ito's Lemma" for the result.
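For reference, the standard textbook form of that result is below. The symbols are my notation, not anything from the code above: $S_0$ is the starting index level, $\mu$ and $\sigma$ are the drift and volatility of the continuous-time process $dS = \mu S\,dt + \sigma S\,dW$, and $W_t$ is a standard Brownian motion.

$$S_t = S_0 \exp\left(\left(\mu - \tfrac{1}{2}\sigma^2\right)t + \sigma W_t\right), \qquad \ln\frac{S_t}{S_0} \sim \mathcal{N}\left(\left(\mu - \tfrac{1}{2}\sigma^2\right)t,\ \sigma^2 t\right)$$

So the forecast at horizon $t$ is lognormal, with median $S_0 \exp\left(\left(\mu - \tfrac{1}{2}\sigma^2\right)t\right)$. Note that the code above estimates the mean and standard deviation of simple monthly returns, so `eReturn` and `eSigma` are not drop-in values for $\mu$ and $\sigma$ here.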

One of the challenges of that closed-form result, though, is that it's very specific to this simple case. We can't easily derive a closed-form solution if we were to introduce any number of minor variations, such as:

- We are no longer interested in just the DJIA, but rather a basket of correlated indices such as the DJIA, the FTSE, the DAX, the Nikkei, etc.
- We no longer assume GBM, but rather a Student's t-distribution (or a GARCH(1,1) model, or a Heston model, or a SABR model...)
- We aren't really interested in the DJIA price as such, but rather in an investment account that adds $1000 in new capital each month to the existing investment, which tracks the DJIA (less fees that are applied only to the investment's profits).

By using Monte Carlo techniques, I can generate the full forecast distribution at any time horizon in just a few lines of code. These lines are easily adapted for any of the above wrinkles, and the result only takes a few seconds to generate:

```matlab
nSims = 1e4;

% Simulate the Brownian motion for the returns:
simRets = normrnd(eReturn, eSigma, 40*12, nSims);

% Convert the simulations to GBM on the index prices:
simPrices = price0 * cumprod(1+simRets);

plotForecasts(simPrices, '1-Jan-1950')
```
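To give a feel for how little has to change, here is a minimal sketch of the third variation from the list above, minus the fee rule: an account that receives a fixed deposit at the start of each month and then earns that month's simulated return. The parameter values are made-up stand-ins (not estimates from the DJIA data), the names `deposit` and `balance` are mine, and I've used `randn` instead of `normrnd` only so the sketch needs no toolbox.

```matlab
% Illustrative stand-ins; in the post these come from 36 months of DJIA returns.
eReturn = 0.007;    % assumed mean monthly return
eSigma  = 0.04;     % assumed monthly standard deviation
nMonths = 12;       % forecast horizon, in months
nSims   = 1000;     % number of Monte Carlo paths
deposit = 1000;     % new capital added at the start of each month

% Simulated monthly returns: one column per path.
simRets = eReturn + eSigma * randn(nMonths, nSims);

% Roll the account forward: add the deposit, then apply the month's return.
balance = zeros(nMonths, nSims);
current = zeros(1, nSims);
for t = 1:nMonths
    current = (current + deposit) .* (1 + simRets(t, :));
    balance(t, :) = current;
end

% Median account value at the end of the horizon.
medianFinalBalance = median(balance(end, :));
```

Each column of `balance` is one simulated account history, so any percentile of the final row can be read off the same way as for the index forecasts.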

**IIIa: (Naive) model calibration**

Next, we're going to ask if these simulated results hold any predictive value at all: do they behave as advertised? Let's start with a simple plot. Because our simulations above were made as of 1 January 1950, we know exactly what the DJIA actually looked like at the 1-, 5-, 10-, and 40-year horizons. Let's add the actual values as red lines:

```matlab
plotForecasts(simPrices, '1-Jan-1950', prices(idx1949 + [12 60 120 480]'))
```

I hope that the above paragraph made you cringe a bit, because it's a vast oversimplification based off a single data point at 1 January 1950. If we tried to base any sort of a conclusion off of this single test, then we should be laughed out of the room. (I'm sorry to say that I have seen others try to draw such conclusions from even flimsier "tests" than this.)

No, instead we need to do a more complete backtest using as much data as we can. Fortunately, we have 85 years of DJIA prices to work from. As a start, let's loop over all the dates for which we have data and use our Monte Carlo simulation to find the median predicted prices at the 1-, 5-, 10-, and 40-year horizons. Once we have our predicted median values for all months in the backtest, we can compare them against the actual DJIA prices for as many months as we can. We'll need to index carefully to make sure that the dates line up, and we'll also need to trim our results (because, for example, we can't yet evaluate the accuracy of the 40-year prediction made as of 2010). If our model is any good, then its median predictions should be larger than the actual backtested results about half the time, since the median is by definition the value that splits the forecast distribution in two.
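The indexing is the fiddly part, so here is a minimal sketch of the loop for a single horizon. This is not the code from MATLAB Central: it runs on a synthetic price series so that it is self-contained, and the names `window`, `horizon`, `forecastIdx`, and `medianAbove` are my own.

```matlab
% Synthetic stand-in for the DJIA series so the sketch is self-contained.
nMonths = 300;
returns = 0.006 + 0.045 * randn(nMonths, 1);   % made-up monthly returns
prices  = 100 * cumprod(1 + returns);          % made-up index levels

window  = 36;     % calibration window, in months
horizon = 12;     % forecast horizon, in months
nSims   = 500;    % Monte Carlo paths per forecast date

% A forecast date needs a full trailing window and a known future price.
forecastIdx = window : (nMonths - horizon);
medianAbove = false(numel(forecastIdx), 1);

for k = 1:numel(forecastIdx)
    t = forecastIdx(k);

    % Calibrate on the trailing window ending at t.
    sample = returns(t-window+1 : t);

    % Simulate forward and keep only the price at the horizon.
    simRets  = mean(sample) + std(sample) * randn(horizon, nSims);
    simFinal = prices(t) * prod(1 + simRets, 1);

    % Was the median forecast larger than what actually happened?
    medianAbove(k) = median(simFinal) > prices(t + horizon);
end

fracMedianAbove = mean(medianAbove);
```

On synthetic data the value of `fracMedianAbove` means nothing in particular. The point is the trimming: `forecastIdx` stops `horizon` months before the end of the data, which is why the longest horizons leave the fewest usable backtests.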

(Again, download the full code from MATLAB Central to see the details.)

What we find is that 46.3% of the time, our median 1-year prediction is larger than the actual backtested value. Similarly, the 5-year prediction is larger 49.7% of the time, the 10-year is larger 47.5% of the time, and the 40-year is larger 45.8% of the time.

With all these values close to 50%, we have a promising result. But that's only looking at the median predicted value: unless we're investing in something like binary options, we'll want to feel confident in the full range of predicted outcomes and not just in the median prediction.

**IIIb: (Less naive) model calibration**

To look at the full range of predicted outcomes, I'm going to adapt a procedure outlined in chapter 4 of Nate Silver's *The Signal and the Noise* (and which he credits to Eric Floehr of ForecastWatch.com). The procedure was originally applied to rain forecasts and asked the simple question: "When the forecast gives a 20% chance of rain, does it actually rain 20% of the time?" We can ask the same question at an 80% chance of rain, a 50% chance of rain, or at any percentage of certainty we like. If the forecasted predictions match the actual results across the full spectrum, only then can we call the model "well-calibrated".

In our problem, we actually have a simulated range of predicted outcomes at each time horizon. If our predictive model is "well-calibrated", then the actual index price should be less than the 5th percentile of our predictions in only 5% of the backtests. Similarly, the actual index price should be less than the 70th percentile of our predictions in 70% of the backtests, and so forth. This suggests a straightforward backtest whose results we can display on an "actual rate of occurrence vs. predicted rate of occurrence" set of axes:
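Here is a minimal sketch of how such a calibration curve can be computed. It is a toy, not the DJIA backtest: the "actual" outcomes are drawn from the same distribution as the simulations, so it is well calibrated by construction. The variable names are mine. The trick is to record, for each backtest, the fraction of simulated outcomes that fall below the actual outcome; up to ties and simulation resolution, the actual value is below the p-th percentile exactly when that fraction is below p.

```matlab
nBacktests = 2000;   % number of forecast dates (hypothetical)
nSims      = 200;    % simulated outcomes per forecast date

% Toy setup that is well calibrated by construction: the "actual" outcome
% is drawn from the same distribution as the simulated outcomes.
sims   = randn(nSims, nBacktests);
actual = randn(1, nBacktests);

% For each forecast date, the fraction of simulations below the actual
% value, i.e. the predicted percentile at which reality landed.
landed = zeros(1, nBacktests);
for k = 1:nBacktests
    landed(k) = mean(sims(:, k) < actual(k));
end

% For each predicted rate p, how often did reality land below that percentile?
predictedRate = (0.05 : 0.05 : 0.95)';
actualRate = zeros(size(predictedRate));
for i = 1:numel(predictedRate)
    actualRate(i) = mean(landed < predictedRate(i));
end
```

Plotting `actualRate` against `predictedRate` gives the chart: a well-calibrated model hugs the diagonal, as this toy does. A model that is too sure of itself puts too many actual values in its tails, so its curve sits above the diagonal at low predicted rates and below it at high ones.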

Although the model's median outcomes are relatively accurate, it seems to completely underestimate its own error bars: it's entirely too confident in its predictions. Any attempts we might make to use this predictive model will end up hurting us: we'll be constantly surprised by the frequent occurrences of events that the model incorrectly asserts are rare.

**IV: What went wrong?**

So, what happened? Well, one possibility is that we simply have a "bad" model. It could be that 36 months of historical data have no predictive power or that a GBM model doesn't describe the DJIA adequately. Either possibility would require us to go back to the drawing board and begin anew, a possibly time-consuming affair.

Fortunately, there is one other possibility for what went wrong, and the calibration charts above hint at it decisively: we are acting too certain of ourselves! Recall that we are using 36 months of data to infer the expected return and the expected standard deviation. This is a relatively small sample size, so even if our data is well-modeled by this framework, we still have to expect that our calculated parameters are subject to a (possibly large!) sampling error.

This is analogous to the situation where someone hands you a coin and asks you to find the probability of it coming up heads. If you flip it 10 times and it comes up heads 6 of those times, does that mean that P(heads) = 0.6? Not at all! If you are making predictions, you must either:

- Flip the coin many more times to get a more precise estimate of P(heads), or
- Make your predictions under the assumption that P(heads) is not known precisely but rather is an unknowable value near 0.6.
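To put a rough number on "near 0.6": under the usual normal approximation to the binomial (a textbook shortcut, and a crude one for only 10 flips), the standard error of the estimate $\hat{p} = 0.6$ from $n = 10$ flips is

$$\mathrm{SE}(\hat{p}) = \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}} = \sqrt{\frac{0.6 \times 0.4}{10}} \approx 0.155$$

so an approximate 95% interval is $0.6 \pm 1.96 \times 0.155 \approx 0.6 \pm 0.30$, or roughly 0.30 to 0.90. Ten flips can't distinguish a fair coin from a heavily biased one.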

**V: Bootstrapping**

When "flipping more coins" is a valid and inexpensive option, then the first possibility is best. We see it used in physics and chemistry experiments all the time: many repeated experiments leading to small estimation errors. In finance, though, the first choice is not valid at all: we might be able to expand the calibration window beyond 36 months, but doing so introduces other potential problems. We are instead forced to use the second approach, which uses so-called "stochastic parameters".

So, given that our expected returns are not known precisely, how much uncertainty should we assign to them? Well, that turns out to be a well-known result: the uncertainty of the mean is a simple function of the standard deviation and the sample size. Likewise, the uncertainty of the standard deviation can be found using chi-squared distributions, and so forth. In other words, there are well-known, closed-form equations for all of these uncertainties.
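For reference, these are the textbook forms of those two results, assuming $n$ independent, normally distributed returns with sample mean $\bar{r}$, sample standard deviation $s$, and true standard deviation $\sigma$ (the symbols are my notation):

$$\mathrm{SE}(\bar{r}) = \frac{s}{\sqrt{n}}, \qquad \frac{(n-1)\,s^2}{\sigma^2} \sim \chi^2_{n-1}$$

With our $n = 36$ months, the standard error of the mean monthly return is $s/6$. For an index whose typical monthly return is small compared with its monthly standard deviation, that is a lot of uncertainty about the one parameter that drives the long-run forecast.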

Just as with our forecasting above, though, these closed-form results only apply to a small subset of simple cases. Sure, we can use them on this single-variable GBM model, but it is far more challenging to derive equations for the uncertainty of the parameters for multiple, correlated time series or for more complex volatility models.

Our solution before, when closed-form forecasts were difficult, was to use fast computers to our advantage: Monte Carlo methods. This time around, we'll use fast computers again: bootstrapping.

Bootstrapping, for those who haven't used it, is nothing more complicated than sampling with replacement, over and over again. In this example, we use our trailing 36 months of returns to calculate an expected return and standard deviation. We will then randomly select 36 months of returns (with replacement) from this sample and recalculate the mean and standard deviation. Because this "bootstrap replicate" is not exactly the same as our original sample, we will get slightly different estimates. We then draw another replicate, and another, and another: thousands of times if needed.
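In code, the resampling described above takes only a few lines. This is a minimal sketch, not the MATLAB Central code: `sampledReturns` is filled with made-up numbers so that the snippet stands alone, and `nBoot`, `bootReturn`, and `bootSigma` are names I've chosen.

```matlab
% Made-up stand-in for the trailing 36 monthly returns.
sampledReturns = 0.006 + 0.045 * randn(36, 1);
nObs  = numel(sampledReturns);
nBoot = 2000;    % number of bootstrap replicates

% Each column of idx holds nObs row indices drawn with replacement.
idx = randi(nObs, nObs, nBoot);
replicates = sampledReturns(idx);    % nObs-by-nBoot matrix of resampled returns

% One (mean, standard deviation) pair per replicate.
bootReturn = mean(replicates, 1);
bootSigma  = std(replicates, 0, 1);

% The spread of the replicate means estimates the sampling error of the mean.
seOfMean = std(bootReturn);
```

For this simple case, `seOfMean` should land close to the closed-form `std(sampledReturns)/sqrt(nObs)`, which is a handy sanity check before trusting the same machinery on models that have no closed form.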

The key here is that our original sample is *characteristic* of the unknown, underlying process, and because it is, then so too are all of the bootstrap replicates. When we consider all of the replicates as a group, we can gain an understanding of the range and distribution that the stochastic parameters might take.

We can then take each bootstrap-derived parameter and feed it into our Monte Carlo simulations: the grand aggregate across all Monte Carlo simulations and all bootstrap estimates should correctly represent our model's prediction in the face of our sampling errors. Let's see it in action (with one more reminder that the code to do all this is at MATLAB Central):
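A minimal sketch of that combination follows, under simplifying assumptions of my own: made-up input returns, a single 12-month horizon, and just one Monte Carlo path per bootstrap replicate (the full code may well do this differently). The fixed-parameter forecast is run on the same random shocks for comparison.

```matlab
% Made-up stand-ins for the trailing 36 monthly returns and the starting price.
sampledReturns = 0.006 + 0.045 * randn(36, 1);
price0  = 200;
nObs    = numel(sampledReturns);
horizon = 12;      % months to forecast
nBoot   = 2000;    % bootstrap replicates, one Monte Carlo path each

% Bootstrap: one (mean, sigma) pair per replicate, as 1-by-nBoot row vectors.
replicates = sampledReturns(randi(nObs, nObs, nBoot));
bootReturn = mean(replicates, 1);
bootSigma  = std(replicates, 0, 1);

% Monte Carlo: column j is simulated with the parameters of replicate j.
z = randn(horizon, nBoot);
simRets  = repmat(bootReturn, horizon, 1) + repmat(bootSigma, horizon, 1) .* z;
simFinal = price0 * prod(1 + simRets, 1);

% The same shocks with fixed parameters, for comparison.
fixedRets  = mean(sampledReturns) + std(sampledReturns) * z;
fixedFinal = price0 * prod(1 + fixedRets, 1);

% Width of each forecast distribution, measured on log prices.
spreadBootstrap = std(log(simFinal));
spreadFixed     = std(log(fixedFinal));
```

The bootstrap-fed forecast should come out wider than the fixed-parameter one, because every path now carries its own plausible guess at the mean and standard deviation. That extra width is the sampling error the earlier forecasts were ignoring.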

Once we properly account for the stochastic parameters, the model starts to have some value to us: we can actually begin to trust its predictions.

**VI: Conclusions**

I like to think of calibration as a necessary condition for a "good" predictive model. If a model can't even be calibrated (that is, it has no confidence bounds to test), then it's already useless for just about any practical application. Any models that don't report their confidence bounds should instantly be mistrusted.

If a model has confidence bounds, but they're not well-calibrated, then it's also a poor model and shouldn't be trusted. What's the point of using a model that you know to be inaccurate?

If a model is well-calibrated, then it finally becomes accurate enough to be *potentially* useful. From there on, it becomes a matter of precision: two well-calibrated models can be compared to each other in terms of the range of their confidence limits. For example, this current, "well-calibrated" model can only say that the median value of the DJIA 1 year from now (where "now" is the beginning of 2014) is about 18600 and that the 95% confidence bounds are between 14400 and 23600. Given that the closing price as of this writing is about 16500, that's hardly a precise prediction. It gets even broader in the long run: the 40-year prediction out to the year 2053 has a median value of around 1,900,000 (!) with a 95% confidence interval between 9600 and 300,000,000 (!!!). This is a clear illustration of the difference between accuracy (which a well-calibrated model has) and precision (which this model certainly does NOT have).