Statistical Modelling

Probabilistic Forecasting: Markov Chain Monte Carlo

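Markov Chain Monte Carlo (MCMC) is a stochastic sampling method widely used for complex problems where uncertainties are large. It draws many sets of values for the important model parameters, visiting each region of the parameter space in proportion to its probability, so the samples show which outcomes are most probable and how uncertain they are.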

In the climate model MAGICC, the initial parameters needed to simulate a climate scenario, such as the ratio of land warming to ocean warming, each have a range of possible values within previously calculated bounds. The model then runs its physical equations with these different input values to simulate a range of possible scenarios. The result is a set of possible global warming outcomes, where the central values are the most likely and the extreme values are the least likely. For more, check the climate page. The range of possible outcomes only means that there is some uncertainty in how the climate system will respond and how much the Earth will warm, and that uncertainty can be quantified.
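The idea of running a model over a range of input values can be sketched in a few lines of Python. This is only an illustration and not MAGICC: `toy_global_warming` is a hypothetical one-line model (an area-weighted mean of land and ocean warming), and the ratio bounds of 1.2 to 2.0, the land fraction of 0.29, and the triangular sampling distribution are all assumptions chosen for the example.

```python
import random
import statistics


def toy_global_warming(land_ocean_ratio, ocean_warming=1.0, land_fraction=0.29):
    """Hypothetical toy model (not MAGICC): area-weighted mean warming.

    Land warms `land_ocean_ratio` times as much as the ocean.
    """
    land_warming = land_ocean_ratio * ocean_warming
    return land_fraction * land_warming + (1.0 - land_fraction) * ocean_warming


def sample_outcomes(n_samples=10_000, ratio_low=1.2, ratio_high=2.0, seed=0):
    """Draw the uncertain input many times and run the toy model on each draw."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n_samples):
        # Triangular distribution: values near the middle of the assumed
        # range are drawn more often than values near the edges.
        ratio = rng.triangular(ratio_low, ratio_high)
        outcomes.append(toy_global_warming(ratio))
    return outcomes


def summarize(outcomes):
    """Return the 5th percentile, median and 95th percentile of the outcomes."""
    cuts = statistics.quantiles(outcomes, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return {"p05": cuts[0], "median": statistics.median(outcomes), "p95": cuts[-1]}


outcomes = sample_outcomes()
summary = summarize(outcomes)
print(summary)
```

The summary gives a central estimate together with a 5% to 95% band, which is one simple way to put numbers on the uncertainty in the outcome.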

Another example of MCMC is on the astrophysics page. To simulate a galaxy that orbits another galaxy and loses mass to it, we first need initial conditions: how large and how massive the galaxy was to start with, how far it was from its host, and how it was moving in three-dimensional space. We have prior knowledge of all of these parameters, but with uncertainties. For instance, from its present state we know that the galaxy must have started with a mass between a million and ten billion solar masses. Although this is a large range of values, a sampling method like MCMC can explore this parameter space and return the combinations of parameter values that produce the most realistic simulations. A schematic representation of this process can be found here.

In the Metropolis-Hastings algorithm, MCMC samples parameter values by drawing random values from an assumed background (prior) distribution. In the example above, for instance, we can assume a normal (bell-shaped) distribution for the initial mass, centered on the middle of the range. The chain starts from a random value for each parameter and runs a simulation with those values. In each iteration it takes a random step within the parameter range to propose the next set of values, runs a new simulation, and compares the results of the two consecutive simulations to decide whether to move to the proposed values or stay where it is. The chain eventually converges to an equilibrium state in which it samples a range of best values for the parameters, constraining the possibilities.
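A minimal sketch of these steps for a single parameter, the logarithm of the initial galaxy mass, might look as follows. Everything specific here is an assumption made for illustration: `toy_simulation` is a hypothetical stand-in for the real (expensive) simulation, the "observed" value of 8.5 with an error of 0.2, the prior width of 1.0 and the step size of 0.3 are invented, and the random step is a symmetric Gaussian, which is the simpler Metropolis special case of Metropolis-Hastings.

```python
import math
import random

# Allowed range: log10 of one million and ten billion solar masses.
LOG_MASS_MIN, LOG_MASS_MAX = 6.0, 10.0


def log_prior(log_mass):
    """Normal prior centered on the middle of the range, zero outside it."""
    if not LOG_MASS_MIN <= log_mass <= LOG_MASS_MAX:
        return -math.inf
    center = 0.5 * (LOG_MASS_MIN + LOG_MASS_MAX)
    width = 1.0  # assumed prior width, in dex
    return -0.5 * ((log_mass - center) / width) ** 2


def toy_simulation(log_mass):
    """Hypothetical stand-in for a real simulation: returns the final log mass."""
    return log_mass - 0.5


def log_posterior(log_mass, observed=8.5, sigma=0.2):
    """Log prior plus Gaussian log likelihood of the (invented) observation."""
    lp = log_prior(log_mass)
    if lp == -math.inf:
        return lp  # outside the allowed range: skip the simulation
    residual = (toy_simulation(log_mass) - observed) / sigma
    return lp - 0.5 * residual ** 2


def metropolis(n_steps=20_000, step_size=0.3, seed=1):
    """Random-walk Metropolis sampler; returns the chain and acceptance rate."""
    rng = random.Random(seed)
    current = rng.uniform(LOG_MASS_MIN, LOG_MASS_MAX)  # random starting value
    current_lp = log_posterior(current)
    chain, accepted = [], 0
    for _ in range(n_steps):
        proposal = current + rng.gauss(0.0, step_size)  # random step
        proposal_lp = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio); otherwise stay put.
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
            accepted += 1
        chain.append(current)
    return chain, accepted / n_steps


chain, acceptance_rate = metropolis()
samples = chain[2000:]  # discard the early steps taken before convergence
posterior_mean = sum(samples) / len(samples)
print(f"acceptance rate: {acceptance_rate:.2f}")
print(f"posterior mean of log10(mass): {posterior_mean:.2f}")
```

A rejected proposal still adds the current value to the chain again. That repetition is what makes well-fitting values appear more often in the final sample than poorly fitting ones.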

You can check out a black-box Python package for MCMC here. Alternatively, here is an old and unfortunately repetitive bash script that automates the sampling and the simulations.

UNDER CONSTRUCTION!


Monte Carlo Simulations


Working with Small Data Sets

Hypothesis Testing: KS Test

Hypothesis Testing: A/B Test

Gaussian Mixture Models

Markov Chains


