The Power of the Monte Carlo Simulation & How to Use It

Machine learning and artificial intelligence are excellent tools for extracting insight from vast amounts of data. But how do you generate insight when the data is limited or inaccurate?
Consider a scenario in which a variety of outcomes exist, each with a different degree of certainty and data accuracy. Measuring the risks of international oil and gas projects is one example. These projects are characterized by high capital intensity, high risk, and diverse contract types. To help decision makers make sound decisions under uncertainty, those risks need to be measured.
Handling scenarios like this using machine learning techniques is very complex and requires a lot of accurate data. How, then, can you generate insight?
The simple answer is the Monte Carlo Simulation method.
The method can be applied to simulate the stochastic distribution of risk factors in a probabilistic model, and it is general enough to be used wherever the inputs to a model are uncertain.
To demonstrate the technique, we are going to apply this popular method to a complex pipeline integrity case and discuss how the infrastructure and compute challenges were handled.
First, however, let’s break down exactly what the Monte Carlo Simulation (MCS) is and how it functions as such an important tool.

A Powerful Data Science Multitool
The Monte Carlo Simulation method is a broad class of mathematical algorithms that uses repeated random sampling to gain probabilistic insights into problems. MCS is used to model the probability of different outcomes in a process that cannot easily be predicted due to the intervention of random variables. It is a technique used to understand the impact of risk and uncertainty in prediction and forecasting models. A Monte Carlo Simulation can be used to tackle a range of problems in finance, oil and gas, engineering, project management, supply chain, risk analysis, and science.
The simulation is used when:
A process cannot be easily predicted due to involved risks and uncertainties
The relationship between influential factors is too complex for traditional statistical models
Data is sufficient but not abundant
Data on outcomes/results are impossible or costly to collect
When a forecast or estimate involves significant uncertainty, the Monte Carlo Simulation does better than replacing each uncertain variable with a single average number: it represents the uncertainty with distributions. Through the MCS, uncertainties in the input distributions are propagated to the output. The output is then a distribution over the many (often millions of) scenarios that could occur given the input parameters. This makes MCS a very popular approach for risk analysis in business, engineering and operational problems, where there is always a degree of uncertainty in the input parameters and the output.
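As a rough illustration of why a single average can mislead, consider a hypothetical project with two parallel tasks that finishes only when both are done. The task names, the triangular distributions and their parameters are assumptions made up for this sketch; they do not come from a real project.

```python
import random
import statistics

def simulate_parallel_tasks(n_trials=50_000, seed=1):
    """Sample the finish time of two parallel tasks (hypothetical model)."""
    rng = random.Random(seed)
    durations = []
    for _ in range(n_trials):
        # Assumed inputs: each task takes 6 to 14 days, most likely 10.
        task_a = rng.triangular(6.0, 14.0, 10.0)
        task_b = rng.triangular(6.0, 14.0, 10.0)
        # The project is finished only when the slower task is finished.
        durations.append(max(task_a, task_b))
    return durations

# Plugging in the average duration of each task (10 days) gives 10 days.
point_estimate = max(10.0, 10.0)

durations = simulate_parallel_tasks()
mean_duration = statistics.fmean(durations)
# Share of simulated scenarios that run longer than the point estimate.
p_late = sum(d > point_estimate for d in durations) / len(durations)

print(f"point estimate: {point_estimate:.1f} days")
print(f"simulated mean: {mean_duration:.2f} days")
print(f"P(longer than point estimate): {p_late:.2f}")
```

Under these assumptions the simulated mean comes out above the 10-day point estimate, and most scenarios finish later than it, which is information the single-number calculation cannot provide.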
Below is a simple diagram demonstrating the MCS method:

Take the equation x + y = z. When x and y are single values, z is simply the sum of x and y.
Now imagine that x and y are probability distributions rather than single numbers. To solve for z, a random number is sampled from each distribution and plugged into the equation. This process of “sample -> substitute -> solve” is repeated many times, and the resulting values are collected to form a distribution of z.
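The loop described above can be sketched in a few lines of Python. This is a minimal sketch using only the standard library; the choice of a normal distribution for x and a uniform distribution for y, and their parameters, are assumptions picked for illustration.

```python
import random
import statistics

def simulate_sum(n_trials=100_000, seed=42):
    """Build a sample of z = x + y by repeated random sampling."""
    rng = random.Random(seed)
    z_samples = []
    for _ in range(n_trials):
        x = rng.gauss(10.0, 2.0)    # sample: assumed x ~ Normal(mean 10, sd 2)
        y = rng.uniform(0.0, 6.0)   # sample: assumed y ~ Uniform(0, 6)
        z_samples.append(x + y)     # substitute and solve
    return z_samples

z = simulate_sum()
z_mean = statistics.fmean(z)
z_sd = statistics.stdev(z)
# quantiles(n=20) returns 19 cut points at 5%, 10%, ..., 95%.
cuts = statistics.quantiles(z, n=20)
p5, p95 = cuts[0], cuts[-1]

print(f"mean of z: {z_mean:.2f}, standard deviation: {z_sd:.2f}")
print(f"90% of simulated z values fall between {p5:.2f} and {p95:.2f}")
```

The result is not one value of z but a sample from its distribution, from which a mean, a spread and any percentile range can be read off.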
The Benefits of Monte Carlo Simulation
MCS offers a number of advantages over deterministic statistical methods when approaching complex problems:
What-if scenario analysis can be conducted, and the impact of changes in input variables can be tested with no real consequences.
The simulation model can be calibrated by adjusting the distribution models for the input variables.
Better decision-making is made possible, as the distribution of the estimated variable of interest provides more insight. Decisions can be made based on probable scenarios and a range of values instead of a single-value result.
Accounting for uncertainties by using input distributions reduces the need for single-value assumptions. Input distributions inherently take errors, variations and uncertainties into account.
Extreme events and rare instances can be simulated when performing risk analysis.
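Two of these benefits, what-if analysis and the simulation of rare events, can be sketched with a toy cost model. Everything in it is hypothetical: the cost items, their distributions, the budget threshold and the function names were invented for this example.

```python
import random

def simulate_cost(n_trials, labor_sd, seed=7):
    """Sample total project cost under an assumed two-item cost model."""
    rng = random.Random(seed)
    costs = []
    for _ in range(n_trials):
        labor = rng.gauss(100.0, labor_sd)            # assumed normal labor cost
        materials = rng.triangular(40.0, 90.0, 55.0)  # assumed low, high, mode
        costs.append(labor + materials)
    return costs

def tail_probability(samples, threshold):
    """Fraction of simulated scenarios that exceed the threshold."""
    return sum(s > threshold for s in samples) / len(samples)

BUDGET = 190.0  # hypothetical budget ceiling

# Baseline scenario versus a what-if scenario with more volatile labor cost.
baseline = simulate_cost(200_000, labor_sd=10.0)
what_if = simulate_cost(200_000, labor_sd=20.0)

p_base = tail_probability(baseline, BUDGET)
p_what_if = tail_probability(what_if, BUDGET)

print(f"P(cost > budget), baseline: {p_base:.3f}")
print(f"P(cost > budget), what-if:  {p_what_if:.3f}")
```

Both scenarios use the same seed, so the difference between the two tail probabilities reflects the changed input distribution rather than sampling noise. Rare overruns need a large number of trials to be estimated reliably, which is one reason compute becomes a concern for realistic models.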