Role of Simulations. Principles, Design, and Anatomy
This lecture is a high-level overview of why and how we do Monte Carlo simulations.
By the end, you should be able to:
In one sentence:
Developing new methods that allow users to learn parameters of interest from data
Parameters of interest:
Consumers of statistical methods want to use methods that they know “work well”
Natural:
But what’s “well”?
Modern understanding:
A method works “well” if some metric of interest “tends” to “look good” under a reasonably broad variety of data generating processes
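One way to make this concrete (a hypothetical sketch, not from the lecture): pick a metric such as RMSE and check how two candidate methods behave under a couple of different data generating processes, here a light-tailed and a heavy-tailed one.

```python
import numpy as np

# Hypothetical sketch: judge whether an estimator of the center "works well"
# by tabulating its RMSE under several DGPs. Names and settings are
# illustrative assumptions, not part of the lecture.
def rmse(estimator, dgp, true_value, n=200, reps=2000, seed=0):
    rng = np.random.default_rng(seed)
    estimates = np.array([estimator(dgp(rng, n)) for _ in range(reps)])
    return np.sqrt(np.mean((estimates - true_value) ** 2))

dgps = {
    "normal": lambda rng, n: rng.normal(0.0, 1.0, n),    # light tails
    "student_t3": lambda rng, n: rng.standard_t(3, n),   # heavy tails
}

for name, dgp in dgps.items():
    print(name,
          "mean RMSE:", rmse(np.mean, dgp, 0.0),
          "median RMSE:", rmse(np.median, dgp, 0.0))
```

Under the normal DGP the mean tends to win; under heavy tails the median does, which is exactly the kind of "tends to look good under a broad variety of DGPs" comparison a simulation can tabulate.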
Three main methods, in decreasing strength:
Nonparametric finite-sample bounds — the best possible results
Usually possible for specific scenarios (bounded variables, bounded dimensions). Examples:
Key challenge:
In many problems it is impossible to obtain useful guarantees
Problematic scenarios:
Other end of spectrum:
Testing methods on real datasets of interest
Examples:
See the relevant Wikipedia page for more example datasets
Limited to scenarios where you know the ground truth: only prediction, but no causal inference
Other issues:
Simulations: check every aspect of performance in a “lab” setting with many synthetic datasets
Simulations allow answering “what if” questions, e.g.:
An easy, clear simulation is a good way of motivating a problem
Example: figure from intro of Chernozhukov et al. (2018) — danger of not using Neyman orthogonalization (left panel)
Our focus: Monte Carlo simulations
MC: drawing many random datasets and tabulating performance across these datasets
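A minimal version of this loop can be sketched as follows (an illustrative example, assuming a known DGP; the metric tabulated here is the empirical coverage of a nominal 95% confidence interval for a mean):

```python
import numpy as np

# Minimal Monte Carlo loop: draw many synthetic datasets from a known DGP,
# apply the method to each, and tabulate a performance metric across them.
# All settings (exponential DGP, n=100, 5000 replications) are assumptions
# chosen for illustration.
def simulate_coverage(n=100, reps=5000, mu=2.0, seed=42):
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        x = rng.exponential(scale=mu, size=n)     # one synthetic dataset
        se = x.std(ddof=1) / np.sqrt(n)           # standard error of the mean
        lo, hi = x.mean() - 1.96 * se, x.mean() + 1.96 * se
        hits += (lo <= mu <= hi)                  # did the CI cover the truth?
    return hits / reps

print("empirical coverage:", simulate_coverage())
```

Because the DGP is known, the true parameter is known exactly, so coverage (or bias, RMSE, power, ...) can be tabulated directly across replications.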
Not the only kind of simulations. Contrast with
Good simulations are
DGPs should mimic essential real-world features without excess complexity
Intuitively:
Simulations should be reproducible exactly
Steps to achieve:
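One common way to achieve exact reproducibility (a sketch, assuming NumPy; the function name is hypothetical) is to fix an explicit seed and thread a generator object through the simulation rather than relying on global random state:

```python
import numpy as np

# Sketch: exact reproducibility via an explicit, documented seed.
# Passing a Generator object around avoids hidden global state.
def run_once(seed):
    rng = np.random.default_rng(seed)   # explicit seed, recorded in code
    data = rng.normal(size=1000)        # the simulated dataset
    return data.mean()

# The same seed yields bit-for-bit identical results on re-runs.
assert run_once(2024) == run_once(2024)
```

Recording the seed (along with library versions) in the simulation script is what lets someone else re-run the study and obtain the same numbers.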
Simulation DGPs should reflect the property of interest
Example:
In this lecture we
Now we have an idea of the what and why; the next question is how.
Why Simulate and General Principles