Data Classes for Scenarios. Running Many Simulations
This lecture is about making our code execute many simulation scenarios
By the end, you should be able to
Talking about bias of different penalized SSR-based estimators in simple linear model:
\[ \small Y_{t} = \beta_0 + \beta_1 X + U_t, \quad t=1, \dots, T \]
Estimators for \(\beta_1\) minimize penalized SSR with form \[ \small (\hat{\beta}_0, \hat{\beta}_1) = \argmin \sum_{t=1}^T (Y_t - b_0 - b_1 X_t)^2 + \lambda \Pcal(b_0, b_1) \] \(\Pcal(\cdot)\) — penalty (0 for OLS, \(L^1\) for Lasso, \(L^2\) for ridge)
project/
├── dgps/
│ ├── __init__.py
│ ├── static.py # StaticNormalDGP
│ └── dynamic.py # DynamicNormalDGP
├── estimators/
│ ├── __init__.py
│ ├── ols-like.py # SimpleOLS, SimpleRidge, LassoWrapper
├── main.py # Main script that we call from the CLI
├── protocols.py # DGPProtocol, EstimatorProtocol
└── runner.py # SimulationRunner
main.py with Only One Scenariomain.py
from dgps.dynamic import DynamicNormalDGP
from estimators.ols_like import LassoWrapper
from runner import SimulationRunner
if __name__ == "__main__":
dgp = DynamicNormalDGP(beta0=0.0, beta1=0.95)
estimator = LassoWrapper(reg_param=0.04)
n_obs = 50
# Run simulation for specified scenario
runner = SimulationRunner(dgp, estimator)
runner.simulate(n_sim=1000, n_obs=n_obs, first_seed=1)
# Print results
print(
f"Bias for {dgp.__class__.__name__} + {estimator.__class__.__name__}: "
)
runner.summarize_bias()Key challenge:
How do we run many scenarios automatically?
A problem of orchestration: coordinating multiple tasks
Goal: being able to focus on results
One way: add all combos (DGPs, estimators, sample sizes) manually in main.py, create SimulationRunner for each one
Not a very good approach
main.py for every changeSimulationRunner creation)SimulationScenario Class“Scenario” — collection of characteristics that uniquely define a setting for SimulationRunner
Our case has three characteristics:
collections.namedtuple() or typing.NamedTuple)@dataclasses.dataclassGenerally good practice to be explicit (know if something goes wrong; clearer code)
@dataclasses.dataclass like in the EPP class (but be aware of other simpler options)SimulationScenario Data Class Definition@dataclass(frozen=True)
class SimulationScenario:
"""A single simulation scenario: DGP, estimator, and sample size."""
name: str # For readability
dgp: type[DGPProtocol]
dgp_params: dict # E.g. betas go here
estimator: type[EstimatorProtocol]
estimator_params: dict # E.g. reg_params go here
sample_size: int
n_simulations: int = 1000
first_seed: int = 1dataclass comes with __init__, a nice __eq__ and other useful methods practically for freeThe type[Class] type annotation means we want a class, not an instance of it. See here
SimulationScenarioCan now define example instance:
Note how we passed a class to SimulationScenario (e.g. StaticNormalDGP, not instance with StaticNormalDGP())
SimulationScenario with SimulationRunner# Initialize the scenario
dgp = example_scenario.dgp(**example_scenario.dgp_params)
estimator = example_scenario.estimator(**example_scenario.estimator_params)
# Run the simulation
runner = SimulationRunner(dgp, estimator)
runner.simulate(
n_sim=example_scenario.n_simulations,
n_obs=example_scenario.sample_size,
first_seed=example_scenario.first_seed,
)
# Print results
print(f"Bias for {example_scenario.name}: {runner.errors.mean():.4f}")Bias for static_ols_T50: -0.0027
example_scenario contains all the information necessary for SimulationRunner
**example_scenario.dgp_params is an example of dictionary unpacking; here for using keyword arguments
Choice depends on your goal:
A couple of options:
For now: a Python list coming from a scenarios.py file is fine for us
scenarios.py
scenarios = [
SimulationScenario(
name="static_ols_T50",
dgp=StaticNormalDGP,
dgp_params={"beta0": 0.0, "beta1": 0.5},
estimator=SimpleOLS,
estimator_params={},
sample_size=50,
),
SimulationScenario(
name="dynamic_lasso_T200",
dgp=DynamicNormalDGP,
dgp_params={"beta0": 0.0, "beta1": 0.95},
estimator=LassoWrapper,
estimator_params={"reg_param": 0.1},
sample_size=200,
)
]Other extreme: all possible combinations of scenario characteristics
Creation steps:
One can think of other combinations that lie between manual lists and full-on Cartesian products
scenarios.py With All Possible Combinations:scenarios.py
from itertools import product
# Define lists of components
dgps = [
(StaticNormalDGP, {"beta0": 0.0, "beta1": 1.0}, 'static'),
(DynamicNormalDGP, {"beta0": 0.0, "beta1": 0.0}, 'dynamic_low_pers'),
(DynamicNormalDGP, {"beta0": 0.0, "beta1": 0.5}, 'dynamic_mid_pers'),
(DynamicNormalDGP, {"beta0": 0.0, "beta1": 0.95}, 'dynamic_high_pers'),
]
estimators = [
(SimpleOLS, {}),
(LassoWrapper, {"reg_param": 0.1}),
(SimpleRidge, {"reg_param": 0.1})
]
sample_sizes = [50, 200]
# Generate all combinations
scenarios = [
SimulationScenario(
name=f"{dgp_class.__name__.lower()}_{dgp_descr}_{estimator_class.__name__.lower()}_T{size}",
dgp=dgp_class,
dgp_params=dgp_params,
estimator=estimator_class,
estimator_params=estimator_params,
sample_size=size,
)
for (dgp_class, dgp_params, dgp_descr), (estimator_class, estimator_params), size
in product(dgps, estimators, sample_sizes)
]Here: choose automatic approach
Would be annoying to write all these by hand
Note:
In reality often some hybrid: manually create set of pairs (DGP, estimator), take product with some sizes (not all possible DGP-estimator pairs)
Now have added a new scenarios.py file to our folder:
project/
├── dgps/
│ ├── __init__.py
│ ├── static.py
│ └── dynamic.py
├── estimators/
│ ├── __init__.py
│ └── ols-like.py
├── protocols.py
├── runner.py
├── scenarios.py # New: Defines SimulationScenario and scenarios list
└── main.py
SimulationOrchestrator ClassSo far:
A simple simulation orchestrator:
simulate()More advanced: can parallelize/distribute computation, etc.
main.py Look Like?main.py
SimulationOrchestrator Class Definitionorchestrator.py
class SimulationOrchestrator:
"""Simple simulation orchestration class without any result handling
"""
def __init__(self, scenarios: list[SimulationScenario]):
self.scenarios = scenarios
def run_all(self):
for scenario in scenarios:
# Create DGP and estimator
dgp = scenario.dgp(**scenario.dgp_params)
estimator = scenario.estimator(**scenario.estimator_params)
# Run the simulation
runner = SimulationRunner(dgp, estimator)
runner.simulate(
n_sim=scenario.n_simulations,
n_obs=scenario.sample_size,
first_seed=scenario.first_seed,
)
# Print results
print(f"Bias for {scenario.name}: {runner.errors.mean():.4f}")main.pyExecuting the script now prints the results for all scenarios!
Bias for staticnormaldgp_static_simpleols_T50: -0.0027
Bias for staticnormaldgp_static_simpleols_T200: -0.0028
Bias for staticnormaldgp_static_lassowrapper_T50: -0.1108
Bias for staticnormaldgp_static_lassowrapper_T200: -0.1048
Bias for staticnormaldgp_static_simpleridge_T50: -0.0048
Bias for staticnormaldgp_static_simpleridge_T200: -0.0033
Bias for dynamicnormaldgp_low_pers_simpleols_T50: -0.0261
Bias for dynamicnormaldgp_low_pers_simpleols_T200: -0.0029
Bias for dynamicnormaldgp_low_pers_lassowrapper_T50: -0.0655
Bias for dynamicnormaldgp_low_pers_lassowrapper_T200: -0.0715
Bias for dynamicnormaldgp_low_pers_simpleridge_T50: -0.0262
Bias for dynamicnormaldgp_low_pers_simpleridge_T200: -0.0029
Bias for dynamicnormaldgp_mid_pers_simpleols_T50: -0.0509
Bias for dynamicnormaldgp_mid_pers_simpleols_T200: -0.0103
Bias for dynamicnormaldgp_mid_pers_lassowrapper_T50: -0.1374
Bias for dynamicnormaldgp_mid_pers_lassowrapper_T200: -0.0880
Bias for dynamicnormaldgp_mid_pers_simpleridge_T50: -0.0515
Bias for dynamicnormaldgp_mid_pers_simpleridge_T200: -0.0105
Bias for dynamicnormaldgp_high_pers_simpleols_T50: -0.0992
Bias for dynamicnormaldgp_high_pers_simpleols_T200: -0.0204
Bias for dynamicnormaldgp_high_pers_lassowrapper_T50: -0.1314
Bias for dynamicnormaldgp_high_pers_lassowrapper_T200: -0.0347
Bias for dynamicnormaldgp_high_pers_simpleridge_T50: -0.0993
Bias for dynamicnormaldgp_high_pers_simpleridge_T200: -0.0204
A total victory:
Can talk about further improvements, handling results, but we have an extensible and broadly-applicable core
So far: just printing bias values to the console
More typical: export results in some nice tabular/text form
Here: just brief example
orchestratormain.pySimulationOrchestrator Classclass SimulationOrchestrator:
"""Simulation orchestrator that stores results in a dictionary
"""
def __init__(self, scenarios: list[SimulationScenario]):
self.scenarios = scenarios
self.summary_results = {}
def run_all(self):
for scenario in scenarios:
# Create DGP and estimator
dgp = scenario.dgp(**scenario.dgp_params)
estimator = scenario.estimator(**scenario.estimator_params)
# Run the simulation
runner = SimulationRunner(dgp, estimator)
runner.simulate(
n_sim=scenario.n_simulations,
n_obs=scenario.sample_size,
first_seed=scenario.first_seed,
)
# Save results
self.summary_results[scenario.name] = runner.errors.mean()SimulationOrchestrator implementation knows that its handling biassummarize() method to SimulationRunner that knows what to exportorchestrator will just receive whatever summarize() givesorchestrator even more reusablemain.pymain.py
import pandas as pd
from orchestrator import SimulationOrchestrator
from scenarios import scenarios
if __name__ == "__main__":
# Create and execute simulations
orchestrator = SimulationOrchestrator(scenarios)
orchestrator.run_all()
# Results logic (print or export as pd.Series)
print(pd.Series(orchestrator.summary_results))Here for simplicity: print the Series, but would generally to_csv()
Executing the script now prints the results for all scenarios!
staticnormaldgp_static_simpleols_T50 -0.002680
staticnormaldgp_static_simpleols_T200 -0.002753
staticnormaldgp_static_lassowrapper_T50 -0.110823
staticnormaldgp_static_lassowrapper_T200 -0.104789
staticnormaldgp_static_simpleridge_T50 -0.004828
staticnormaldgp_static_simpleridge_T200 -0.003261
dynamicnormaldgp_low_pers_simpleols_T50 -0.026099
dynamicnormaldgp_low_pers_simpleols_T200 -0.002854
dynamicnormaldgp_low_pers_lassowrapper_T50 -0.065491
dynamicnormaldgp_low_pers_lassowrapper_T200 -0.071531
dynamicnormaldgp_low_pers_simpleridge_T50 -0.026203
dynamicnormaldgp_low_pers_simpleridge_T200 -0.002900
dynamicnormaldgp_mid_pers_simpleols_T50 -0.050864
dynamicnormaldgp_mid_pers_simpleols_T200 -0.010286
dynamicnormaldgp_mid_pers_lassowrapper_T50 -0.137379
dynamicnormaldgp_mid_pers_lassowrapper_T200 -0.088010
dynamicnormaldgp_mid_pers_simpleridge_T50 -0.051537
dynamicnormaldgp_mid_pers_simpleridge_T200 -0.010470
dynamicnormaldgp_high_pers_simpleols_T50 -0.099214
dynamicnormaldgp_high_pers_simpleols_T200 -0.020369
dynamicnormaldgp_high_pers_lassowrapper_T50 -0.131433
dynamicnormaldgp_high_pers_lassowrapper_T200 -0.034744
dynamicnormaldgp_high_pers_simpleridge_T50 -0.099312
dynamicnormaldgp_high_pers_simpleridge_T200 -0.020425
dtype: float64
In this lecture we
Can keep adding things to code:
Project could also benefit from more reproducibility:
This block: structuring and thinking about simulation code
Overall design:
Quality of life features:
Code Design IV: Orchestration