Good Simulation Code IV: Orchestration

Data Classes for Scenarios. Running Many Simulations

Vladislav Morozov

Introduction

Lecture Info

Learning Outcomes

This lecture is about making our code execute many simulation scenarios


By the end, you should be able to

  • Capture simulation scenarios in data classes
  • Automatically and manually build collections of simulation scenarios
  • Implement a simulation orchestrator class for running many simulations

References


Programming:

  • Chapter 5 in Ramalho (2022) about data classes
  • Chapter 39 in Lutz (2025) on decorators
  • Chapter 6 in Lau (2023) for a refresher on pandas

Reminder: Previous Simulation Setting

Reminder: Study Bias of Penalized SSR Estimators

Talking about bias of different penalized SSR-based estimators in a simple linear model:

\[ \small Y_{t} = \beta_0 + \beta_1 X_{t} + U_t, \quad t=1, \dots, T \]

Estimators for \(\beta_1\) minimize penalized SSR of the form \[ \small (\hat{\beta}_0, \hat{\beta}_1) = \operatorname*{arg\,min}_{(b_0, b_1)} \sum_{t=1}^T (Y_t - b_0 - b_1 X_t)^2 + \lambda \mathcal{P}(b_0, b_1) \] where \(\mathcal{P}(\cdot)\) is the penalty (0 for OLS, \(L^1\) for Lasso, \(L^2\) for ridge)

Reminder: File Structure

  • Already implemented some DGPs, estimators, and a simulation runner
  • Figured out a basic file structure
project/
├── dgps/
│   ├── __init__.py
│   ├── static.py       # StaticNormalDGP
│   └── dynamic.py      # DynamicNormalDGP
├── estimators/
│   ├── __init__.py
│   └── ols_like.py     # SimpleOLS, SimpleRidge, LassoWrapper
├── main.py             # Main script that we call from the CLI
├── protocols.py        # DGPProtocol, EstimatorProtocol
└── runner.py           # SimulationRunner

Problem Statement

Reminder: main.py with Only One Scenario

main.py
from dgps.dynamic import DynamicNormalDGP
from estimators.ols_like import LassoWrapper
from runner import SimulationRunner

if __name__ == "__main__": 
    dgp = DynamicNormalDGP(beta0=0.0, beta1=0.95)
    estimator = LassoWrapper(reg_param=0.04)
    n_obs = 50

    # Run simulation for specified scenario
    runner = SimulationRunner(dgp, estimator)
    runner.simulate(n_sim=1000, n_obs=n_obs, first_seed=1)

    # Print results
    print(
        f"Bias for {dgp.__class__.__name__} + {estimator.__class__.__name__}: "
    )
    runner.summarize_bias()

Issue: How To Run Many Scenarios?

Key challenge:

How do we run many scenarios automatically?

A problem of orchestration: coordinating multiple tasks

  • Automatically
  • As single workflow done in the correct order

Goal: being able to focus on results

Why Not Hardcode?

One way: add all combos (DGPs, estimators, sample sizes) manually in main.py, create SimulationRunner for each one

Not a very good approach

  • Brittle: have to edit main.py for every change
  • Prone to errors
  • Repeats code (e.g. SimulationRunner creation)
  • Breaks separation of concerns: job of the main script is not to say which scenarios you want today

Questions to Answer Today

  • How do we capture what a scenario is?
  • How do we execute all these scenarios?
  • What do we do with the outputs?


  • First: just executing simulations and printing results to the console as before
  • Second: basics of dealing with outputs

Expressing Simulation Scenarios

SimulationScenario Class

What’s A Simulation Scenario

“Scenario” — collection of characteristics that uniquely define a setting for SimulationRunner


Our case has three characteristics:

  • DGP
  • Estimator
  • Sample size

How To Encode Scenarios?

  • Implicitly (e.g. hardcoded directly in main.py)
  • Explicitly in an object with suitable info:
    • Dictionaries
    • Named tuples (through collections.namedtuple() or typing.NamedTuple; sketched below)
    • @dataclasses.dataclass

Generally good practice to be explicit (know if something goes wrong; clearer code)
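
For comparison, a minimal sketch of what a named-tuple encoding could look like (the class name ScenarioTuple is hypothetical; below we use a data class instead):

from typing import NamedTuple

from protocols import DGPProtocol, EstimatorProtocol


class ScenarioTuple(NamedTuple):
    """Hypothetical named-tuple encoding of a minimal scenario."""
    dgp: type[DGPProtocol]
    estimator: type[EstimatorProtocol]
    sample_size: int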

Reminder About Data Classes

  • “Data class” — class that’s just a collection of fields with little extra functionality
  • Here: use @dataclasses.dataclass like in the EPP class (but be aware of other simpler options)

To create: @dataclass and attributes with types

from dataclasses import dataclass

@dataclass(frozen=True)
class SimulationScenario:       # Simple example
    dgp: type[DGPProtocol]
    estimator: type[EstimatorProtocol]
    sample_size: int

SimulationScenario Data Class Definition

@dataclass(frozen=True)
class SimulationScenario:
    """A single simulation scenario: DGP, estimator, and sample size."""
    name: str               # For readability  
    dgp: type[DGPProtocol] 
    dgp_params: dict        # E.g. betas go here
    estimator: type[EstimatorProtocol]
    estimator_params: dict  # E.g. reg_params go here
    sample_size: int
    n_simulations: int = 1000
    first_seed: int = 1
  • Self-documenting
  • A dataclass comes with __init__, a nice __eq__ and other useful methods practically for free

Example SimulationScenario

Can now define example instance:

example_scenario = SimulationScenario(
    name="static_ols_T50",
    dgp=StaticNormalDGP,
    dgp_params={"beta0": 0.0, "beta1": 0.5},
    estimator=SimpleOLS,
    estimator_params={},
    sample_size=50, 
)

Using SimulationScenario with SimulationRunner

# Initialize the scenario
dgp = example_scenario.dgp(**example_scenario.dgp_params)
estimator = example_scenario.estimator(**example_scenario.estimator_params)

# Run the simulation
runner = SimulationRunner(dgp, estimator)
runner.simulate(
    n_sim=example_scenario.n_simulations, 
    n_obs=example_scenario.sample_size, 
    first_seed=example_scenario.first_seed,
)
# Print results
print(f"Bias for {example_scenario.name}: {runner.errors.mean():.4f}")
Bias for static_ols_T50: -0.0027

example_scenario contains all the information necessary for SimulationRunner

Collections of Scenarios

Two Ways To Create Many Scenarios

  1. Manually: a file that explicitly specifies the desired scenarios
  2. Automatically (e.g. as a Cartesian product of list of DGPs, sample sizes, estimators)

Choice depends on your goal:

  • Specific ones (beware of missing a desired combination)
  • All combinations (beware of exponential growth)

Where To Store Scenarios

A couple of options:

  • In Python code (e.g. a list of scenarios)
  • In an external config file (e.g. a YAML config; sketched below)


For now: a Python list coming from a scenarios.py file is fine for us
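
For reference, a minimal sketch of the YAML route (the file layout, keys, and registry are hypothetical assumptions, and PyYAML is assumed to be installed); we stick with the Python list here:

# Hypothetical scenarios.yaml:
#   - name: static_ols_T50
#     dgp: StaticNormalDGP
#     dgp_params: {beta0: 0.0, beta1: 0.5}
#     estimator: SimpleOLS
#     estimator_params: {}
#     sample_size: 50
import yaml

from dgps.static import StaticNormalDGP
from estimators.ols_like import SimpleOLS

# Map the names used in the config file to the actual classes
CLASS_REGISTRY = {"StaticNormalDGP": StaticNormalDGP, "SimpleOLS": SimpleOLS}

with open("scenarios.yaml") as file:
    entries = yaml.safe_load(file)

# SimulationScenario as defined above
scenarios = [
    SimulationScenario(
        name=entry["name"],
        dgp=CLASS_REGISTRY[entry["dgp"]],
        dgp_params=entry["dgp_params"],
        estimator=CLASS_REGISTRY[entry["estimator"]],
        estimator_params=entry["estimator_params"],
        sample_size=entry["sample_size"],
    )
    for entry in entries
]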

Manual Example: List of Scenarios

scenarios.py
from dgps.static import StaticNormalDGP
from dgps.dynamic import DynamicNormalDGP
from estimators.ols_like import SimpleOLS, LassoWrapper

# SimulationScenario is defined earlier in this file
scenarios = [
    SimulationScenario(
        name="static_ols_T50",
        dgp=StaticNormalDGP,
        dgp_params={"beta0": 0.0, "beta1": 0.5},
        estimator=SimpleOLS,
        estimator_params={},
        sample_size=50, 
    ),
    SimulationScenario(
        name="dynamic_lasso_T200",
        dgp=DynamicNormalDGP,
        dgp_params={"beta0": 0.0, "beta1": 0.95},
        estimator=LassoWrapper,
        estimator_params={"reg_param": 0.1},
        sample_size=200, 
    )
]

Creating All Possible Combinations

Other extreme: all possible combinations of scenario characteristics


Creation steps:

  1. Create lists/sets of DGPs, estimators, sample sizes
  2. Take Cartesian product
  3. Store results in a list

scenarios.py With All Possible Combinations:

scenarios.py
from itertools import product

from dgps.static import StaticNormalDGP
from dgps.dynamic import DynamicNormalDGP
from estimators.ols_like import SimpleOLS, SimpleRidge, LassoWrapper

# Define lists of components
dgps = [
    (StaticNormalDGP, {"beta0": 0.0, "beta1": 1.0}, 'static'),
    (DynamicNormalDGP, {"beta0": 0.0, "beta1": 0.0}, 'dynamic_low_pers'),
    (DynamicNormalDGP, {"beta0": 0.0, "beta1": 0.5}, 'dynamic_mid_pers'),
    (DynamicNormalDGP, {"beta0": 0.0, "beta1": 0.95}, 'dynamic_high_pers'),
]
estimators = [
    (SimpleOLS, {}),
    (LassoWrapper, {"reg_param": 0.1}),
    (SimpleRidge, {"reg_param": 0.1})
]
sample_sizes = [50, 200]

# Generate all combinations
scenarios = [
    SimulationScenario(
        name=f"{dgp_class.__name__.lower()}_{dgp_descr}_{estimator_class.__name__.lower()}_T{size}",
        dgp=dgp_class,
        dgp_params=dgp_params,
        estimator=estimator_class,
        estimator_params=estimator_params,
        sample_size=size, 
    )
    for (dgp_class, dgp_params, dgp_descr), (estimator_class, estimator_params), size
    in product(dgps, estimators, sample_sizes)
]

Our Choice

Here: choose automatic approach

len(scenarios)
24

Would be annoying to write all these by hand


Note:

In reality, often some hybrid: manually create a set of (DGP, estimator) pairs, then take the product with sample sizes only (rather than forming all possible DGP-estimator combinations); a sketch follows below
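
A minimal sketch of such a hybrid scenarios.py (the particular pairs chosen here are purely illustrative):

from itertools import product

# Hand-picked (DGP, DGP params, estimator, estimator params) combinations
dgp_estimator_pairs = [
    (StaticNormalDGP, {"beta0": 0.0, "beta1": 1.0}, SimpleOLS, {}),
    (DynamicNormalDGP, {"beta0": 0.0, "beta1": 0.95}, LassoWrapper, {"reg_param": 0.1}),
]
sample_sizes = [50, 200]

# Product only over the hand-picked pairs and the sample sizes
scenarios = [
    SimulationScenario(
        name=f"{dgp_cls.__name__.lower()}_{est_cls.__name__.lower()}_T{size}",
        dgp=dgp_cls,
        dgp_params=dgp_params,
        estimator=est_cls,
        estimator_params=est_params,
        sample_size=size,
    )
    for (dgp_cls, dgp_params, est_cls, est_params), size
    in product(dgp_estimator_pairs, sample_sizes)
]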

Resulting File Structure

Now have added a new scenarios.py file to our folder:

project/
├── dgps/
│   ├── __init__.py
│   ├── static.py
│   └── dynamic.py       
├── estimators/
│   ├── __init__.py
│   └── ols_like.py       
├── protocols.py
├── runner.py
├── scenarios.py       # New: Defines SimulationScenario and scenarios list
└── main.py

Running Many Scenarios. SimulationOrchestrator Class

What’s Left?

So far:

  • All the simulation infrastructure (runner, DGPs, estimators)
  • Scenario list


Goal: want all scenarios executed when we run

python main.py

What Should It Do?

A simple simulation orchestrator:

  • Should ingest list of scenarios
  • Run all the scenarios:
    • For each scenario, create a SimulationRunner
    • simulate()
  • Do something with the results

More advanced: can parallelize/distribute computation, etc.
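
For illustration, a minimal sketch (one possible approach, not the lecture's implementation) of parallelizing the scenario loop with the standard-library concurrent.futures:

from concurrent.futures import ProcessPoolExecutor

from runner import SimulationRunner
from scenarios import SimulationScenario, scenarios


def run_one(scenario: SimulationScenario) -> tuple[str, float]:
    """Run a single scenario and return its name and mean bias."""
    dgp = scenario.dgp(**scenario.dgp_params)
    estimator = scenario.estimator(**scenario.estimator_params)
    runner = SimulationRunner(dgp, estimator)
    runner.simulate(
        n_sim=scenario.n_simulations,
        n_obs=scenario.sample_size,
        first_seed=scenario.first_seed,
    )
    return scenario.name, runner.errors.mean()


if __name__ == "__main__":
    # Each scenario runs in its own worker process
    with ProcessPoolExecutor() as executor:
        for name, bias in executor.map(run_one, scenarios):
            print(f"Bias for {name}: {bias:.4f}")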

What Should Our main.py Look Like?

main.py
from orchestrator import SimulationOrchestrator
from scenarios import scenarios          # Get scenarios

if __name__ == "__main__":
    # Create and execute simulations: run them all
    orchestrator = SimulationOrchestrator(scenarios)
    orchestrator.run_all()

    # Results logic: do something with the results
    ...

Simple SimulationOrchestrator Class Definition

orchestrator.py
from runner import SimulationRunner
from scenarios import SimulationScenario


class SimulationOrchestrator:
    """Simple simulation orchestration class without any result handling."""

    def __init__(self, scenarios: list[SimulationScenario]):
        self.scenarios = scenarios

    def run_all(self):
        for scenario in self.scenarios:
            # Create DGP and estimator
            dgp = scenario.dgp(**scenario.dgp_params)
            estimator = scenario.estimator(**scenario.estimator_params)

            # Run the simulation
            runner = SimulationRunner(dgp, estimator)
            runner.simulate(
                n_sim=scenario.n_simulations, 
                n_obs=scenario.sample_size, 
                first_seed=scenario.first_seed,
            )
            # Print results
            print(f"Bias for {scenario.name}: {runner.errors.mean():.4f}")

Executing main.py

Executing the script now prints the results for all scenarios!

python main.py
Bias for staticnormaldgp_static_simpleols_T50: -0.0027
Bias for staticnormaldgp_static_simpleols_T200: -0.0028
Bias for staticnormaldgp_static_lassowrapper_T50: -0.1108
Bias for staticnormaldgp_static_lassowrapper_T200: -0.1048
Bias for staticnormaldgp_static_simpleridge_T50: -0.0048
Bias for staticnormaldgp_static_simpleridge_T200: -0.0033
Bias for dynamicnormaldgp_low_pers_simpleols_T50: -0.0261
Bias for dynamicnormaldgp_low_pers_simpleols_T200: -0.0029
Bias for dynamicnormaldgp_low_pers_lassowrapper_T50: -0.0655
Bias for dynamicnormaldgp_low_pers_lassowrapper_T200: -0.0715
Bias for dynamicnormaldgp_low_pers_simpleridge_T50: -0.0262
Bias for dynamicnormaldgp_low_pers_simpleridge_T200: -0.0029
Bias for dynamicnormaldgp_mid_pers_simpleols_T50: -0.0509
Bias for dynamicnormaldgp_mid_pers_simpleols_T200: -0.0103
Bias for dynamicnormaldgp_mid_pers_lassowrapper_T50: -0.1374
Bias for dynamicnormaldgp_mid_pers_lassowrapper_T200: -0.0880
Bias for dynamicnormaldgp_mid_pers_simpleridge_T50: -0.0515
Bias for dynamicnormaldgp_mid_pers_simpleridge_T200: -0.0105
Bias for dynamicnormaldgp_high_pers_simpleols_T50: -0.0992
Bias for dynamicnormaldgp_high_pers_simpleols_T200: -0.0204
Bias for dynamicnormaldgp_high_pers_lassowrapper_T50: -0.1314
Bias for dynamicnormaldgp_high_pers_lassowrapper_T200: -0.0347
Bias for dynamicnormaldgp_high_pers_simpleridge_T50: -0.0993
Bias for dynamicnormaldgp_high_pers_simpleridge_T200: -0.0204

Discussion

A total victory:

  • Clean, well-focused files and implementation
  • Automatic collection and construction of scenarios
  • Full execution of all simulations


Can talk about further improvements, handling results, but we have an extensible and broadly-applicable core

Simulation Outputs

More On Handling Results

So far: just printing bias values to the console

More typical: export results in some nice tabular/text form

  • Often: whole raw simulation results, particularly in simulations where individual runs are expensive (export sketched below)
  • Also: summaries
    • Summary tables
    • Plots
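
For instance, a minimal sketch (one possibility, assuming as elsewhere in this lecture that runner.errors holds the array of per-run estimation errors) of exporting raw results plus a summary table:

import pandas as pd

from runner import SimulationRunner
from scenarios import scenarios

# Collect raw per-run errors for every scenario
raw_results = {}
for scenario in scenarios:
    dgp = scenario.dgp(**scenario.dgp_params)
    estimator = scenario.estimator(**scenario.estimator_params)
    runner = SimulationRunner(dgp, estimator)
    runner.simulate(
        n_sim=scenario.n_simulations,
        n_obs=scenario.sample_size,
        first_seed=scenario.first_seed,
    )
    raw_results[scenario.name] = runner.errors

# Raw results: one column per scenario, one row per simulation draw
raw_df = pd.DataFrame(raw_results)
raw_df.to_csv("raw_errors.csv", index=False)

# Summary table: bias and spread per scenario
raw_df.agg(["mean", "std"]).T.to_csv("summary.csv")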

Here: Simple Example


Here: just a brief example

  • Store summary bias results on the orchestrator
  • Somehow export them from main.py

Updated SimulationOrchestrator Class

class SimulationOrchestrator:
    """Simulation orchestrator that stores summary results in a dictionary."""

    def __init__(self, scenarios: list[SimulationScenario]):
        self.scenarios = scenarios
        self.summary_results = {}

    def run_all(self):
        for scenario in self.scenarios:
            # Create DGP and estimator
            dgp = scenario.dgp(**scenario.dgp_params)
            estimator = scenario.estimator(**scenario.estimator_params)

            # Run the simulation
            runner = SimulationRunner(dgp, estimator)
            runner.simulate(
                n_sim=scenario.n_simulations, 
                n_obs=scenario.sample_size, 
                first_seed=scenario.first_seed,
            )
            # Save results
            self.summary_results[scenario.name] = runner.errors.mean()

Results Handling Discussion

  • Here: took a quicker solution: the SimulationOrchestrator implementation knows that it is handling bias
  • But can make it more loosely coupled (sketched below):
    • Add a summarize() method to SimulationRunner that knows what to export
    • The orchestrator will just receive whatever summarize() gives
    • Would make the orchestrator even more reusable
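
A minimal sketch of that looser design (the summarize() method and its return format are hypothetical additions, not part of the lecture code):

class SimulationRunner:
    # ... existing __init__() and simulate() unchanged ...

    def summarize(self) -> dict[str, float]:
        """Return whatever summary statistics this runner deems relevant."""
        return {"bias": self.errors.mean(), "std": self.errors.std()}


class SimulationOrchestrator:
    def __init__(self, scenarios: list[SimulationScenario]):
        self.scenarios = scenarios
        self.summary_results = {}

    def run_all(self):
        for scenario in self.scenarios:
            dgp = scenario.dgp(**scenario.dgp_params)
            estimator = scenario.estimator(**scenario.estimator_params)
            runner = SimulationRunner(dgp, estimator)
            runner.simulate(
                n_sim=scenario.n_simulations,
                n_obs=scenario.sample_size,
                first_seed=scenario.first_seed,
            )
            # The orchestrator no longer knows what the summary contains
            self.summary_results[scenario.name] = runner.summarize()

With this design the orchestrator works unchanged for whatever summary the runner chooses to report.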

Changing main.py

main.py
import pandas as pd

from orchestrator import SimulationOrchestrator 
from scenarios import scenarios                                             

if __name__ == "__main__":
    # Create and execute simulations 
    orchestrator = SimulationOrchestrator(scenarios)                         
    orchestrator.run_all()

    # Results logic (print or export as pd.Series)
    print(pd.Series(orchestrator.summary_results))

Here for simplicity: print the Series, but in general you would export it with to_csv()

Executing Results

Executing the script now prints the results for all scenarios!

python main.py
staticnormaldgp_static_simpleols_T50           -0.002680
staticnormaldgp_static_simpleols_T200          -0.002753
staticnormaldgp_static_lassowrapper_T50        -0.110823
staticnormaldgp_static_lassowrapper_T200       -0.104789
staticnormaldgp_static_simpleridge_T50         -0.004828
staticnormaldgp_static_simpleridge_T200        -0.003261
dynamicnormaldgp_low_pers_simpleols_T50        -0.026099
dynamicnormaldgp_low_pers_simpleols_T200       -0.002854
dynamicnormaldgp_low_pers_lassowrapper_T50     -0.065491
dynamicnormaldgp_low_pers_lassowrapper_T200    -0.071531
dynamicnormaldgp_low_pers_simpleridge_T50      -0.026203
dynamicnormaldgp_low_pers_simpleridge_T200     -0.002900
dynamicnormaldgp_mid_pers_simpleols_T50        -0.050864
dynamicnormaldgp_mid_pers_simpleols_T200       -0.010286
dynamicnormaldgp_mid_pers_lassowrapper_T50     -0.137379
dynamicnormaldgp_mid_pers_lassowrapper_T200    -0.088010
dynamicnormaldgp_mid_pers_simpleridge_T50      -0.051537
dynamicnormaldgp_mid_pers_simpleridge_T200     -0.010470
dynamicnormaldgp_high_pers_simpleols_T50       -0.099214
dynamicnormaldgp_high_pers_simpleols_T200      -0.020369
dynamicnormaldgp_high_pers_lassowrapper_T50    -0.131433
dynamicnormaldgp_high_pers_lassowrapper_T200   -0.034744
dynamicnormaldgp_high_pers_simpleridge_T50     -0.099312
dynamicnormaldgp_high_pers_simpleridge_T200    -0.020425
dtype: float64

Recap and Conclusions

Recap


In this lecture we

  • Discussed how to specify a simulation scenario
  • Talked about how to construct a list of many scenarios
  • Implemented a simple orchestrator that pulls together scenarios and executes them

Further Improvements

Can keep adding things to code:

  • Logging and progress tracking (sketched below)
  • Improve robustness of code by adding error handling (also sketched below)
  • Parallelize to take advantage of multiple cores
  • Custom output handler classes
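
For example, a minimal sketch (one possible approach; the logging setup and failure policy are assumptions) of adding progress logging and basic error handling to run_all():

import logging

from runner import SimulationRunner
from scenarios import SimulationScenario

logger = logging.getLogger(__name__)


class SimulationOrchestrator:
    def __init__(self, scenarios: list[SimulationScenario]):
        self.scenarios = scenarios
        self.summary_results = {}

    def run_all(self):
        for i, scenario in enumerate(self.scenarios, start=1):
            logger.info("Scenario %d/%d: %s", i, len(self.scenarios), scenario.name)
            try:
                dgp = scenario.dgp(**scenario.dgp_params)
                estimator = scenario.estimator(**scenario.estimator_params)
                runner = SimulationRunner(dgp, estimator)
                runner.simulate(
                    n_sim=scenario.n_simulations,
                    n_obs=scenario.sample_size,
                    first_seed=scenario.first_seed,
                )
                self.summary_results[scenario.name] = runner.errors.mean()
            except Exception:
                # Log the failure and keep the remaining scenarios running
                logger.exception("Scenario %s failed", scenario.name)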

Project could also benefit from more reproducibility:

  • Getting the right environment for reproducibility?
  • Not having to rerun all the simulations every time?

Block Recap

This block: structuring and thinking about simulation code

Overall design:

  • Starting simple with functions
  • Going modular for more complex scenarios

Quality of life features:

  • Scenario builders
  • Orchestrator

References

Lau, Sam. 2023. Learning Data Science. 1st ed. Sebastopol: O’Reilly Media, Incorporated.
Lutz, Mark. 2025. Learning Python: Powerful Object-Oriented Programming. Sixth edition. Santa Rosa, CA: O’Reilly.
Ramalho, Luciano. 2022. Fluent Python: Clear, Concise, and Effective Programming. 2nd edition. Sebastopol, California: O’Reilly Media, Inc.