Course Introduction

Fundamentals of Simulations in Data Science (Part of Research Module in Econometrics and Statistics)

Vladislav Morozov

Content and Motivation

About Me

Instructor: JProf. Dr. Vladislav Morozov

  • Institute of Finance and Statistics
  • Email: morozov@uni-bonn.de

  • I work on practical statistical methods with a focus on unobserved heterogeneity

This Course

Example Questions

Evaluating and using new statistical methods involves questions like

  • How well does a new causal inference method estimate the effect of interest?
  • Does an ML algorithm generalize well?
  • Is hypothesis test A more powerful than test B?
  • What is the real coverage of a given nominally 95% confidence interval?

Limits of Theory

Sometimes such questions can be answered theoretically, using some combination of

  • Non-asymptotic analysis
  • Parametric assumptions


But theory is often unsatisfactory (loose bounds, restrictive assumptions, intractable math)

Place of Simulations

Simulations offer the next best thing: they let us check every aspect of performance in controlled “lab” settings

  • Using synthetic data: can control every part of the data-generating process
  • Synthetic data \(\Rightarrow\) full knowledge of target quantities, unlike with real data (especially critical for causal methods)

If the data-generating process is chosen “well”, simulation results are informative about performance on real data
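
To make the point about known target quantities concrete, here is a minimal sketch in Python (with NumPy) of a synthetic randomized experiment. The data-generating process, the function names simulate_rct and diff_in_means, and the value true_ate = 2.0 are all illustrative assumptions, not part of the course materials. Because both potential outcomes are generated, the true effect is known and the estimator can be compared against it directly.

```python
import numpy as np


def simulate_rct(n, rng, true_ate=2.0):
    """Draw one synthetic randomized experiment with a known treatment effect."""
    x = rng.normal(size=n)                 # covariate
    y0 = 1.0 + x + rng.normal(size=n)      # potential outcome without treatment
    y1 = y0 + true_ate                     # potential outcome with treatment
    d = rng.integers(0, 2, size=n)         # randomized treatment assignment (0 or 1)
    y = np.where(d == 1, y1, y0)           # only one potential outcome is observed
    return y, d


def diff_in_means(y, d):
    """Estimate the average treatment effect by a difference in group means."""
    return y[d == 1].mean() - y[d == 0].mean()


rng = np.random.default_rng(0)
true_ate = 2.0  # known only because the data are synthetic

# Repeat the experiment many times and compare the estimates with the truth
estimates = np.array(
    [diff_in_means(*simulate_rct(200, rng, true_ate)) for _ in range(2000)]
)
bias = estimates.mean() - true_ate
print(f"Monte Carlo estimate of bias: {bias:.3f}")
```

With real data, y0 and y1 are never observed together, so this comparison with the truth would not be possible.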

Course Contents

Course covers essentials you need to write your own simulations:

  • Metrics for evaluating estimators (causal and predictive), tests, confidence sets
  • Structuring code
  • Selecting DGPs
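
As a rough illustration of how these three ingredients fit together, the sketch below computes bias, RMSE, and confidence interval coverage for the sample mean. The exponential DGP, the sample sizes, the number of replications, and the function names one_replication and run_simulation are assumptions chosen for illustration; the interval is the textbook normal-approximation interval with critical value 1.96.

```python
import numpy as np


def one_replication(n, rng, true_mean=1.0):
    """Draw one sample; return the estimate and whether the CI covers the truth."""
    sample = rng.exponential(scale=true_mean, size=n)  # skewed DGP with known mean
    estimate = sample.mean()
    std_error = sample.std(ddof=1) / np.sqrt(n)
    lower = estimate - 1.96 * std_error                # nominal 95% interval
    upper = estimate + 1.96 * std_error
    return estimate, lower <= true_mean <= upper


def run_simulation(n, n_reps=5000, true_mean=1.0, seed=0):
    """Aggregate many replications into performance metrics."""
    rng = np.random.default_rng(seed)
    results = [one_replication(n, rng, true_mean) for _ in range(n_reps)]
    estimates = np.array([r[0] for r in results])
    covered = np.array([r[1] for r in results])
    return {
        "bias": estimates.mean() - true_mean,
        "rmse": np.sqrt(np.mean((estimates - true_mean) ** 2)),
        "coverage": covered.mean(),
    }


# Compare a small and a moderate sample size
results_by_n = {n: run_simulation(n) for n in (10, 200)}
for n, metrics in results_by_n.items():
    print(n, metrics)
```

Keeping the DGP draw, the single replication, and the aggregation in separate functions is one possible way to structure the code so that DGPs and estimators can be swapped independently. With a skewed DGP like this one, the realized coverage at small n would be expected to fall short of the nominal 95%.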

After the lectures end: putting things into practice with a project

This Course and the Research Module

This course is part of the Research Module in Econometrics and Statistics

  • Preparation for writing a successful master’s thesis in DS-adjacent topics (and beyond)
  • At Bonn Econ a “typical” MSc thesis in DS involves simulations: critically evaluating an existing method (or a new method for more advanced theses)

So you need to know how to write simulations

Course Logistics

Organization and Evaluations

Course Format

Two parts of class:

  • Lectures on simulations: towards the beginning of the term
  • Project development: you work on the projects and we have group-level meetings


Active questions encouraged! Also feel free to approach me after/before class or use office hours

Meeting Times

Class times:

  • Wednesdays 14:00-16:00, Room 0.042
  • Fridays 08:30-10:00, Room 0.042


Any modifications will be announced on eCampus

Evaluations

The course grade is based on the project

  • Project: creating, running and evaluating simulations for a given statistical method
  • Groups of \(\leq 3\) people
  • Materials to submit: term paper, public presentation, simulation code
  • See the course website for more details

Materials

On Simulations

  • Course designed to be self-contained regarding simulations
  • In general, very few books discuss the topic
  • One exception: Dormann and Ellison (2025)

Books on Core Python and Git

Lutz (2025)

Lau (2023)

Skoulikari (2023)

Books For Project Inspiration

Wager (2024)

Chernozhukov et al. (2024)

Gaillac and L’Hour (2025)

Resources for Writing and Presenting

Course project involves writing and presenting the results

Check out Nikolov (2022)

Alley (2013)

Schimel (2012)

References

Alley, Michael. 2013. The Craft of Scientific Presentations: Critical Steps to Succeed and Critical Errors to Avoid. 2nd ed. New York, NY: Springer.
Chernozhukov, Victor, Christian Hansen, Nathan Kallus, Martin Spindler, and Vasilis Syrgkanis. 2024. “Applied Causal Inference Powered by ML and AI.” arXiv. https://doi.org/10.48550/arXiv.2403.02467.
Dormann, Carsten F., and Aaron M. Ellison. 2025. Statistics by Simulation: A Synthetic Data Approach. Princeton Oxford: Princeton University Press.
Gaillac, Christophe, and Jeremy L’Hour. 2025. Machine Learning for Econometrics. Oxford: Oxford University Press.
Lau, Sam. 2023. Learning Data Science. 1st ed. Sebastopol: O’Reilly Media, Incorporated.
Lutz, Mark. 2025. Learning Python: Powerful Object-Oriented Programming. Sixth edition. Santa Rosa, CA: O’Reilly.
Nikolov, Plamen. 2022. “Writing Tips for Economics Research Papers: 2021-2022 Edition.” SSRN Electronic Journal. https://doi.org/10.2139/ssrn.4114601.
Schimel, Joshua. 2012. Writing Science: How to Write Papers That Get Cited and Proposals That Get Funded. Oxford; New York: Oxford University Press.
Skoulikari, Anna. 2023. Learning Git: A Hands-on and Visual Guide to the Basics of Git. 1st ed. Sebastopol: O’Reilly.
Wager, Stefan. 2024. Causal Inference: A Statistical Learning Approach.