Statistical and Machine Learning

Introduction to Prediction. Learning Scenarios

Vladislav Morozov

Introduction

Lecture Info

Learning Outcomes

This lecture is an introduction to prediction


By the end, you should be able to

  • Define statistical and machine learning
  • Contrast SL/ML with causal inference
  • Describe and classify learning scenarios in prediction

References


  • Chapters 1–2 in James et al. (2023)
  • Or chapter 1 in Mohri, Rostamizadeh, and Talwalkar (2018) (more examples, shorter)
  • Or chapter 1 in Géron (2023) (more examples, longer)

Statistical and Machine Learning

Definitions

The Two Faces of Statistics


Can split statistics into two blocks based on overall goal:

  1. Causal inference: answering counterfactual questions regarding causal effects of interventions
  2. Prediction: finding predictive functions of data without explaining mechanisms

Statistical and Machine Learning

Prediction studied by statistical and machine learning.

Personal definition:

Definition 1  

  • Statistical learning is a branch of statistics studying prediction problems.
  • Machine learning is a cross-disciplinary subfield of statistics and computer science studying prediction problems.

Goals in Prediction

Key goal of prediction — predicting well

Other goals:

  • Computational efficiency: better to have a cheaper and quicker way to produce a new prediction
  • Interpretability: why does the algorithm predict what it does?
  • Scalability: can it handle increasing loads, run in a distributed manner, etc.?

Statistical/Machine Learning vs. Causal Inference

SL vs. Causal Inference I

Key goal in causal inference — correct identification


Causal settings: there is some true causal model. Trying to learn some of its features with identification arguments

SL/ML: only weak reference to the underlying “true” model. Generally no identification work

SL vs. Causal Inference II

For SL/ML the key metric is how well you predict on unseen data — generalization or risk (next lecture)


  • Want prediction to work well under many possible data-generating distributions
  • Terminology of algorithms, not models
  • Roughly: not trying to model

SL vs. Causal Inference III

Are the two fields totally disjoint?


No:

  • Causal use case: need to pre-estimate some complicated object before being able to estimate the object of interest
  • What matters most there is precise pre-estimates — exactly the typical goal of SL/ML

See Chernozhukov et al. (2024)
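As a rough illustration of this causal use case, here is a sketch of the partialling-out idea behind double/debiased ML on simulated data: flexible ML learners pre-estimate the nuisance functions, and the causal effect comes from a final regression on the residuals. The learner and all parameter choices below are illustrative assumptions, not a full implementation of the method.

```python
# Sketch: ML pre-estimation of nuisance functions in a causal problem.
# All data is simulated; the true effect of D on Y is 2 by construction.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 5))
D = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(size=n)  # "treatment" depends on X
Y = 2.0 * D + np.sin(X[:, 0]) + rng.normal(size=n)     # outcome; true effect is 2

# Pre-estimate the nuisance functions E[Y|X] and E[D|X] with an ML learner,
# using out-of-fold predictions (cross-fitting) to limit overfitting bias
learner = RandomForestRegressor(n_estimators=100, min_samples_leaf=20, random_state=0)
res_Y = Y - cross_val_predict(learner, X, Y, cv=5)
res_D = D - cross_val_predict(learner, X, D, cv=5)

# Final step: regress residualized Y on residualized D
effect = (res_D @ res_Y) / (res_D @ res_D)
print(effect)  # should be close to the true value of 2
```

The point of the sketch: the better the ML pre-estimates of the two nuisance functions, the more precise the final causal estimate — which is why prediction quality matters even in causal work.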

References on SL/ML

Books: SL Theory


Hastie, Tibshirani, and Friedman (2009)


Shalev-Shwartz and Ben-David (2014)


Mohri, Rostamizadeh, and Talwalkar (2018)

Books: Practice of “Classic” ML

Books on “core” machine learning methods in practice, mainly with scikit-learn

Géron (2023)

James et al. (2023)

Books on Deep Learning

Deep learning is much better covered in newer and more specialized books. Not that much statistical theory — many recent advances are empirical


Bishop and Bishop (2024)


Antiga, Stevens, and Viehmann (2020)

Books On SL/ML For Causal Settings

A couple of references on ML techniques specifically in causal settings

Chernozhukov et al. (2024)

Gaillac and L’Hour (2025)

Learning Scenarios

Introduction to Learning Scenarios

SL/ML not monolithic: there are different learning scenarios based on

  • Domain of application: what problem are you solving?
  • Nature and form of training data
    • Do you have a \(Y\) at all? (supervised, unsupervised, semi-supervised learning, …)
    • What does \(Y\) look like? (continuous — regression, discrete — classification, ranked list — ranking, …)
  • How the data arrives

Classification: By Domain

What kind of problems can you solve?

Domain of Application           | Examples
Forecasting                     | Estimating the GDP in the current quarter
Causal inference                | Pre-estimating “first stage”/nuisance parameters
Text or document classification | Assigning topics, determining whether contents are inappropriate, spam detection
NLP                             | Part-of-speech tagging, named-entity recognition, context-free parsing, text summarization, chatbots
Speech processing               | Speech recognition, speech synthesis, speaker verification and identification
Computer vision                 | Object recognition and identification, face detection, content-based image retrieval, optical character recognition, image segmentation
Anomaly detection               | Detecting credit card fraud
Clustering                      | Segmenting clients into blocks and offering different marketing strategies
Data visualization              | Using dimensionality reduction
Recommender systems             | Suggesting the next product to buy given purchase history

All these things are also done by people with econ backgrounds

Classification: By Type of Output Variable I

Supervised settings: some observed output \(Y\):


Task           | Type of Variable | Examples
Classification | Categorical      | Document classification
Regression     | Continuous       | Nowcasting the GDP
Ranking        | Ordinal          | Selecting the order of results in a search
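The first two supervised tasks can be sketched with scikit-learn, which uses the same fit/predict interface for both. The data below is synthetic and the model choices are illustrative, not recommendations.

```python
# Minimal sketch: classification (categorical Y) vs. regression (continuous Y)
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LogisticRegression, LinearRegression

# Classification: Y takes values in a finite set of categories
Xc, yc = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression().fit(Xc, yc)
class_preds = clf.predict(Xc[:3])          # predicted class labels

# Regression: Y is continuous
Xr, yr = make_regression(n_samples=200, n_features=4, noise=1.0, random_state=0)
reg = LinearRegression().fit(Xr, yr)
value_preds = reg.predict(Xr[:3])          # predicted continuous values

print(class_preds, value_preds)
```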

Classification: By Type of Output Variable II

Unsupervised settings: no obvious observed \(Y\):

Task                                       | Type of Variable | Examples
Clustering                                 | Categorical      | Identifying communities in a large social network
Dimensionality reduction/manifold learning | Continuous       | Preprocessing digital images in computer vision tasks
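Both unsupervised tasks in the table can be sketched on synthetic data: k-means produces categorical cluster labels, while PCA produces continuous low-dimensional coordinates. The parameter choices are illustrative.

```python
# Sketch: clustering (categorical output) vs. dimensionality reduction (continuous)
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Two well-separated blobs in 10 dimensions, no labels observed
X = np.vstack([rng.normal(0, 1, (50, 10)), rng.normal(5, 1, (50, 10))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
Z = PCA(n_components=2).fit_transform(X)   # 10 dimensions -> 2 dimensions

print(np.unique(labels))   # categorical output: cluster ids
print(Z.shape)             # continuous output: one 2D point per observation
```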

Classification: By Supervision


  • Supervised: all observations (“examples”) have \(Y\) available (“labels”)
  • Unsupervised: no examples have labels
  • Semi-supervised: some examples have labels

More concepts: self-supervised, active learning, reinforcement learning, etc.
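The supervision axis can be sketched with scikit-learn, where unlabeled examples are conventionally marked with the label -1. The data is synthetic, and self-training is just one of several semi-supervised strategies.

```python
# Sketch: supervised vs. semi-supervised learning on partially labeled data
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=300, random_state=0)
y_semi = y.copy()
y_semi[100:] = -1   # pretend only the first 100 examples have labels

# Supervised: fit on the labeled subset only
sup = LogisticRegression().fit(X[:100], y[:100])

# Semi-supervised: self-training also exploits the 200 unlabeled rows
semi = SelfTrainingClassifier(LogisticRegression()).fit(X, y_semi)

print(sup.predict(X[:5]), semi.predict(X[:5]))
```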

Online vs. Batch Learning

Another axis: if new observations arrive, how to update the model?


  • Retrain from scratch on bigger dataset — batch learning
  • Update existing model parameters only with new observations — online learning
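The two updating strategies can be sketched with scikit-learn's `SGDClassifier`, one of the estimators that supports incremental updates via `partial_fit`. The data and split are synthetic and illustrative.

```python
# Sketch: batch retraining vs. online updating when new observations arrive
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_old, y_old = X[:800], y[:800]   # data already seen
X_new, y_new = X[800:], y[800:]   # newly arrived batch

# Batch learning: refit from scratch on the full, enlarged dataset
batch = SGDClassifier(random_state=0).fit(X, y)

# Online learning: update the existing model with only the new batch
online = SGDClassifier(random_state=0)
online.partial_fit(X_old, y_old, classes=[0, 1])
online.partial_fit(X_new, y_new)   # no access to the old data needed

print(batch.score(X, y), online.score(X, y))
```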

Recap and Conclusions

Recap


In this lecture we:

  1. Talked about causal inference vs. prediction
  2. Defined statistical and machine learning
  3. Described various learning scenarios

Next Questions


  • What are the key components of a learning problem for predictions?
  • How does one evaluate predictions?
  • Is there a universally valid way to predict?

References

Antiga, Luca Pietro Giovanni, Eli Stevens, and Thomas Viehmann. 2020. Deep Learning with PyTorch. Shelter Island, NY: Manning.
Bishop, Christopher M., and Hugh Bishop. 2024. Deep Learning: Foundations and Concepts. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-031-45468-4.
Chernozhukov, Victor, Christian Hansen, Nathan Kallus, Martin Spindler, and Vasilis Syrgkanis. 2024. “Applied Causal Inference Powered by ML and AI.” arXiv. https://doi.org/10.48550/arXiv.2403.02467.
Gaillac, Christophe, and Jeremy L’Hour. 2025. Machine Learning for Econometrics. Oxford: Oxford University Press.
Géron, Aurélien. 2023. Hands-on Machine Learning With Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. Third edition. Data Science / Machine Learning. Beijing Boston Farnham Sebastopol Tokyo: O’Reilly.
Hastie, Trevor, Robert Tibshirani, and J. H. Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Springer Series in Statistics. New York, NY: Springer.
James, Gareth, Daniela Witten, Trevor Hastie, Robert Tibshirani, and Jonathan E. Taylor. 2023. An Introduction to Statistical Learning: With Applications in Python. Springer Texts in Statistics. Cham: Springer.
Mohri, Mehryar, Afshin Rostamizadeh, and Ameet Talwalkar. 2018. Foundations of Machine Learning. The MIT Press. https://doi.org/10.5555/3360093.
Shalev-Shwartz, Shai, and Shai Ben-David. 2014. Understanding Machine Learning. 1st ed. West Nyack: Cambridge University Press.