Statistical and Machine Learning

Introduction to Prediction. Learning Scenarios

Vladislav Morozov

Introduction

Lecture Info

Learning Outcomes

This lecture is an introduction to prediction


By the end, you should be able to

  • Define statistical and machine learning
  • Contrast SL/ML with causal inference
  • Describe and classify learning scenarios in prediction

References


  • Chapters 1–2 in James et al. (2023)
  • Or chapter 1 in Mohri, Rostamizadeh, and Talwalkar (2018) (more examples, shorter)
  • Or chapter 1 in Géron (2023) (more examples, longer)

Statistical and Machine Learning

Definitions

The Two Faces of Statistics


Can split statistics into two blocks based on overall goal:

  1. Causal inference: answering counterfactual questions regarding causal effects of interventions
  2. Prediction: finding predictive functions of data without explaining mechanisms

Statistical and Machine Learning

Prediction studied by statistical and machine learning.

Personal definition:

Definition 1  

  • Statistical learning is a branch of statistics studying prediction problems.
  • Machine learning is a cross-disciplinary subfield of statistics and computer science studying prediction problems.

Goals in Prediction

Key goal of prediction — predicting well

Other goals:

  • Computational efficiency: better to have a cheaper and quicker way to produce a new prediction
  • Interpretability: why does the algorithm predict what it does?
  • Scalability: can it handle increasing loads, run in a distributed manner, etc.?

Statistical/Machine Learning vs. Causal Inference

SL vs. Causal Inference I

Key goal in causal inference — correct identification


Causal settings: there is some true causal model. Trying to learn some of its features with identification arguments

SL/ML: only weak reference to the underlying “true” model. Generally no identification work

SL vs. Causal Inference II

For SL/ML the key metric is how well you predict on unseen data — generalization or risk (next lecture)


  • Want prediction to work well under many possible data-generating distributions
  • Terminology of algorithms, not models
  • Roughly: not trying to model

SL vs. Causal Inference III

Are the two fields totally disjoint?


No:

  • Causal use case: need to pre-estimate some complicated object before being able to estimate the object of interest
  • What matters most there is precise pre-estimates — exactly the typical goal of SL/ML

See Chernozhukov et al. (2024)
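As a rough illustration of this causal use case, here is a sketch of the partialling-out idea behind double/debiased ML on simulated data: flexible ML learners pre-estimate the nuisance functions, and the causal effect comes from a final regression on the residuals. The learner and all parameter choices below are illustrative assumptions, not a full implementation of the method.

```python
# Sketch: ML pre-estimation of nuisance functions in a causal problem.
# All data is simulated; the true effect of D on Y is 2 by construction.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 5))
D = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(size=n)  # "treatment" depends on X
Y = 2.0 * D + np.sin(X[:, 0]) + rng.normal(size=n)     # outcome; true effect is 2

# Pre-estimate the nuisance functions E[Y|X] and E[D|X] with an ML learner,
# using out-of-fold predictions (cross-fitting) to limit overfitting bias
learner = RandomForestRegressor(n_estimators=100, min_samples_leaf=20, random_state=0)
res_Y = Y - cross_val_predict(learner, X, Y, cv=5)
res_D = D - cross_val_predict(learner, X, D, cv=5)

# Final step: regress residualized Y on residualized D
effect = (res_D @ res_Y) / (res_D @ res_D)
print(effect)  # should be close to the true value of 2
```

The point of the sketch: the better the ML pre-estimates of the two nuisance functions, the more precise the final causal estimate — which is why prediction quality matters even in causal work.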

References on SL/ML

Books: SL Theory


Hastie, Tibshirani, and Friedman (2009)


Shalev-Shwartz and Ben-David (2014)


Mohri, Rostamizadeh, and Talwalkar (2018)

Books: Practice of “Classic” ML

Books on “core” machine learning methods in practice, mainly with scikit-learn

Géron (2023)

James et al. (2023)

Books on Deep Learning

Deep learning is much better covered in newer and more specialized books. Not that much statistical theory — many recent advances are empirical


Bishop and Bishop (2024)


Antiga, Stevens, and Viehmann (2020)

Books On SL/ML For Causal Settings

A couple of references on ML techniques specifically in causal settings

Chernozhukov et al. (2024)

Gaillac and L’Hour (2025)

Learning Scenarios

Introduction to Learning Scenarios

SL/ML not monolithic: there are different learning scenarios based on

  • Domain of application: what problem are you solving?
  • Nature and form of training data
    • Do you have a \(Y\) at all? (supervised, unsupervised, semi-supervised learning, …)
    • What does \(Y\) look like? (continuous — regression, discrete — classification, ranked list — ranking, …)
  • How the data arrives

Classification: By Domain

What kind of problems can you solve?

Domain of Application           | Examples
Forecasting                     | Estimating the GDP in the current quarter
Causal inference                | Pre-estimating “first stage”/nuisance parameters
Text or document classification | Assigning topics, determining whether contents are inappropriate, spam detection
NLP                             | Part-of-speech tagging, named-entity recognition, context-free parsing, text summarization, chatbots
Speech processing               | Speech recognition, speech synthesis, speaker verification and identification
Computer vision                 | Object recognition and identification, face detection, content-based image retrieval, optical character recognition, image segmentation
Anomaly detection               | Detecting credit card fraud
Clustering                      | Segmenting clients into blocks and offering different marketing strategies
Data visualization              | Using dimensionality reduction
Recommender systems             | Suggesting the next product to buy given purchase history

All these things are also done by people with econ backgrounds

Classification: By Type of Output Variable I

Supervised settings: some observed output \(Y\):


Task           | Type of Variable | Examples
Classification | Categorical      | Document classification
Regression     | Continuous       | Nowcasting the GDP
Ranking        | Ordinal          | Selecting the order of results in a search
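The first two supervised tasks can be sketched with scikit-learn, which uses the same fit/predict interface for both. The data below is synthetic and the model choices are illustrative, not recommendations.

```python
# Minimal sketch: classification (categorical Y) vs. regression (continuous Y)
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LogisticRegression, LinearRegression

# Classification: Y takes values in a finite set of categories
Xc, yc = make_classification(n_samples=200, n_features=4, random_state=0)
clf = LogisticRegression().fit(Xc, yc)
class_preds = clf.predict(Xc[:3])          # predicted class labels

# Regression: Y is continuous
Xr, yr = make_regression(n_samples=200, n_features=4, noise=1.0, random_state=0)
reg = LinearRegression().fit(Xr, yr)
value_preds = reg.predict(Xr[:3])          # predicted continuous values

print(class_preds, value_preds)
```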

Classification: By Type of Output Variable II

Unsupervised settings: no obvious observed \(Y\):

Task                                       | Type of Variable | Examples
Clustering                                 | Categorical      | Identifying communities in a large social network
Dimensionality reduction/manifold learning | Continuous       | Preprocessing digital images in computer vision tasks
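Both unsupervised tasks in the table can be sketched on synthetic data: k-means produces categorical cluster labels, while PCA produces continuous low-dimensional coordinates. The parameter choices are illustrative.

```python
# Sketch: clustering (categorical output) vs. dimensionality reduction (continuous)
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Two well-separated blobs in 10 dimensions, no labels observed
X = np.vstack([rng.normal(0, 1, (50, 10)), rng.normal(5, 1, (50, 10))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
Z = PCA(n_components=2).fit_transform(X)   # 10 dimensions -> 2 dimensions

print(np.unique(labels))   # categorical output: cluster ids
print(Z.shape)             # continuous output: one 2D point per observation
```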

Classification: By Supervision


  • Supervised: all observations (“examples”) have \(Y\) available (“labels”)
  • Unsupervised: no examples have labels
  • Semi-supervised: some examples have labels

More concepts: self-supervised, active learning, reinforcement learning, etc.
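The supervision axis can be sketched with scikit-learn, where unlabeled examples are conventionally marked with the label -1. The data is synthetic, and self-training is just one of several semi-supervised strategies.

```python
# Sketch: supervised vs. semi-supervised learning on partially labeled data
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=300, random_state=0)
y_semi = y.copy()
y_semi[100:] = -1   # pretend only the first 100 examples have labels

# Supervised: fit on the labeled subset only
sup = LogisticRegression().fit(X[:100], y[:100])

# Semi-supervised: self-training also exploits the 200 unlabeled rows
semi = SelfTrainingClassifier(LogisticRegression()).fit(X, y_semi)

print(sup.predict(X[:5]), semi.predict(X[:5]))
```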

Online vs. Batch Learning

Another axis: if new observations arrive, how to update the model?


  • Retrain from scratch on bigger dataset — batch learning
  • Update existing model parameters only with new observations — online learning
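The two updating strategies can be sketched with scikit-learn's `SGDClassifier`, one of the estimators that supports incremental updates via `partial_fit`. The data and split are synthetic and illustrative.

```python
# Sketch: batch retraining vs. online updating when new observations arrive
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_old, y_old = X[:800], y[:800]   # data already seen
X_new, y_new = X[800:], y[800:]   # newly arrived batch

# Batch learning: refit from scratch on the full, enlarged dataset
batch = SGDClassifier(random_state=0).fit(X, y)

# Online learning: update the existing model with only the new batch
online = SGDClassifier(random_state=0)
online.partial_fit(X_old, y_old, classes=[0, 1])
online.partial_fit(X_new, y_new)   # no access to the old data needed

print(batch.score(X, y), online.score(X, y))
```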

Recap and Conclusions

Recap


In this lecture we:

  1. Talked about causal inference vs. prediction
  2. Defined statistical and machine learning
  3. Described various learning scenarios

Next Questions


  • What are the key components of a learning problem for predictions?
  • How does one evaluate predictions?
  • Is there a universally valid way to predict?

References

Antiga, Luca Pietro Giovanni, Eli Stevens, and Thomas Viehmann. 2020. Deep Learning with PyTorch. Shelter Island, NY: Manning.
Bishop, Christopher M., and Hugh Bishop. 2024. Deep Learning: Foundations and Concepts. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-031-45468-4.
Chernozhukov, Victor, Christian Hansen, Nathan Kallus, Martin Spindler, and Vasilis Syrgkanis. 2024. “Applied Causal Inference Powered by ML and AI.” arXiv. https://doi.org/10.48550/arXiv.2403.02467.
Gaillac, Christophe, and Jeremy L’Hour. 2025. Machine Learning for Econometrics. Oxford: Oxford University Press.
Géron, Aurélien. 2023. Hands-on Machine Learning With Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. Third edition. Data Science / Machine Learning. Beijing Boston Farnham Sebastopol Tokyo: O’Reilly.
Hastie, Trevor, Robert Tibshirani, and J. H. Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Springer Series in Statistics. New York, NY: Springer.
James, Gareth, Daniela Witten, Trevor Hastie, Robert Tibshirani, and Jonathan E. Taylor. 2023. An Introduction to Statistical Learning: With Applications in Python. Springer Texts in Statistics. Cham: Springer.
Mohri, Mehryar, Afshin Rostamizadeh, and Ameet Talwalkar. 2018. Foundations of Machine Learning. The MIT Press. https://doi.org/10.5555/3360093.
Shalev-Shwartz, Shai, and Shai Ben-David. 2014. Understanding Machine Learning. 1st ed. West Nyack: Cambridge University Press.