Introduction to Prediction. Learning Scenarios
This lecture is an introduction to prediction.
By the end, you should be able to:
Statistics can be split into two blocks based on the overall goal:
Prediction is studied by statistical learning and machine learning (SL/ML).
Personal definition:
Definition 1: Statistical learning and machine learning both develop algorithms that can learn from data and generalize to unseen data.
Key goal of prediction: predicting well.
Other goals:
Key goal in causal inference: correct identification.
Causal settings: there is some true causal model, and we try to learn some of its features via identification arguments.
SL/ML: only a weak reference to the underlying “true” model; generally no identification work.
For SL/ML the key metric is how well you predict on unseen data, called generalization or risk (next lecture).
Are the two fields totally disjoint?
No:
See Chernozhukov et al. (2024)

Books on statistical/machine learning theory:

Hastie, Tibshirani, and Friedman (2009)

Shalev-Shwartz and Ben-David (2014)

Mohri, Rostamizadeh, and Talwalkar (2018)
Books on “core” machine learning methods in practice, mainly with scikit-learn:

Géron (2023)

James et al. (2023)
Possibly skip TensorFlow in Géron (2023) in favor of PyTorch
SL/ML is not monolithic: there are different learning scenarios, based on whether an output is observed and on how new observations arrive.
What kind of problems can you solve?
| Domain of Application | Examples |
|---|---|
| Forecasting | Estimating GDP in the current quarter |
| Causal inference | Pre-estimating “first stage”/nuisance parameters |
| Text or document classification | Assigning topics, determining whether contents are inappropriate, spam detection |
| NLP | Part-of-speech tagging, named-entity recognition, context-free parsing, text summarization, chatbots |
| Speech processing | Speech recognition, speech synthesis, speaker verification and identification |
| Computer vision | Object recognition and identification, face detection, content-based image retrieval, optical character recognition, image segmentation |
| Anomaly detection | Detecting credit card fraud |
| Clustering | Segmenting clients into blocks and offering different marketing strategies |
| Data visualization | Using dimensionality reduction |
| Recommender systems | Suggesting next product to buy given purchase history |
All of these are also done by people with economics backgrounds.
Supervised settings: there is some observed output \(Y\):
| Task | Type of Variable | Examples |
|---|---|---|
| Classification | Categorical | Document classification |
| Regression | Continuous | Nowcasting the GDP |
| Ranking | Ordinal | Selecting the order of results in a search |
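Since the lecture recommends scikit-learn for practical work, here is a minimal sketch contrasting the first two supervised tasks on synthetic data (all variable names and the data-generating process are illustrative assumptions, not part of the lecture):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))  # synthetic features

# Regression: the observed output Y is continuous
y_cont = 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)
reg = LinearRegression().fit(X, y_cont)
preds_cont = reg.predict(X[:5])  # real-valued predictions

# Classification: the observed output Y is categorical
y_cat = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = LogisticRegression().fit(X, y_cat)
preds_cat = clf.predict(X[:5])  # class labels in {0, 1}
```

Both estimators learn from observed \((X, Y)\) pairs; only the type of \(Y\) (continuous vs. categorical) changes the task.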
Unsupervised settings: no obvious observed \(Y\):
| Task | Type of Variable | Examples |
|---|---|---|
| Clustering | Categorical | Identifying communities in a large social network |
| Dimensionality reduction/manifold learning | Continuous | Preprocessing digital images in computer vision tasks |
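A parallel sketch for the two unsupervised tasks, again in scikit-learn on synthetic data (the two-segment setup is an illustrative assumption): no output \(Y\) is supplied, so the algorithms infer structure from \(X\) alone.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Two synthetic "client segments" in 5 dimensions; no labels are observed
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 5)),
               rng.normal(4.0, 1.0, size=(100, 5))])

# Clustering: the algorithm infers categorical group labels itself
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Dimensionality reduction: compress to 2 coordinates, e.g. for plotting
X2 = PCA(n_components=2).fit_transform(X)
```

The inferred `labels` play the role of the missing categorical \(Y\), and the 2-dimensional `X2` is what a data-visualization step would plot.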
More concepts: self-supervised learning, active learning, reinforcement learning, etc.
Another axis: how should the model be updated when new observations arrive?
In this lecture we: