The Three Parts of Statistics
This lecture is about the basics of identification, estimation, and inference
By the end, you should be able to
Goal of all of statistics:
“Say something” about a “parameter” of interest “based on data”
Which “parameter”? What is “something”? How much “data”?
Which parameter you want depends on the context:
What about the “something”?
Three example questions:
All possible questions can be split into two classes of work: identification, and estimation & inference
Both are equally important in causal settings; identification is less important, or not important at all, in prediction
Focus first on identification
Let \(\theta\) be the “parameter of interest” — something unknown you care about, e.g.
Suppose that our data are observations on \((X, Y)\)
How to express the idea of having “infinite data”?
Infinite data = knowing the joint distribution function \(F_{X, Y}\)
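A quick numerical sketch of this idea (Python; the true distribution \(N(0,1)\) and the evaluation grid are illustrative assumptions): as the sample grows, the empirical CDF settles down to the true \(F_Y\), so an infinite sample is equivalent to knowing the distribution itself

```python
# Sketch: the empirical CDF approaches the true CDF as n grows,
# so "infinite data" amounts to knowing F itself.
# The N(0, 1) distribution and the grid are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
grid = np.linspace(-2, 2, 5)

for n in [100, 10_000, 1_000_000]:
    y = rng.normal(size=n)                    # draws from F_Y = N(0, 1)
    ecdf = (y[:, None] <= grid).mean(axis=0)  # empirical CDF on the grid
    print(n, np.round(ecdf, 3))               # stabilizes at F_Y(grid)
```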
Path from parameter \(\theta\) to restrictions/implications on the data distribution
The model specifies parts of the data generating mechanism:
Identification basically asks:
Given the model and the distribution of the observed data,
can \(\theta_0\) be uniquely determined?
Sometimes called point identification
May sound a bit vague
To make the idea concrete, consider a special parametric case
Consider a simple example:
Implication of the model:
Let’s try our definition of identification:
Therefore, it must be that \[ \theta_0 = \E[Y_i] \] \(\theta_0\) uniquely determined as the above function of the distribution of the data
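A minimal simulation sketch of this argument (Python; the location model \(Y_i = \theta_0 + U_i\) with \(\E[U_i] = 0\) and all the numbers are assumptions consistent with the result above):

```python
# Sketch: under Y_i = theta_0 + U_i with E[U_i] = 0, the parameter is
# pinned down by the mean of the data: theta_0 = E[Y_i].
import numpy as np

rng = np.random.default_rng(0)
theta_0 = 2.5                     # illustrative value
n = 1_000_000                     # very large n stands in for "infinite data"
y = theta_0 + rng.normal(size=n)  # U_i ~ N(0, 1), so E[Y_i] = theta_0

print(y.mean())                   # ~2.5: the mean of the data recovers theta_0
```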
\(\theta_0\) is identified if for any \(\theta\neq\theta_0\) it holds that \[ F_Y(y|\theta) \neq F_Y(y|\theta_0) \]
In words: different \(\theta\) give different distributions of observed data
Second definition useful for showing non-identification
An example: suppose that \(Y_i \sim N(\abs{\theta_0}, 1)\):
Different \(\theta\) (here \(\theta_0\) and \(-\theta_0\)) give the same distribution, so \(\theta_0\) is not identified if \(\theta_0\neq 0\)
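A small sketch of the failure (Python; the value \(\theta_0 = 2\) is an arbitrary illustration): samples generated under \(\theta_0\) and \(-\theta_0\) are statistically indistinguishable

```python
# Sketch: Y_i ~ N(|theta_0|, 1), so theta_0 and -theta_0 generate
# exactly the same distribution of observables.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
for theta in (2.0, -2.0):                    # two candidate parameter values
    y = rng.normal(loc=abs(theta), scale=1.0, size=n)
    print(theta, round(y.mean(), 3), round(y.std(), 3))
# Both print mean ~2 and sd ~1: no data set can separate the two thetas.
```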
Previous example — a bit simplistic
Let’s try a more useful case — a linear causal model
Need a causal framework to talk about causal effects!
Work in the familiar potential outcomes framework:
Together potential outcomes form a family \(\curl{Y^{\bx}_i}_{\bx}\)
What we see: realized values of \((Y_i, \bX_i)\). The realized outcomes are determined as \[ Y_i = Y^{\bX_i}_i \]
All other potential outcomes remain counterfactual
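A bookkeeping sketch of this setup (Python; the binary treatment and the constant unit effect of 1 are illustrative assumptions, not part of the framework):

```python
# Sketch: each unit carries a full family of potential outcomes,
# but only the one matching the realized treatment is observed.
import numpy as np

rng = np.random.default_rng(0)
n = 5
y0 = rng.normal(size=n)          # potential outcome Y^0_i
y1 = y0 + 1.0                    # potential outcome Y^1_i (effect of 1 assumed)
x = rng.integers(0, 2, size=n)   # realized binary treatment X_i
y = np.where(x == 1, y1, y0)     # observed outcome: Y_i = Y^{X_i}_i
print(np.c_[x, y])               # the unrealized outcome stays counterfactual
```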
In this class we will assume:
Potential outcomes of unit \(i\) depend only on the treatment of unit \(i\)
Called the stable unit treatment value assumption (SUTVA) — no interference, no general equilibrium effects, etc.
Model: \[ Y^{\bx}_i = \bx'\bbeta + U_i \] Note: \(U_i\) does not depend on \(\bx\)
Causal effect of changing unit \(i\) from \(\bx_1\) to \(\bx_2\) given by \((\bx_2-\bx_1)'\bbeta\). Thus:
Sufficient to learn \(\bbeta\)
Our assumptions do not fully specify the distributions of \(\bX_i\) and \(U_i\)
To identify those, we need \(F_{\bX, Y}\) and (for the distribution of \(U_i = Y_i - \bX_i'\bbeta\)) also \(\bbeta\)
Proposition 1 Let \(\E[\bX_iU_i] = 0\) and let \(\E[\bX_i\bX_i']\) exist and be invertible.
Then \(\bbeta\) is identified as \[ \bbeta = \E[\bX_i\bX_i']^{-1}\E[\bX_iY_i] \]
Proof by considering \(\E[\bX_iY_i]\): \[ \E[\bX_iY_i] = \E[\bX_i(\bX_i'\bbeta + U_i)] = \E[\bX_i\bX_i']\bbeta + \E[\bX_iU_i] = \E[\bX_i\bX_i']\bbeta \] Invertibility of \(\E[\bX_i\bX_i']\) then yields the formula
Two key assumptions: exogeneity (\(\E[\bX_iU_i] = 0\)) and no perfect multicollinearity (\(\E[\bX_i\bX_i']\) invertible)
Together: these assumptions let us recover \(\bbeta\), and hence every causal effect \((\bx_2-\bx_1)'\bbeta\), from the distribution of the observed data
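A simulation sketch of Proposition 1 (Python; the design with an intercept, one regressor, and noise independent of the regressors is an illustrative assumption satisfying both conditions):

```python
# Sketch: with a huge sample the population moments are essentially known,
# and E[X X']^{-1} E[X Y] recovers beta, as in Proposition 1.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
beta = np.array([1.0, -2.0])                # illustrative true coefficients
x = np.c_[np.ones(n), rng.normal(size=n)]   # X_i = (1, X_{i2})'
u = rng.normal(size=n)                      # independent of X, so E[X_i U_i] = 0
y = x @ beta + u                            # Y_i = X_i' beta + U_i

exx = x.T @ x / n                           # sample analog of E[X_i X_i']
exy = x.T @ y / n                           # sample analog of E[X_i Y_i]
print(np.linalg.solve(exx, exy))            # ~[1.0, -2.0]: beta recovered
```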
Identification — fundamentally theoretical exercise, always rests on assumptions
Some other approaches:
Let \(\theta\) be a parameter of interest and \((X_1, \dots, X_N)\) be the available data — the sample
Definition 1 Let \(\theta\) belong to some space \(\S\). An estimator \(\hat{\theta}_N\) is a function from \((X_1, \dots, X_N)\) to \(\S\): \[ \hat{\theta}_N = g(X_1, \dots, X_N) \]
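A literal reading of the definition (Python; the sample mean as \(g\) and the simulated data are illustrative choices, not the only possibility):

```python
# Sketch: an estimator is nothing more than a function g of the sample,
# mapping (X_1, ..., X_N) into the parameter space.
import numpy as np

def g(sample: np.ndarray) -> float:
    """Sample mean: one possible estimator of E[X_i]."""
    return float(sample.mean())

rng = np.random.default_rng(0)
theta_hat = g(rng.normal(loc=2.5, size=500))  # illustrative sample, N = 500
print(theta_hat)                              # a new sample gives a new value
```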
Inference is about answering questions about the population based on the finite sample
Example questions:
Relevant both in causal and predictive settings
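A standard inference sketch (Python; the 95% level, the normal approximation, and the simulated data are illustrative assumptions):

```python
# Sketch: a textbook 95% confidence interval for E[Y_i] based on the
# normal approximation to the sample mean.
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=2.5, size=500)          # illustrative sample

mean = y.mean()
se = y.std(ddof=1) / np.sqrt(len(y))       # estimated standard error
print(mean - 1.96 * se, mean + 1.96 * se)  # covers E[Y_i] ~95% of the time
```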
In this lecture we