1 Introduction: Causal Inference with Unobserved Heterogeneity
1.1 The Problem of Unobserved Heterogeneity
1.1.1 Introduction
In economics, business, and the social sciences, heterogeneity is the rule. No two individuals, firms, or countries are identical, even when they appear statistically similar in observed data. A worker with the same years of education as another may earn a different wage due to unobserved ability. Two firms in the same industry may respond differently to a policy shock because of latent managerial practices. A country’s growth trajectory may diverge from its peers’ due to unmeasured institutions or cultural norms. These unobserved differences — preferences, productivity, measurement errors, or idiosyncratic shocks — determine how agents respond to treatments, policies, or market conditions.
Our objective is to estimate causal parameters: treatment effects (or their moments and distributions), policy impacts, or other structural quantities that describe how outcomes would change under interventions. We focus on counterfactuals, rather than associational patterns.
In observational data, however, this task is complicated by unobserved heterogeneity. Treatments (broadly defined) are typically selected endogenously, based partly on unobserved factors that also affect outcomes, which creates confounding. Naive comparisons then fail to isolate causal effects, because they mix the effect of the treatment with differences in these unobserved determinants. The challenge of causal inference with unobserved heterogeneity is central to empirical research and is the focus of this course.
1.1.2 Broad Setting
Formally, suppose we observe a dataset \(\{(Y_i, X_i)\}_{i=1}^N\) (cross-section) or \(\{(Y_{it}, X_{it})\}_{i=1,\dots,N;\ t=1,\dots,T}\) (panel), where \(Y_i\) (\(Y_{it}\)) is the realized outcome for unit \(i\) (in period \(t\)) and \(X_i\) (\(X_{it}\)) includes observed covariates (e.g., treatments and controls).
At all points in the course, we adopt the following potential outcomes framework. The outcome of unit \(i\) (in period \(t\)) under “treatment” \(x\) is determined as
\[ \begin{aligned} Y_i^x & = \phi(x, A_i), && \text{(cross-section)}\\ Y_{it}^x & = \phi(x, A_i, U_{it}), && \text{(panel)} \end{aligned} \tag{1.1}\]
where:
- \(\phi(\cdot)\) is an unknown structural function.
- \(A_i\) is time-invariant unobserved heterogeneity.
- \(U_{it}\) is time-varying unobserved heterogeneity.
Our goal is causal: to infer some feature of \(\phi(\cdot)\), such as average treatment or marginal effects, the variance or distribution of such effects, or distributional features of potential outcomes.
A key challenge is the presence of \((A_i, U_{it})\), which are never observed but may systematically influence both treatments and outcomes.
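To make (1.1) concrete, here is a minimal Python sketch of the cross-sectional case, using only the standard library. The structural function \(\phi(x, a) = a x + a\), the distribution \(A_i \sim N(1, 0.5^2)\), and the helper names `phi`, `simulate_units`, and `individual_effects` are hypothetical choices for illustration. Because the simulation reveals \(A_i\), it can compute each unit's effect \(\phi(1, A_i) - \phi(0, A_i) = A_i\); with real data this is impossible, which is exactly the difficulty described above.

```python
import random
import statistics


def phi(x, a):
    """Hypothetical structural function: the effect of x scales with A_i."""
    return a * x + a


def simulate_units(n, seed=0):
    """Draw time-invariant heterogeneity A_i ~ N(1, 0.5^2) for n units (assumed DGP)."""
    rng = random.Random(seed)
    return [rng.gauss(1.0, 0.5) for _ in range(n)]


def individual_effects(a_draws, x0=0.0, x1=1.0):
    """Unit-level effects phi(x1, A_i) - phi(x0, A_i); infeasible with real data."""
    return [phi(x1, a) - phi(x0, a) for a in a_draws]


a_draws = simulate_units(10_000)
effects = individual_effects(a_draws)
print(f"average effect: {statistics.mean(effects):.2f}")
print(f"std. dev. of effects: {statistics.stdev(effects):.2f}")
```

Under these assumptions the average effect should be close to 1, while the standard deviation of the effects should be close to 0.5: a single average hides substantial heterogeneity across units.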
1.1.3 Types of Unobserved Heterogeneity
What are the \((A_i, U_{it})\)? Unobserved heterogeneity arises from three (overlapping) primary sources:
- Omitted Confounders:
- Latent variables correlated with both \(X_i\) and \(Y_i\).
- Example: Estimating the returns to education \(X_i\) when unobserved ability \(A_i\) affects both education and wages \(Y_i\).
- Heterogeneous Treatment Effects:
- The impact of \(X_i\) varies across units due to \((A_i, U_{it})\), meaning effects are individual-specific.
- Example: The benefits \(A_i\) of a job training program \(X_i\) may be larger for some units than for others.
- Measurement Error:
- Observed covariates \(X_i\) are noisy proxies for true values.
- Example: Using self-reported income to proxy true earnings.
1.1.4 Consequences of Ignoring Unobserved Heterogeneity
In randomized experiments, unobserved heterogeneity is less problematic for causal inference because \[ \{X_{it}\}_{t=1}^T \perp\!\!\!\perp (A_i, U_{it}) \quad \text{(by design)}, \] where independence may hold conditionally on some further observed variables. In other words, treatment assignment is independent of the individual determinants of the potential outcomes. As a consequence, comparisons of treatment groups (and suitable transformations of them) directly estimate causal parameters of interest.
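A minimal simulation sketch of this point, assuming the hypothetical structural function \(\phi(x, a) = a x + a\) with \(A_i \sim N(1, 0.5^2)\) and a binary treatment assigned by a fair coin, independently of \(A_i\). The helper names are illustrative only. Here the individual effect is \(A_i\), so the true average treatment effect in the sample is the mean of \(A_i\), and the simple difference in means should recover it.

```python
import random
import statistics


def phi(x, a):
    """Hypothetical structural function with heterogeneous effects."""
    return a * x + a


def randomized_experiment(n, seed=1):
    """Assign a binary treatment by a fair coin, independently of A_i (assumed design)."""
    rng = random.Random(seed)
    a = [rng.gauss(1.0, 0.5) for _ in range(n)]
    x = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    y = [phi(xi, ai) for xi, ai in zip(x, a)]
    return a, x, y


def difference_in_means(x, y):
    """Mean outcome among treated units minus mean outcome among untreated units."""
    treated = [yi for xi, yi in zip(x, y) if xi == 1]
    control = [yi for xi, yi in zip(x, y) if xi == 0]
    return statistics.mean(treated) - statistics.mean(control)


a, x, y = randomized_experiment(20_000)
ate = statistics.mean(a)  # sample ATE: phi(1, A_i) - phi(0, A_i) = A_i
print(f"true ATE: {ate:.3f}, difference in means: {difference_in_means(x, y):.3f}")
```

Randomization balances the distribution of \(A_i\) across the two groups, which is why an unadjusted comparison suffices here despite heterogeneous effects.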
However, in observational data, this independence fails: \[ \{X_{it}\}_{t=1}^T \not\!\perp\!\!\!\perp (A_i, U_{it}). \] Real-world data are generated by agents who make choices based on all the information available to them, including information that is not recorded in the final dataset. A student selects a college \(X_i\) based on unobserved ambition \(A_i\); a firm adopts a technology \(X_{it}\) based on unobserved costs \((A_i, U_{it})\); a patient complies with a medical treatment based on unobserved health beliefs. The observed data reflect these endogenous decisions, so naive statistical associations between \(Y\) and \(X\) are typically confounded by \((A_i, U_{it})\). This is the core problem of causal inference with unobserved heterogeneity. The following graphs schematically illustrate the point:
Ignoring this confounding leads to misleading conclusions. The issue arises even in simple linear regression; two basic examples are:
- Omitted variable bias: if an important explanatory variable is omitted and is correlated with the included covariates, the coefficient estimates are biased and inconsistent.
- Attenuation bias: when a covariate suffers from classical measurement error, its estimated coefficient is systematically biased toward zero.
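Both biases can be checked in a short simulation. This is a minimal sketch under assumed data-generating processes chosen so the probability limits have closed forms: for omitted variables, \(Y = X + A + e\) with \(X = A + v\) and all of \(A, v, e\) standard normal, so the slope from regressing \(Y\) on \(X\) alone converges to \(1 + \operatorname{Cov}(X, A)/\operatorname{Var}(X) = 1.5\); for measurement error, \(Y = X^* + e\) with observed \(X = X^* + u\), so the slope converges to \(\operatorname{Var}(X^*)/(\operatorname{Var}(X^*) + \operatorname{Var}(u)) = 0.5\). The true effect is 1 in both cases, and `ols_slope` is a hypothetical helper.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


rng = random.Random(5)
n = 50_000

# Omitted variable bias: Y = X + A + e with X = A + v, but A is left out of the regression.
a = [rng.gauss(0.0, 1.0) for _ in range(n)]
x = [ai + rng.gauss(0.0, 1.0) for ai in a]
y = [xi + ai + rng.gauss(0.0, 1.0) for xi, ai in zip(x, a)]
ovb_slope = ols_slope(x, y)  # probability limit: 1 + Cov(X, A) / Var(X) = 1.5

# Attenuation bias: Y = X* + e, but only X = X* + u is observed (classical measurement error).
x_star = [rng.gauss(0.0, 1.0) for _ in range(n)]
x_obs = [xs + rng.gauss(0.0, 1.0) for xs in x_star]
y2 = [xs + rng.gauss(0.0, 1.0) for xs in x_star]
attenuated_slope = ols_slope(x_obs, y2)  # probability limit: 1 / (1 + 1) = 0.5

print(f"true effect: 1.0 | OVB slope: {ovb_slope:.3f} | attenuated slope: {attenuated_slope:.3f}")
```

The omitted-variable slope overshoots because \(X\) partly proxies for \(A\), while the attenuated slope is pulled toward zero by the noise in \(X\).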
In nonlinear models, the resulting biases are much harder to characterize: their direction and magnitude generally depend on the details of the model, even in parametric settings (e.g. Stefanski and Carroll 1985).
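Returning to the selection stories above (students, firms, patients choosing treatments based on unobservables), a minimal sketch shows how self-selection confounds a naive comparison even without any measurement error. It assumes the hypothetical structural function \(\phi(x, a) = a x + a\), \(A_i \sim N(1, 0.5^2)\), and a hypothetical selection rule under which units take the treatment when \(A_i\) plus independent noise exceeds 1. Since \(Y_i^0 = A_i\), the naive difference in means splits exactly into the average effect on the treated (ATT) plus a selection-bias term \(E[Y^0 \mid X = 1] - E[Y^0 \mid X = 0]\), and neither term equals the ATE.

```python
import random
import statistics


def phi(x, a):
    """Hypothetical structural function with heterogeneous effects."""
    return a * x + a


def self_selected_sample(n, seed=2):
    """Hypothetical selection rule: take the treatment when A_i plus noise exceeds 1."""
    rng = random.Random(seed)
    a = [rng.gauss(1.0, 0.5) for _ in range(n)]
    x = [1 if ai + rng.gauss(0.0, 0.5) > 1.0 else 0 for ai in a]
    y = [phi(xi, ai) for xi, ai in zip(x, a)]
    return a, x, y


a, x, y = self_selected_sample(20_000)
a_treated = [ai for ai, xi in zip(a, x) if xi == 1]
a_control = [ai for ai, xi in zip(a, x) if xi == 0]
y_treated = [yi for yi, xi in zip(y, x) if xi == 1]
y_control = [yi for yi, xi in zip(y, x) if xi == 0]

naive = statistics.mean(y_treated) - statistics.mean(y_control)
ate = statistics.mean(a)  # E[phi(1, A) - phi(0, A)] = E[A]
att = statistics.mean(a_treated)  # average effect among units that selected in
selection_bias = statistics.mean(a_treated) - statistics.mean(a_control)  # gap in Y^0 = A_i
print(f"naive: {naive:.3f} | ATE: {ate:.3f} | ATT: {att:.3f} | selection bias: {selection_bias:.3f}")
```

Because units with larger \(A_i\) both gain more from treatment and have higher untreated outcomes, the naive comparison overstates the ATE on both counts.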
1.2 This Course
1.2.1 Course Description
In response to this issue, econometricians have developed a range of statistical techniques that are suitable for observational data and robust to unobserved heterogeneity in various senses.
This course surveys some of the advances in this field, structured into three key topics:
- Linear models with heterogeneous coefficients: extending traditional regression models to allow for individual-specific responses to treatment.
- Nonparametric models with unobserved heterogeneity: models that do not restrict the form of heterogeneity or how it affects the outcome.
- Quantile and distribution regression: approaches that focus on quantile and distributional treatment effects, rather than the distribution of treatment effects.
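The distinction in the last item can be illustrated with a stylized simulation. Quantile treatment effects (QTEs) are differences between quantiles of the two marginal distributions of \(Y^1\) and \(Y^0\), whereas quantiles of the individual effects \(Y_i^1 - Y_i^0\) depend on how the two potential outcomes are coupled within units. The sketch assumes a hypothetical data-generating process with full rank reversal, \(Y_i^0 \sim N(0, 1)\) and \(Y_i^1 = -Y_i^0\): the two marginals coincide, so all QTEs are (approximately) zero, yet individual effects vary widely.

```python
import random
import statistics


def potential_outcomes(n, seed=3):
    """Hypothetical DGP with full rank reversal: Y1_i = -Y0_i, with Y0_i ~ N(0, 1)."""
    rng = random.Random(seed)
    y0 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y1 = [-v for v in y0]
    return y0, y1


y0, y1 = potential_outcomes(20_000)

# QTEs at the deciles: differences between quantiles of the two marginal distributions.
qte = [q1 - q0 for q1, q0 in zip(statistics.quantiles(y1, n=10), statistics.quantiles(y0, n=10))]

# Deciles of the individual effects Y1_i - Y0_i: these require the joint distribution.
effect_deciles = statistics.quantiles([b - a for a, b in zip(y0, y1)], n=10)

print("QTEs:            ", [round(v, 2) for v in qte])
print("effect deciles:  ", [round(v, 2) for v in effect_deciles])
```

In this example the QTEs are all near zero, while the individual effects range from roughly -2.6 at the first decile to roughly 2.6 at the ninth, so a zero QTE does not imply that nobody is affected.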
Throughout, we focus on non-experimental (observational) data, where unobserved heterogeneity cannot be ignored. We also allow for non-binary treatments, including continuous ones. Finally, we emphasize identification strategies over asymptotic theory.
1.2.2 A Common Theme
A common theme is that there will always be a price to pay for learning any feature of the counterfactual distribution. This price is paid in the form of assumptions and restrictions on model (1.1), another manifestation of the fundamental problem of causal inference. Possible courses of action include:
- Imposing functional form restrictions on \(\phi\) (e.g. parametric assumptions such as linearity, or assuming that \(A_i\) is scalar and enters \(\phi\) monotonically);
- Restricting the extent of unobserved heterogeneity in the model (e.g. assuming that it consists of a single scalar unobserved variable or of a vector of heterogeneous coefficients);
- Restricting the relationship between the observed and unobserved variables (e.g. assuming that \(A_i\) is independent of \(X_i\));
- Focusing on particular parts of the distribution of the outcome (such as quantiles).
1.2.3 Overview
We will see examples of each approach throughout the course, and each model discussed should be viewed as a special case of model (1.1). The table below provides a taste of what is to come and a quick summary of this course:
Model Class | What We Pay | What We Get | Upside |
---|---|---|---|
Linear Heterogeneous Models | Linearity of \(\phi\): \(\phi(X_{it}, A_i, U_{it}) = X_{it}'\beta(A_i) + U_{it}\), and a sufficient number of observations per unit | Distribution and moments of individual treatment effects | Full distribution of treatment effects |
Nonparametric Models | Identification only for the subpopulation of stayers, or assumptions on the relationship between \(\{X_{it}\}_{t=1}^T\) and \((A_i, U_{it})\) | Average marginal effects (more with additive separability of \(U_{it}\)) | No restrictions on the form of \(\phi(\cdot)\) or \(A_i\) |
Quantile and Distributional Regression Models | Focus on the distribution of potential outcomes, not of treatment effects | Quantile and distributional treatment effects (QTEs and DTEs) | No need for panel data |
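To illustrate the first row of the table, here is a minimal sketch of the linear heterogeneous model with a scalar regressor, \(Y_{it} = \beta(A_i) X_{it} + U_{it}\), assuming (hypothetically) \(\beta(A_i) = A_i \sim N(1, 0.5^2)\), \(X_{it} = 0.5 A_i + N(0, 1)\), and \(U_{it} \sim N(0, 1)\) independent of everything else. With enough periods per unit, each \(\beta_i\) can be estimated by a unit-by-unit regression even though \(X_{it}\) is correlated with \(A_i\). The variance of the estimates overstates \(\operatorname{Var}(\beta_i)\) by the average estimation noise, which this sketch subtracts using a simple homoskedastic correction (an assumption of the example, not a method taken from the text).

```python
import random
import statistics


def simulate_panel(n_units, n_periods, seed=4):
    """Hypothetical DGP: Y_it = beta_i * X_it + U_it, with beta_i = A_i ~ N(1, 0.5^2).

    X_it depends on A_i (treatment correlated with heterogeneity);
    U_it is independent N(0, 1) noise.
    """
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        a = rng.gauss(1.0, 0.5)
        x = [0.5 * a + rng.gauss(0.0, 1.0) for _ in range(n_periods)]
        y = [a * xt + rng.gauss(0.0, 1.0) for xt in x]
        panel.append((x, y))
    return panel


def unit_ols(x, y):
    """No-intercept OLS for one unit; returns (beta_hat, estimated sampling variance)."""
    sxx = sum(xt * xt for xt in x)
    beta = sum(xt * yt for xt, yt in zip(x, y)) / sxx
    resid_var = sum((yt - beta * xt) ** 2 for xt, yt in zip(x, y)) / (len(x) - 1)
    return beta, resid_var / sxx


panel = simulate_panel(n_units=2_000, n_periods=20)
fits = [unit_ols(x, y) for x, y in panel]
betas = [b for b, _ in fits]

mean_beta = statistics.fmean(betas)
naive_var = statistics.variance(betas)
# Subtract the average estimation noise to target Var(beta_i) rather than Var(beta_hat_i).
corrected_var = naive_var - statistics.fmean([v for _, v in fits])
print(f"mean: {mean_beta:.3f} | naive var: {naive_var:.3f} | corrected var: {corrected_var:.3f}")
```

The mean of the estimates should be close to \(E[\beta_i] = 1\) and the corrected variance close to \(\operatorname{Var}(\beta_i) = 0.25\); with fewer periods per unit the noise correction becomes larger and less reliable, which is why the table lists a sufficient number of observations per unit as part of the price.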
Next Section
In the next section, we begin by examining how unobserved heterogeneity arises naturally in linear models, setting the stage for the first block of the course.