12 Average Effects and Heterogeneity Bias
12.1 Model and Object of Interest
12.1.1 Model
As noted in the previous section, we first focus our attention on the fully nonseparable panel data model (11.4): \[ Y_{it} = \phi(X_{it}, A_i, U_{it}), \quad i=1, \dots, N, \quad t=1, 2, \tag{12.1}\] where \(\phi(x, a, u)\) is differentiable in \(x\) for each possible value of \((a, u)\).
Model (12.1) is an extremely general nonparametric model. We make essentially no assumptions on the form of \((A_i, U_{it})\) or \(\phi\) (beyond differentiability). In particular, \((A_i, U_{it})\) may be finite- or infinite-dimensional.
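For intuition, it can help to keep in mind one familiar special case, used below purely for illustration and not required by any of the arguments: a linear random coefficients model with \(A_i = (A_{1i}, A_{2i})\) and \[ Y_{it} = A_{1i} + A_{2i}X_{it} + U_{it}, \] which corresponds to \(\phi(x, a, u) = a_1 + a_2 x + u\). Here every unit has its own intercept \(A_{1i}\) and its own slope \(A_{2i}\).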
To illustrate the identification arguments clearly, we make two simplifying assumptions throughout:
- \(X_{it}\) is scalar.
- There are only two observations per unit (\(t=1, 2\)).
Both of these assumptions are easy to relax at the price of more complex notation.
12.1.2 Object of Interest
In studying economic models with continuous treatments or inputs, marginal effects provide interpretable measures of how units respond to small changes in covariates. Accordingly, our key object of interest is the average marginal effect: \[ \E[\partial_x \phi(x, A_i, U_{it})|X_{it}=x, \dots], \tag{12.2}\] where \(x\) is some fixed point, and the conditioning set defines some population of interest in terms of \(X_{it}\) and other variables. Such marginal effects are standard objects of interest in models with continuous covariates and outcomes (Hoderlein and Mammen 2007; Hoderlein and White 2012; Chernozhukov et al. 2015).
The parameter (12.2) has a standard causal interpretation. We consider the population of units with \(X_{it}=x\), along with other characteristics captured by the conditioning set. For these units, we exogenously change their \(X_{it}\) infinitesimally. Effect (12.2) reports the average change in outcomes for this fixed population of units.
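In the illustrative random coefficients model above, \(\partial_x \phi(x, A_i, U_{it}) = A_{2i}\), so effect (12.2) reduces to \(\E[A_{2i}|X_{it}=x, \dots]\): the average slope among the units in the conditioning population.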
12.2 Heterogeneity Bias and the Issue with Average Outcomes
12.2.1 Intuitive Approach
One may think that it is easy to identify effect (12.2) even from cross-sectional data. It is tempting to consider the derivative of the conditional expectation of \(Y_{it}\) given \(X_{it}\). However, as we will now see, this approach fails in the presence of non-random treatment assignment, which is typical in observational settings.
Consider the conditional average of \(Y_{it}\) for the units with \(X_{it}=x\): \[ \begin{aligned} \E[Y_{it}|X_{it}=x] & = \E[\phi(X_{it}, V_{it})|X_{it}=x] \\ & = \E[\phi(x, V_{it})|X_{it}=x], \end{aligned} \] where we label \(V_{it}=(A_i, U_{it})\) for brevity. The expectation is with respect to the conditional law of \(V_{it}\) given \(X_{it}=x\).
One might hope that differentiating the conditional expectation with respect to \(x\) recovers the average marginal effect (12.2): \[ \partial_x \E[Y_{it}|X_{it}=x] \overset{??}{=} \E[\partial_x \phi(x, V_{it})|X_{it}=x] \]
12.2.2 Heterogeneity Bias
However, this reasoning fails when the covariates are correlated with the unobserved factors that affect the outcome.
To make this intuition precise, let \(f_{V_{it}|X_{it}}(v|x)\) be the conditional density of \(V_{it}\) given \(X_{it}=x\) with respect to some dominating measure \(\mu\) that does not depend on \(x\) (its existence is an assumption, but not a particularly important one right now). With this notation, the above expectation can be written as \[ \E[Y_{it}|X_{it}=x] = \int \phi(x, v)f_{V_{it}|X_{it}}(v|x)\mu(dv). \] We assume that \(f_{V_{it}|X_{it}}(v|x)\) is differentiable in the conditioning argument \(x\). In other words, there is smooth dependence between \(X_{it}\) and \(V_{it}\) in a distributional sense.
Assuming we can swap the integral and the derivative, we obtain the following expression for the derivative: \[ \begin{aligned} & \partial_x \E[Y_{it}|X_{it}=x] \\ & = \int \partial_x \left[\phi(x, v)f_{V_{it}|X_{it}}(v|x) \right]\mu(dv)\\ & = \int \partial_x \phi(x, v)f_{V_{it}|X_{it}}(v|x)\mu(dv) + \int \phi(x, v)\partial_x f_{V_{it}|X_{it}}(v|x)\mu(dv)\\ & = \E[\partial_x \phi(x, V_{it})|X_{it}=x] + \int \phi(x, v)\partial_x f_{V_{it}|X_{it}}(v|x)\mu(dv). \end{aligned} \tag{12.3}\]
Equation 12.3 shows that, as we change \(x\), the conditional expectation \(\E[Y_{it}|X_{it}=x]\) changes due to two factors:
- The change in \(\phi(x, v)\) as \(x\) varies, holding the unobservables fixed. This term is the average marginal effect, the parameter of interest.
- Changes in the conditional distribution of \((A_i, U_{it})\) given \(X_{it}\) — the source of the second term, typically called the heterogeneity bias (Chamberlain 1982; see also Graham and Powell 2012 for a more explicit representation).
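To see both terms in a concrete case, return to the illustrative random coefficients model \(Y_{it} = A_{1i} + A_{2i}X_{it} + U_{it}\). Since \(\E[Y_{it}|X_{it}=x] = \E[A_{1i}|X_{it}=x] + x\,\E[A_{2i}|X_{it}=x] + \E[U_{it}|X_{it}=x]\), differentiating (assuming the conditional means are differentiable in \(x\)) gives \[ \partial_x \E[Y_{it}|X_{it}=x] = \underbrace{\E[A_{2i}|X_{it}=x]}_{\text{average marginal effect}} + \underbrace{\partial_x\E[A_{1i}+U_{it}|X_{it}=x] + x\,\partial_x\E[A_{2i}|X_{it}=x]}_{\text{heterogeneity bias}}. \] The bias terms are nonzero whenever the conditional means of the unobservables shift with \(x\).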
12.2.3 Source of Bias and Conditions for No Bias
So why does the heterogeneity bias appear? Fundamentally, it arises when the treatment \(X_{it}\) is not assigned exogenously but is instead chosen by economic agents based on information about their potential outcomes \(\phi(x, A_i, U_{it})\). For example, consumers choose products based on knowledge of their persistent preferences \(A_i\) and of the transitory taste shocks and current prices captured by \(U_{it}\). Similarly, firms decide on input levels based on firm-specific productivity and technology components encapsulated in \((A_i, U_{it})\).
As a result, units with different values of \(X_{it}\) may systematically differ in their unobserved characteristics. When we compute the derivative \(\partial_x \E[Y_{it} | X_{it} = x]\), we are not isolating the effect of changing \(x\) for a fixed unit. Instead, we are comparing different populations—those for whom \(X_{it} = x\) versus those for whom \(X_{it} = x + \varepsilon\). Each such population potentially has a different distribution of \((A_i, U_{it})\).
This violates the ceteris paribus logic required for causal interpretation of marginal effects. In particular, the expression \(\partial_x \E[Y_{it} | X_{it} = x]\) combines the genuine marginal effect of \(x\) with a compositional shift due to variation in the distribution of unobservables across \(x\).
The bias term does vanish if the conditional distribution of \((A_i, U_{it})\) given \(X_{it}\) does not vary with \(x\). That is, a sufficient condition for identification of average marginal effects from average outcomes is \[ \partial_x f_{V_{it}|X_{it}}(v|x) = 0 \quad \text{for all } v. \]
When this condition holds for all \(x\) and \(v\), the conditional distribution of \((A_i, U_{it})\) given \(X_{it}\) does not depend on \(X_{it}\), so \(X_{it}\) is independent of \((A_i, U_{it})\): a random assignment condition of the kind guaranteed by a randomized controlled trial. In that case, \(\partial_x \E[Y_{it} | X_{it} = x]\) equals the average partial derivative \(\E[\partial_x \phi(x, A_i, U_{it}) | X_{it} = x]\) and hence recovers the target marginal effect.
In observational data, this assumption typically fails, and the heterogeneity bias must be explicitly addressed.
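The contrast can also be checked numerically. The following is a minimal simulation sketch (the data generating process, constants, and window sizes are arbitrary illustrative choices, not taken from this chapter): it simulates the random coefficients example with \(X_{it}\) either correlated with the slope \(A_{2i}\) or randomly assigned, and compares a finite-difference approximation of \(\partial_x \E[Y_{it}|X_{it}=x]\) with the average marginal effect \(\E[A_{2i}|X_{it}\approx x]\).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000

def simulate(correlated):
    """Draw (X, Y, A2) from Y = A1 + A2 * X + U, a random coefficients model."""
    a1 = rng.normal(size=n)
    a2 = 1.0 + 0.5 * rng.normal(size=n)        # heterogeneous slopes, mean 1
    u = rng.normal(size=n)
    if correlated:
        # Units with larger slopes tend to choose larger X (selection on gains).
        x = 0.8 * a2 + 0.3 * rng.normal(size=n)
    else:
        # Random assignment: X is independent of (A1, A2, U).
        x = 0.8 + 0.3 * rng.normal(size=n)
    y = a1 + a2 * x + u
    return x, y, a2

def naive_derivative(x, y, x0, h=0.1):
    """Finite-difference approximation of d/dx E[Y | X = x] at x0."""
    left = y[np.abs(x - (x0 - h)) < h / 2].mean()
    right = y[np.abs(x - (x0 + h)) < h / 2].mean()
    return (right - left) / (2 * h)

x0 = 0.8
for correlated in (True, False):
    x, y, a2 = simulate(correlated)
    ame = a2[np.abs(x - x0) < 0.05].mean()     # E[A2 | X near x0], the target (12.2)
    print("correlated assignment:" if correlated else "random assignment:   ",
          "naive derivative =", round(naive_derivative(x, y, x0), 2),
          "| average marginal effect =", round(ame, 2))
```

In this particular design, when units with larger slopes choose larger \(X_{it}\), the naive derivative overstates the average marginal effect; under random assignment the two numbers roughly coincide.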
Next Section
In the next section we show how one can actually identify average marginal effects for certain subpopulations by assuming that \(U_{it}\) is stationary.