12 Average Effects and Heterogeneity Bias
12.1 Model and Object of Interest
12.1.1 Model
As noted in the previous section, we first focus our attention on the fully nonseparable panel data model (11.4): \[ Y_{it} = \phi(X_{it}, A_i, U_{it}), \quad i=1, \dots, N, \quad t=1, 2, \tag{12.1}\] where \(\phi(x, a, u)\) is differentiable in \(x\) for each possible value of \((a, u)\).
Model (12.1) is an extremely general nonparametric model. We make essentially no assumptions on the form of \((A_i, U_{it})\) or \(\phi\) (beyond differentiability). In particular, \((A_i, U_{it})\) may be finite- or infinite-dimensional.
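For intuition, it can help to keep in mind one familiar special case, used below purely for illustration and not required by any of the arguments: a linear random coefficients model with \(A_i = (A_{1i}, A_{2i})\) and \[ Y_{it} = A_{1i} + A_{2i}X_{it} + U_{it}, \] which corresponds to \(\phi(x, a, u) = a_1 + a_2 x + u\). Here every unit has its own intercept \(A_{1i}\) and its own slope \(A_{2i}\).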
To illustrate the identification arguments clearly, we make two simplifying assumptions throughout:
- \(X_{it}\) is scalar.
- There are only two observations per unit (\(t=1, 2\)).
Both of these assumptions are easy to relax at the price of more complex notation.
12.1.2 Object of Interest
In studying economic models with continuous treatments or inputs, marginal effects provide interpretable measures of how units respond to small changes in covariates. Accordingly, our key object of interest is the average marginal effect: \[ \E[\partial_x \phi(x, A_i, U_{it})|X_{it}=x, \dots], \tag{12.2}\] where \(x\) is some fixed point, and the conditioning set defines some population of interest in terms of \(X_{it}\) and other variables. Such marginal effects are standard objects of interest in models with continuous covariates and outcomes (Hoderlein and Mammen 2007; Hoderlein and White 2012; Chernozhukov et al. 2015).
The parameter (12.2) has a standard causal interpretation. We consider the population of units with \(X_{it}=x\), along with other characteristics captured by the conditioning set. For these units, we exogenously change their \(X_{it}\) infinitesimally. Effect (12.2) reports the average change in outcomes for this fixed population of units.
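In the illustrative random coefficients model above, \(\partial_x \phi(x, A_i, U_{it}) = A_{2i}\), so effect (12.2) reduces to \(\E[A_{2i}|X_{it}=x, \dots]\): the average slope among the units in the conditioning population.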
12.2 Heterogeneity Bias and the Issue with Average Outcomes
12.2.1 Intuitive Approach
One may think that it is easy to identify effect (12.2) even from cross-sectional data. It is tempting to consider the derivative of the conditional expectation of \(Y_{it}\) given \(X_{it}\). However, as we will now see, this approach fails in the presence of non-random treatment assignment, which is typical in observational settings.
Consider the conditional average of \(Y_{it}\) for the units with \(X_{it}=x\): \[ \begin{aligned} \E[Y_{it}|X_{it}=x] & = \E[\phi(X_{it}, V_{it})|X_{it}=x] \\ & = \E[\phi(x, V_{it})|X_{it}=x], \end{aligned} \] where we label \(V_{it}=(A_i, U_{it})\) for brevity. The expectation is with respect to the conditional law of \(V_{it}\) given \(X_{it}=x\).
One might hope that differentiating the conditional expectation with respect to \(x\) recovers the average marginal effect (12.2): \[ \partial_x \E[Y_{it}|X_{it}=x] \overset{??}{=} \E[\partial_x \phi(x, V_{it})|X_{it}=x] \]
12.2.2 Heterogeneity Bias
However, this reasoning fails when the covariates are correlated with the unobserved factors that affect the outcome.
To make this intuition precise, let \(f_{V_{it}|X_{it}}(v|x)\) be the conditional density of \(V_{it}\) given \(X_{it}=x\) with respect to some dominating measure \(\mu\) that does not depend on \(x\) (its existence is an assumption, but not a particularly important one right now). With this notation, the above expectation can be written as \[ \E[Y_{it}|X_{it}=x] = \int \phi(x, v)f_{V_{it}|X_{it}}(v|x)\mu(dv). \] We assume that \(f_{V_{it}|X_{it}}(v|x)\) is differentiable in the conditioning argument \(x\). In other words, there is smooth dependence between \(X_{it}\) and \(V_{it}\) in a distributional sense.
Assuming we can swap the integral and the derivative, we obtain the following expression for the derivative: \[ \begin{aligned} & \partial_x \E[Y_{it}|X_{it}=x] \\ & = \int \partial_x \left[\phi(x, v)f_{V_{it}|X_{it}}(v|x) \right]\mu(dv)\\ & = \int \partial_x \phi(x, v)f_{V_{it}|X_{it}}(v|x)\mu(dv) + \int \phi(x, v)\partial_x f_{V_{it}|X_{it}}(v|x)\mu(dv)\\ & = \E[\partial_x \phi(x, V_{it})|X_{it}=x] + \int \phi(x, v)\partial_x f_{V_{it}|X_{it}}(v|x)\mu(dv). \end{aligned} \tag{12.3}\]
Equation 12.3 shows that, as we change \(x\), the conditional expectation \(\E[Y_{it}|X_{it}=x]\) changes due to two factors:
- The change in \(\phi(x, v)\) as \(x\) varies, holding the unobservables fixed. This term is the average marginal effect, the parameter of interest.
- Changes in the conditional distribution of \((A_i, U_{it})\) given \(X_{it}\) — the source of the second term, typically called the heterogeneity bias (Chamberlain 1982; see also Graham and Powell 2012 for a more explicit representation).
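To see both terms in a concrete case, return to the illustrative random coefficients model \(Y_{it} = A_{1i} + A_{2i}X_{it} + U_{it}\). Since \(\E[Y_{it}|X_{it}=x] = \E[A_{1i}|X_{it}=x] + x\,\E[A_{2i}|X_{it}=x] + \E[U_{it}|X_{it}=x]\), differentiating (assuming the conditional means are differentiable in \(x\)) gives \[ \partial_x \E[Y_{it}|X_{it}=x] = \underbrace{\E[A_{2i}|X_{it}=x]}_{\text{average marginal effect}} + \underbrace{\partial_x\E[A_{1i}+U_{it}|X_{it}=x] + x\,\partial_x\E[A_{2i}|X_{it}=x]}_{\text{heterogeneity bias}}. \] The bias terms are nonzero whenever the conditional means of the unobservables shift with \(x\).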
12.2.3 Source of Bias and Conditions for No Bias
So why does the heterogeneity bias appear? Fundamentally, it arises when the treatment \(X_{it}\) is not assigned exogenously but is instead chosen by economic agents based on information about their potential outcomes \(\phi(x, A_i, U_{it})\). For example, consumers choose products based on knowledge of their persistent preferences \(A_i\) and of the transitory taste shocks and current prices captured by \(U_{it}\). Similarly, firms decide on input levels based on firm-specific productivity and technology components encapsulated in \((A_i, U_{it})\).
As a result, units with different values of \(X_{it}\) may systematically differ in their unobserved characteristics. When we compute the derivative \(\partial_x \E[Y_{it} | X_{it} = x]\), we are not isolating the effect of changing \(x\) for a fixed unit. Instead, we are comparing different populations—those for whom \(X_{it} = x\) versus those for whom \(X_{it} = x + \varepsilon\). Each such population potentially has a different distribution of \((A_i, U_{it})\).
This violates the ceteris paribus logic required for causal interpretation of marginal effects. In particular, the expression \(\partial_x \E[Y_{it} | X_{it} = x]\) combines the genuine marginal effect of \(x\) with a compositional shift due to variation in the distribution of unobservables across \(x\).
The bias term does vanish if the conditional distribution of \((A_i, U_{it})\) given \(X_{it}\) does not vary with \(x\). That is, a sufficient condition for identification of average marginal effects from average outcomes is \[ \partial_x f_{V_{it}|X_{it}}(v|x) = 0 \quad \text{for all } v. \]
When this condition holds for all \(x\) and \(v\), the conditional distribution of \((A_i, U_{it})\) given \(X_{it}\) does not depend on \(X_{it}\), so \(X_{it}\) is independent of \((A_i, U_{it})\): a random assignment condition of the kind guaranteed by a randomized controlled trial. In that case, \(\partial_x \E[Y_{it} | X_{it} = x]\) equals the average partial derivative \(\E[\partial_x \phi(x, A_i, U_{it}) | X_{it} = x]\) and hence recovers the target marginal effect.
In observational data, this assumption typically fails, and the heterogeneity bias must be explicitly addressed.
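The contrast can also be checked numerically. The following is a minimal simulation sketch (the data generating process, constants, and window sizes are arbitrary illustrative choices, not taken from this chapter): it simulates the random coefficients example with \(X_{it}\) either correlated with the slope \(A_{2i}\) or randomly assigned, and compares a finite-difference approximation of \(\partial_x \E[Y_{it}|X_{it}=x]\) with the average marginal effect \(\E[A_{2i}|X_{it}\approx x]\).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000

def simulate(correlated):
    """Draw (X, Y, A2) from Y = A1 + A2 * X + U, a random coefficients model."""
    a1 = rng.normal(size=n)
    a2 = 1.0 + 0.5 * rng.normal(size=n)        # heterogeneous slopes, mean 1
    u = rng.normal(size=n)
    if correlated:
        # Units with larger slopes tend to choose larger X (selection on gains).
        x = 0.8 * a2 + 0.3 * rng.normal(size=n)
    else:
        # Random assignment: X is independent of (A1, A2, U).
        x = 0.8 + 0.3 * rng.normal(size=n)
    y = a1 + a2 * x + u
    return x, y, a2

def naive_derivative(x, y, x0, h=0.1):
    """Finite-difference approximation of d/dx E[Y | X = x] at x0."""
    left = y[np.abs(x - (x0 - h)) < h / 2].mean()
    right = y[np.abs(x - (x0 + h)) < h / 2].mean()
    return (right - left) / (2 * h)

x0 = 0.8
for correlated in (True, False):
    x, y, a2 = simulate(correlated)
    ame = a2[np.abs(x - x0) < 0.05].mean()     # E[A2 | X near x0], the target (12.2)
    print("correlated assignment:" if correlated else "random assignment:   ",
          "naive derivative =", round(naive_derivative(x, y, x0), 2),
          "| average marginal effect =", round(ame, 2))
```

In this particular design, when units with larger slopes choose larger \(X_{it}\), the naive derivative overstates the average marginal effect; under random assignment the two numbers roughly coincide.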
Next Section
In the next section we show how one can actually identify average marginal effects for certain subpopulations by assuming that \(U_{it}\) is stationary.