11  Introduction to Nonparametric Models with Unobserved Heterogeneity

Summary and Learning Outcomes

This section introduces nonparametric models with unobserved heterogeneity and lays out the structure and goals for this block.

By the end of this section, you should be able to:

  • Explain limitations of linear models.
  • Write down generic nonparametric models with unobserved heterogeneity.
  • Understand identification challenges and the assumptions commonly used to address them.

11.1 Towards Nonparametric Models

11.1.1 Motivation

We began these notes by studying linear models with heterogeneous coefficients (2.7), a familiar and flexible starting point. As discussed in section 2, such models are widely used and arise naturally in empirical work.

But linearity is not innocent. Unless all covariates are binary, the assumption often lacks theoretical support and may be at odds with the data. Many economic settings suggest richer structures: preferences with satiation, production with non-constant returns to scale, or outcomes bounded by construction. Furthermore, differences between individuals may not be compressible into a finite-dimensional vector of heterogeneous coefficients. In such cases, linear models may be severely misspecified and lead to incorrect conclusions.

This motivates a shift. In this block, we move beyond linearity and consider nonparametric models with unobserved heterogeneity — a class that allows for far greater flexibility in how outcomes respond to both observed and unobserved variation. These models present new challenges but also offer a more powerful framework for accounting for unobserved differences.

11.1.2 Nonparametric Models

Nonparametric models address functional form concerns directly. Rather than imposing a specific shape on the relationship between \(y\) and the covariates \(\bx\), we assume only that this relationship is governed by an unknown function \(\phi\) of observed and unobserved variables. This leads to the following general setup:

  • In cross-sectional settings: \[ Y_i = \phi(\bX_i, A_i), \quad {}_{i=1,\dots, N}, \tag{11.1}\] where \(\bX_{i}\) includes the observed variables and \(A_i\) includes the unobserved components.

  • In panel data settings: \[ Y_{it} = \phi(\bX_{it}, A_i, U_{it}), \quad {}_{i=1,\dots, N}^{t= 1, \dots, T}, \tag{11.2}\] where both \(A_i\) and \(U_{it}\) are not observed.

Models (11.1) and (11.2) parallel and generalize models (2.3) and (2.4).

In both cases the nature of \((A_i, U_{it})\) is not restricted a priori. These unobserved components may include both finite-dimensional vectors (such as unobserved variables or coefficients) and infinite-dimensional objects (such as utility functions). In such a fully unrestricted setting, we can equivalently represent (11.1) and (11.2) as \[ Y_i = \phi_i(X_i), \quad \quad Y_{it} = \phi_{it}(X_{it}), \] respectively.
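To fix ideas, the following minimal simulation sketches model (11.1) for a hypothetical structural function \(\phi\) with two-dimensional unobserved heterogeneity; the particular \(\phi\) and the distributions below are illustrative assumptions, not part of the model.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000

# Hypothetical structural function: nonlinear in x, with
# two-dimensional unobserved heterogeneity A_i = (a1_i, a2_i).
def phi(x, a1, a2):
    return a1 * np.log1p(x) + a2

x = rng.exponential(size=N)          # observed covariate X_i
a1 = rng.normal(1.0, 0.5, size=N)    # unobserved slope-type component
a2 = rng.normal(0.0, 1.0, size=N)    # unobserved level-type component
y = phi(x, a1, a2)                   # Y_i = phi(X_i, A_i), model (11.1)

# The equivalent "individual function" representation Y_i = phi_i(X_i):
phi_i = lambda i, x_val: phi(x_val, a1[i], a2[i])
assert np.isclose(y[0], phi_i(0, x[0]))
```

Note that the same draw of \(x\) maps to different outcomes across individuals, which is exactly what the unrestricted representation \(Y_i = \phi_i(X_i)\) expresses.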

11.1.3 Object of Interest

As in the linear case, possible objects of interest include:

  • The full structural function \(\phi(\cdot, \cdot)\) or \(\phi(\cdot, \cdot, \cdot)\). This function fully describes the relationship between \(Y\) and \(\bX\) for all individuals. This corresponds to the problem of identifying individual treatment effects.
  • Some distributional features of “treatment effects” — changes in outcomes due to variation in \(\bX_{it}\), conditional on unobserved heterogeneity. In the context of model (11.2), these effects are given by \[ \begin{aligned} & \phi(\bx_2, A_i, U_{it}) - \phi(\bx_1, A_i, U_{it}),\\ & \partial_{\bx} \phi(\bx_0, A_i, U_{it}), \end{aligned} \tag{11.3}\] where \(\bx_0, \bx_1, \bx_2\) are some possible values for \(\bX_{it}\), and the marginal effect is considered if \(\phi\) is suitably differentiable in \(\bx\). The distributional features of interest may include average effects, variances, higher-order moments, or the full distribution.
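The two objects in (11.3) can be computed explicitly once \(\phi\) is known. The sketch below does so for a hypothetical \(\phi(x, a, u) = \tanh(ax) + u\) with scalar heterogeneity; since \(U_{it}\) enters additively in this particular example, it cancels from both effects, and the marginal effect has the closed form \(a\,(1 - \tanh(ax_0)^2)\).

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5000

# Hypothetical smooth structural function with scalar A_i and U_it.
def phi(x, a, u):
    return np.tanh(a * x) + u

a = rng.normal(1.0, 0.3, size=N)
u = rng.normal(0.0, 1.0, size=N)

x1, x2, x0 = 0.5, 1.5, 1.0

# Individual treatment effects of moving X from x1 to x2 (first line of 11.3);
# U cancels here only because this phi happens to be additive in u.
te = phi(x2, a, u) - phi(x1, a, u)

# Individual marginal effects at x0 (second line of 11.3), in closed form:
me = a * (1 - np.tanh(a * x0) ** 2)   # d/dx tanh(a x) = a (1 - tanh(a x)^2)

# Distributional features of interest: averages, variances, quantiles, ...
print(te.mean(), te.var(), np.quantile(me, [0.1, 0.5, 0.9]))
```

The identification problem of this block is precisely that \(a\) and \(u\) are not observed, so these individual-level computations are infeasible with data alone.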

11.1.4 Common Issue

Unfortunately, models (11.1) and (11.2) require further assumptions to be useful, as discussed in section 1. Without assumptions, we cannot hope to identify counterfactual objects of interest, not even average effects.

In a strict sense, one may point out that (11.1) and (11.2) are not even models in the sense of offering testable predictions or supporting counterfactual analysis. (11.1) and (11.2) are just statements that \(\bX\) and \(Y\) are related through some function that may differ with \(i\) and \(t\). Such a statement is vacuously true. For example, without further assumptions one may take \(\phi(x, a) = a\) and \(A_i = Y_i\) in (11.1).
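The degenerate construction above can be made concrete: taking \(\phi(x, a) = a\) and \(A_i = Y_i\) "fits" any data set perfectly, illustrating that (11.1) without further assumptions has no empirical content.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=10)
y = rng.normal(size=10)          # arbitrary outcome data

# The vacuous "model": phi(x, a) = a with A_i = Y_i.
phi = lambda x, a: a
a = y.copy()

assert np.array_equal(phi(x, a), y)   # (11.1) holds for any (x, y) whatsoever
```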

Typically, such assumptions fall into two categories:

  1. Assumptions on the joint distribution of the observed and the unobserved components, including on the nature of \((A_i, U_{it})\).
  2. Assumptions on how unobserved components enter the equation.

11.2 Models of This Block

To make progress, we focus in these notes on two flexible but tractable special cases of the general nonparametric panel model (11.2). In both cases, we will assume that the outcome \(Y_{it}\) is continuous and that \(T=2\).

First Model

We will begin with the following model: \[ Y_{it} = \phi(X_{it}, A_i, U_{it}), \quad {}_{i=1, \dots, N}^{t=1, 2}, \tag{11.4}\] where for simplicity we assume that \(X_{it}\) is scalar. The variable \(X_{it}\) is assumed to be continuously distributed, and \(\phi\) is continuous in \(X_{it}\) for all values of \((A_i, U_{it})\).

We consider a version of (11.4) that is very general in terms of unobserved variables \((A_{i}, U_{it})\). In particular, we do not restrict

  • The dimension and the form of \(A_i\) and \(U_{it}\);
  • How the outcome depends on \((A_{i}, U_{it})\);
  • The dependence structure between \((A_i, U_{it})\) and \(X_{it}\).

At the same time, we impose a stationarity assumption on \(U_{it}\): its distribution is stable over time conditional on observed and unobserved covariates. This stability allows us to isolate changes in \(Y_{it}\) attributable to variation in \(X_{it}\).
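One immediate implication of stationarity can be checked by simulation: for a unit whose covariate does not change between periods, \(Y_{i1}\) and \(Y_{i2}\) have the same conditional distribution, so the average outcome change is zero. The sketch below verifies this for a hypothetical \(\phi\); the functional form and distributions are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 100_000

def phi(x, a, u):
    # hypothetical structural function, nonlinear in all arguments
    return np.exp(a) * x + np.sin(u) * (1 + x ** 2)

a = rng.normal(size=N)
x = rng.uniform(0, 1, size=N)    # stayers: X_i1 = X_i2 = x

# Stationarity: U_i1 and U_i2 may depend on A_i (and X), but are drawn
# from the same conditional distribution in both periods.
u1 = rng.normal(a, 1.0)
u2 = rng.normal(a, 1.0)

y1 = phi(x, a, u1)
y2 = phi(x, a, u2)

# With X fixed and U stationary, the outcome change is mean zero:
print(abs((y2 - y1).mean()))     # close to 0
```

Conversely, a systematic nonzero mean of \(Y_{i2} - Y_{i1}\) among units with changing \(X_{it}\) can then be attributed to the variation in \(X_{it}\), which is the lever behind the stayer-based identification results discussed below.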

Second Model

After reaching the (probable) limits of identification with (11.4), we will consider a different flavor of model (11.2), where the time-varying unobserved component \(U_{it}\) is scalar and affects the outcome \(Y_{it}\) additively: \[ Y_{it} = \phi(X_{it}, A_i) + U_{it}, \quad {}_{i=1, \dots, N}^{t=1, 2} \tag{11.5}\] In contrast to model (11.4), we do not assume that \(U_{it}\) is stationary. Models (11.4) and (11.5) are hence non-nested. We continue to allow \(A_i\) to have unrestricted dimensionality and structure. It may also have a complex dependence structure with \(X_{it}\).
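The additive structure of (11.5) has a simple algebraic consequence worth recording: first differences equal the individual treatment effect plus a \(U\) difference, while the \(A_i\)-specific component \(\phi(\cdot, A_i)\) does not drop out unless \(\phi\) is linear. The sketch below checks this identity for a hypothetical \(\phi\) with vector-valued \(A_i\) and deliberately non-stationary \(U_{it}\).

```python
import numpy as np

rng = np.random.default_rng(4)
N = 1000

def phi(x, a):
    # hypothetical structural function, nonlinear in x, vector-valued A_i
    return a[:, 0] * np.sqrt(np.abs(x)) + a[:, 1]

a = rng.normal(size=(N, 2))
x1 = rng.normal(size=N)
x2 = x1 + rng.normal(size=N)       # X_i2 may depend on the past
u1 = rng.normal(size=N)
u2 = rng.normal(size=N) * 2.0      # U_it need not be stationary in (11.5)

y1 = phi(x1, a) + u1               # model (11.5), t = 1
y2 = phi(x2, a) + u2               # model (11.5), t = 2

# First difference = individual treatment effect + U difference:
te = phi(x2, a) - phi(x1, a)
assert np.allclose(y2 - y1, te + (u2 - u1))
```

This decomposition is what makes second moments of the effects (11.3) tractable in model (11.5) once restrictions on the dependence between \(U_{it}\) and future \(X_{it}\) are imposed.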

11.3 Plan for This Block

In this block, we will focus on models (11.4)-(11.5) and consider identification of some distributional features of treatment effects (11.3). Specifically,

  1. Average treatment and marginal effects for model (11.4):
    • Show that identifying average effects is more complicated than considering averages of the outcome directly.
    • Discuss heterogeneity bias, another form of confounding.
    • Show how stationarity assumptions on \(U_{it}\) allow us to identify the average effects for a population of stayers — units with \(X_{i1}=X_{i2}\) — without any further assumptions.
    • Consider two generalizations of the identification result: beyond the population of stayers and allowing some non-stationarity in the structural function.
  2. Variance of treatment and marginal effects in model (11.5): identify the variance of the effects (11.3) by requiring that \(U_{it}\) cannot depend on future values of \(\bX_{it}\).

In the next block, we will also revisit model (11.4) through the lens of quantile regression.

11.4 A Brief Classification of Nonparametric Models

In these notes we primarily focus on the general and powerful models (11.4)-(11.5). However, much work has gone into analyzing other special instances of (11.1) and (11.2). Before moving on to identification of average effects, we offer a brief taxonomy of nonparametric models with unobserved heterogeneity with some essential references. We organize the literature by the types of assumptions made:


Next Section

In the next section, we begin our analysis of average effects in model (11.4) and discuss why identification is more complex than analyzing average outcomes.