2 Intro: Linear Models with Heterogeneous Coefficients

Summary and Learning Outcomes

This section introduces linear models with heterogeneous coefficients, associated identification challenges, and the model used in this block of the course.

By the end of this section, you should be able to:

Recognize cases where heterogeneous coefficients arise.
Identify key challenges in estimating heterogeneous coefficients.
Discuss two strategies for identification under coefficient heterogeneity.

2.1 Linearity and Heterogeneity

2.1.1 Models with Homogeneous Slopes

We begin our journey where standard textbooks and first-year foundational courses in econometrics leave off. The “standard” linear models considered in such courses often assume homogeneity in individual responses to covariates (e.g., Hansen (2022)). A common cross-sectional specification is:

\[ y_i = \bbeta'\bx_i + u_{i}, \tag{2.1}\] where \(i=1, \dots, N\) indexes cross-sectional units.

In panel data, models often include unit-specific \((i)\) and time-specific \((t)\) intercepts while maintaining a common slope vector \(\bbeta\):

\[ y_{it} = \alpha_i + \delta_t + \bbeta'\bx_{it} + u_{it}. \tag{2.2}\]

2.1.2 Heterogeneity in Slopes. Examples

However, modern economic theory rarely supports the assumption of homogeneous slopes \(\bbeta\). Theoretical models recognize that observationally identical individuals, firms, and countries can respond differently to the same stimulus. In a linear model, this requires us to consider more flexible models with heterogeneous coefficients:

Cross-sectional model (2.1) generalizes to

\[ y_i = \bbeta_{i}'\bx_i + u_i. \tag{2.3}\]
Panel data model (2.2) generalizes to

\[ y_{it} = \bbeta_{it}'\bx_{it} + u_{it}. \tag{2.4}\]

Such models are worth studying, as they naturally arise in a variety of contexts:

Structural models with parametric restrictions: Certain parametric restrictions yield relationships that are linear in coefficients. An example is given by firm-level Cobb-Douglas production functions where firm-specific productivity differences induce heterogeneous coefficients (Combes et al. (2012); Suri (2011)).
Binary covariates and interaction terms: if all covariates are binary and all interactions are included, a linear model encodes all treatment effects without loss of generality (see, e.g., Wooldridge (2005)).
Log-linearized models: Nonlinear models may be approximated by linear models around a steady-state. For example, Heckman and Vytlacil (1998) demonstrate how the nonlinear Card (2001) education model simplifies to a heterogeneous linear specification after linearization.
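
As a stylized illustration of the first item, consider a Cobb-Douglas technology in which both productivity and the input elasticities are firm-specific. The symbols \(Y_i\), \(K_i\), \(L_i\), \(A_i\), \(a_i\), \(b_i\) are introduced here for exposition only, and the specification is an assumption rather than the exact model of the cited papers. Suppose firm \(i\) produces output \(Y_i\) from capital \(K_i\) and labor \(L_i\) according to \(Y_i = A_i K_i^{a_i} L_i^{b_i} e^{u_i}\). Taking logs gives

\[ \log Y_i = \log A_i + a_i \log K_i + b_i \log L_i + u_i, \]

which has the form of the heterogeneous cross-sectional model (2.3) with \(y_i = \log Y_i\), \(\bx_i = (1, \log K_i, \log L_i)'\), and \(\bbeta_i = (\log A_i, a_i, b_i)'\).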

2.2 What Do We Care About? Identification

2.2.1 Parameters of Interest

The parameters of interest in models (2.1) and (2.2) are straightforward. The common slope \(\bbeta\) simultaneously plays the role of both the average treatment effect and all the individual treatment effects. Estimating \(\bbeta\) is enough for policy analysis.

The situation is more complicated for the more general models (2.3) and (2.4). Consider model (2.3). Parameters of interest now include:

Individual effects: the coefficient vector \(\bbeta_i\) for specific units.
Moments of the distribution: the average coefficient vector (\(\E[\bbeta_i]\)), variance \(\var(\bbeta_i)\), and higher-order moments.
Distributional properties: The full distribution of \(\bbeta_i\) or its quantiles, or just the tail behavior of the distribution.

Similar objects are relevant for the panel model in Equation 2.4.

2.2.2 Regarding Identification

Unfortunately, greater flexibility in terms of parameters also leads to greater challenges in terms of identification. Models (2.3) and (2.4) are too general to permit identification of any of the above parameters without further assumptions. This failure of identification is driven by the combination of the following two issues:

Limited observations per coefficient vector. Since each unit \(i\) (or pair \((i,t)\)) provides only indirect information through \(\bbeta_i'\bx_i\) (or \(\bbeta_{it}'\bx_{it}\)), there is effectively less than one observation per \(\bbeta_i\).
Unrestricted dependence between coefficients and covariates. Without assumptions on the relationship between \(\bbeta_i\) and \(\bx_i\), identification is difficult.
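
The second issue can be illustrated with a small simulation. The sketch below is a hypothetical design, not part of the course material: it uses a scalar covariate \(x_i \sim N(1,1)\) and a coefficient \(\beta_i = 1 + c\,(x_i - 1) + \text{noise}\), where the function names and the parameter `dependence` (the constant \(c\)) are assumptions made for illustration. In this design \(\E[\beta_i] = 1\) for every \(c\), while the no-intercept OLS slope converges to \(\E[\beta_i x_i^2]/\E[x_i^2] = 1 + c\).

```python
import numpy as np


def simulate_cross_section(n, dependence, seed=0):
    """Draw (x_i, y_i, beta_i) from y_i = beta_i * x_i + u_i with scalar x_i.

    `dependence` controls how strongly beta_i moves with x_i.
    By construction E[beta_i] = 1 for every value of `dependence`.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=1.0, scale=1.0, size=n)
    # Illustrative assumption: beta_i is linear in x_i plus independent noise.
    beta = 1.0 + dependence * (x - 1.0) + rng.normal(scale=0.5, size=n)
    u = rng.normal(size=n)
    y = beta * x + u
    return x, y, beta


def ols_slope(x, y):
    """OLS slope from regressing y on x without an intercept."""
    return float(x @ y / (x @ x))


if __name__ == "__main__":
    for dep in (0.0, 0.5):
        x, y, beta = simulate_cross_section(200_000, dep)
        print(f"dependence={dep}: mean(beta_i)={beta.mean():.3f}, "
              f"OLS slope={ols_slope(x, y):.3f}")
```

With `dependence=0.0` the OLS slope should be close to the average coefficient; with `dependence=0.5` it should drift toward 1.5 even though the average coefficient is still 1. The data alone cannot tell these two situations apart, which is why some restriction is needed.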

Identification is typically achieved by mitigating one of these two issues. Common strategies include:

Increasing the effective number of observations per coefficient vector by restricting coefficient variation.
- In panel settings, assuming time-invariant coefficients simplifies Equation 2.4 to:
  
  \[ y_{it} = \bbeta_i'\bx_{it} + u_{it}. \tag{2.5}\]
- Alternative approaches assume a finite number of latent groups, each with its own coefficient vector, yielding the grouped structure:
  
  \[ y_{it} = \bbeta_{g_i, t}'\bx_{it} + u_{it}. \tag{2.6}\]
  
  This model is discussed in Bonhomme and Manresa (2015) and Bester and Hansen (2016) (see also Bonhomme, Lamadon, and Manresa (2022)).
Restricting dependence between \(\bbeta_i\) and \(\bx_i\). For example, there is a strand of literature that assumes that \(\bbeta_i\) and \(\bx_i\) are independent (Beran, Feuerverger, and Hall 1996; Hoderlein, Klemelä, and Mammen 2010).

2.3 Model of This Block

This block primarily focuses on the first strategy. Specifically, we will consider a version of model (2.4) with time-invariant heterogeneous coefficients:

\[ y_{it} = \bbeta_i'\bx_{it} + u_{it}. \tag{2.7}\]

We do not impose restrictions on the dependence between \(\bbeta_i\) and \(\bx_{it}\). In general, it is important to allow for such dependence outside of experimental data — economic agents can select their covariates \(\bx_{it}\) based on knowledge of their own \(\bbeta_i\). Since parametrizing this dependence is non-trivial, we impose no assumptions on it.

We will also generally focus on the case where the number \(N\) of units is large, while the number \(T\) of observations per unit is fixed and not necessarily large.

In the panel data literature, approaches that do not restrict the dependence between the unobserved and the observed components are called “fixed effects”.

Note that model (2.7) includes a particular special case — the random intercept model (confusingly also called the “fixed effects model”). The random intercept model imposes homogeneity on all parameters except the intercept term. In the one-way case, the model takes the form:

\[ y_{it} = \alpha_i + \bbeta'\bx_{it} + u_{it}. \tag{2.8}\] Model (2.8) is one of the oldest ways of including unobserved heterogeneity in linear models and goes back at least to Mundlak (1961).

2.4 Plan for This Block

In this block, we will focus on model (2.7) and consider identification of the above parameters of interest. Specifically,

Average coefficient vector \(\E[\bbeta_i]\):
- First, we demonstrate that standard estimators for the random intercept model (2.8) are generally inconsistent for \(\E[\bbeta_i]\) in the more general model (2.7).
- Next, we introduce a mean group estimator robust to heterogeneity and dynamics.
Variance \(\var(\bbeta_i)\): we show how one can identify and estimate \(\var(\bbeta_i)\) by imposing structure on the temporal dependence in the residuals \(u_{it}\).
Full distribution of \(\bbeta_i\): we show how one can obtain the distribution of \(\bbeta_i\) with a deconvolution argument.
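
The contrast in the first item can be previewed numerically. The sketch below is a minimal illustration under stated assumptions, not the estimator as developed later in the block: it takes the mean group estimator to be the simple average of unit-by-unit OLS slopes, uses a scalar covariate with a unit-specific intercept, and imposes a hypothetical selection rule in which units with larger \(\beta_i\) choose more variable covariates. All function names are made up for this example. In this design \(\E[\beta_i] = 1\), while the within estimator weights units by their within-unit covariate variation and converges to roughly 2.

```python
import numpy as np


def simulate_panel(n_units, n_periods, seed=0):
    """Draw a panel from y_it = alpha_i + beta_i * x_it + u_it."""
    rng = np.random.default_rng(seed)
    beta = rng.normal(loc=1.0, scale=1.0, size=n_units)  # E[beta_i] = 1
    alpha = rng.normal(size=n_units)
    # Assumed selection rule: the spread of x_it increases with beta_i.
    scale = np.exp(0.5 * (beta - 1.0))
    x = scale[:, None] * rng.normal(size=(n_units, n_periods))
    u = rng.normal(size=(n_units, n_periods))
    y = alpha[:, None] + beta[:, None] * x + u
    return x, y, beta


def demean(a):
    """Subtract unit-specific time averages (rows are units)."""
    return a - a.mean(axis=1, keepdims=True)


def within_slope(x, y):
    """Pooled OLS on demeaned data: the within (fixed effects) estimator."""
    xd, yd = demean(x), demean(y)
    return float((xd * yd).sum() / (xd ** 2).sum())


def mean_group_slope(x, y):
    """Average of unit-by-unit OLS slopes (each regression has an intercept)."""
    xd, yd = demean(x), demean(y)
    unit_slopes = (xd * yd).sum(axis=1) / (xd ** 2).sum(axis=1)
    return float(unit_slopes.mean())


if __name__ == "__main__":
    x, y, beta = simulate_panel(n_units=20_000, n_periods=6)
    print(f"mean(beta_i)      = {beta.mean():.3f}")
    print(f"within estimator  = {within_slope(x, y):.3f}")
    print(f"mean group        = {mean_group_slope(x, y):.3f}")
```

Both estimators use exactly the same demeaned data; they differ only in whether the averaging happens before or after dividing by the within-unit variation of the covariate. Note that the number of periods is small and fixed here, matching the large-\(N\), fixed-\(T\) setting of this block.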

Knowing these features of the distribution of \(\bbeta_i\) allows one to compute the corresponding features of the treatment effects of changing from some treatment value \(\bx_1\) to \(\bx_2\): for unit \(i\), this treatment effect is \(\bbeta_i'(\bx_2-\bx_1)\).
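
For instance, treating \(\bx_1\) and \(\bx_2\) as fixed (non-random) treatment values, the first two moments of the treatment effect follow directly from the corresponding moments of \(\bbeta_i\):

\[ \E[\bbeta_i'(\bx_2-\bx_1)] = \E[\bbeta_i]'(\bx_2-\bx_1), \qquad \var(\bbeta_i'(\bx_2-\bx_1)) = (\bx_2-\bx_1)'\var(\bbeta_i)(\bx_2-\bx_1). \]

The distribution of the treatment effect is similarly the distribution of the linear combination \(\bbeta_i'(\bx_2-\bx_1)\) of the components of \(\bbeta_i\).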

Next Section

Next, we show that the within (fixed effects) estimator recovers \(\E[\bbeta_i]\) only under restrictive assumptions.

Beran, Rudolf, Andrey Feuerverger, and Peter Hall. 1996. “On Nonparametric Estimation of Intercept and Slope Distributions in Random Coefficient Regression.” The Annals of Statistics 24 (6): 2569–92. https://doi.org/10.1214/aos/1032181170.

Bester, C Alan, and Christian B Hansen. 2016. “Grouped Effects Estimators in Fixed Effects Models.” Journal of Econometrics 190 (1): 197–208. https://doi.org/10.1016/j.jeconom.2012.08.022.

Bonhomme, Stéphane, Thibaut Lamadon, and Elena Manresa. 2022. “Discretizing Unobserved Heterogeneity.” Econometrica 90 (2): 625–43. https://doi.org/10.3982/ECTA15238.

Bonhomme, Stéphane, and Elena Manresa. 2015. “Grouped Patterns of Heterogeneity in Panel Data.” Econometrica 83 (3): 1147–84. https://doi.org/10.3982/ecta11319.

Card, David. 2001. “Estimating the Return to Schooling: Progress on Some Persistent Econometric Problems.” Econometrica 69 (5): 1127–60. https://doi.org/10.1111/1468-0262.00237.

Combes, Pierre-Philippe, Gilles Duranton, Laurent Gobillon, Diego Puga, and Sébastien Roux. 2012. “The Productivity Advantages of Large Cities: Distinguishing Agglomeration From Firm Selection.” Econometrica 80 (6): 2543–94. https://doi.org/10.3982/ecta8442.

Hansen, Bruce. 2022. Econometrics. Princeton University Press.

Heckman, James, and Edward Vytlacil. 1998. “Instrumental Variables Methods for the Correlated Random Coefficient Model.” Journal of Human Resources 33 (4): 974–87.

Hoderlein, Stefan, Jussi Klemelä, and Enno Mammen. 2010. “Analyzing the Random Coefficient Model Nonparametrically.” Econometric Theory 26 (03): 804–37. https://doi.org/10.1017/S0266466609990119.

Mundlak, Yair. 1961. “Empirical Production Function Free of Management Bias.” Journal of Farm Economics 43 (1): 44. https://doi.org/10.2307/1235460.

Suri, Tavneet. 2011. “Selection and Comparative Advantage in Technology Adoption.” Econometrica 79 (1): 159–209. https://doi.org/10.3982/ecta7749.

Wooldridge, Jeffrey M. 2005. “Fixed-Effects and Related Estimators for Correlated Random-Coefficient and Treatment-Effect Panel Data Models.” The Review of Economics and Statistics 87 (2): 385–90.