6 Mean Group: Robust Estimator for Average Coefficients

Summary and Learning Outcomes

This section introduces a heterogeneity- and dynamics-robust estimator for average coefficients — the mean group estimator.

By the end of this section, you should be able to:

Define the mean group estimator.
Identify the assumptions and data requirements necessary for the MG estimator to be consistent.
Compare the MG estimator with alternative estimation methods (e.g., within estimators and IV techniques) in both static and dynamic settings.

6.1 Mean Group Estimation

The results of the previous section and section 3 leave us with key two questions:

In static models, can we estimate the average effect without imposing restrictions on the relationship between \(\bX_{it}\) and \(\bbeta_i\)?
In dynamic models, can we estimate \(\E[\lambda_i]\) at all?

As it turns out, there is indeed an estimator that is robust both to heterogeneity and dynamics. The mean group (MG) estimator, due to M. H. Pesaran and Smith (1995), permits estimating average coefficients in heterogeneous panels, regardless of the relationship between the coefficients and the covariates. It also does not require strict exogeneity and is compatible with dynamic models. However, this generality comes at a cost: the mean group estimator typically has stricter data requirements compared to non-robust alternatives.

6.1.1 Model

To formally define the mean group estimator, we return to the general model (2.7): \[ \bY_{it}^{\bx} = \bbeta_i'\bx + U_{it}, \] where we label the dimension of the vectors \(\bx\) and \(\bbeta_i\) as \(p\).

For convenience, express the corresponding realized outcome equations in unit-level matrix form by stacking observations across \(t\) for each unit \(i\): \[ \bY_i = \bX_i\bbeta_i + \bU_i, \] where \(\bX_i\) is now a \(T\times p\) matrix of covariates, with \(T\) representing the number of observations for each unit.

In what follows, we assume that the regressors \(\bX_{it}\) are orthogonal to the unobserved component \(U_{it}\): \[ \E[\bX_{it}U_{it}] =0. \tag{6.1}\] This assumption is compatible with both static and dynamic models and is implied by sequential exogeneity.

6.1.2 Definition

For the mean group estimator to be well-defined, we make two further assumptions:

Sufficient unit-level data: each unit must have at least as many observations as regressors, i.e., \(T\geq p\).
Non-singularity: \(\bX_i'\bX_i\) has rank maximal rank (\(p\)) for all units \(i\).

Under these assumptions, we can compute the OLS estimator of \(\bbeta_i\) for each unit \(i\) \[ \hat{\bbeta}_i = \left(\bX_i'\bX_i \right)^{-1}\bX_i'\bY_i. \tag{6.2}\]

The mean group estimator (M. H. Pesaran and Smith 1995) is defined as the average of unit-level estimators (6.2): \[ \hat{\bbeta}_{MG} = \dfrac{1}{N} \sum_{i=1}^N \hat{\bbeta}_i = \dfrac{1}{N} \sum_{i=1}^N \left(\bX_i'\bX_i \right)^{-1}\bX_i'\bY_i. \] In essence, the mean group estimator involves running separate regressions for each unit and averaging the resulting coefficients. The first step requires having enough observations per unit — at least one observation is available for each heterogeneous coefficient.

We introduce the MG estimator under the orthogonality condition (6.1). However, it is also possible to introduce an analogous estimator if \(\E[\bX_{it}U_{it}]\neq 0\), but instruments for \(\bX_{it}\) are available. In this case, suitable unit-level IV/2SLS estimators replace \(\hat{\bbeta}_i\) (e.g. Bai, Marcellino, and Kapetanios 2023).

6.1.3 Properties

To analyze the properties of the \(\hat{\bbeta}_{MG}\), we can substitute the model for \(\bY_i\) into each individual estimator. This substitution yields the sampling error form of the estimator: \[ \hat{\bbeta}_{MG} = \dfrac{1}{N} \sum_{i=1}^N \bbeta_i + \dfrac{1}{N} \sum_{i=1}^N \left(\bX_i'\bX_i \right)^{-1}\bX_i'\bU_i. \]

As \(N\) grows, the first term converges to the average effect: \[ \dfrac{1}{N} \sum_{i=1}^N \bbeta_i \xrightarrow{p} \E[\bbeta_i]. \] This convergence holds regardless of assumptions on the dependence between \(\bbeta_i\) and \(\bX_i\).

The second term’s behavior determines the estimator’s overall properties, depending on whether strict exogeneity holds:

If strict exogeneity holds, the mean group estimator is unbiased and consistent for \(\E[\bbeta_i]\) even for \(T\) fixed. In this case it directly holds that \[ \E\left[ \left(\bX_i'\bX_i \right)^{-1}\bX_i'\bU_i\right] =0 \] for any fixed value of \(T\).
If the model is dynamic or strict exogeneity fails for any other reason, the mean group estimator is typically only be consistent as both \(N, T\to\infty\). In this case it may be that \[ \E\left[ \left(\bX_i'\bX_i \right)^{-1}\bX_i'\bU_i\right] \neq 0 ,\] and \(\hat{\bbeta}_{MG}\) is biased for any given finite \(T\). However, under standard assumptions that limit dependence across \(t\), this bias is typically be of the order \(O(T^{-1})\) and vanishes as \(T\to\infty\).

We implicitly assume the existence of any required moment involving \(\left(\bX_i'\bX_i\right)^{-1}\). This assumption is not always innocent and imposes certain constraints on the distribution of covariates — requirements of sufficient variation. We do not discuss this point further, but see Graham and Powell (2012).

6.2 Comparing Estimators

6.2.1 Conditions for Consistency

We now have two estimation strategies per case. For the strictly exogenous case, we have within and mean group estimation. In the dynamic case, we can use dynamic panel IV estimators or again use the mean group estimators.

The following tables summarize and contrast the requirements for each estimation strategy to consistently estimate \(\E[\bbeta_i]\):

Strict exogeneity
Dynamic model with \(k\) lags

	Within	MG
Data requirements	\(T \geq 2\)	\(T \geq p\)
Assumptions on \((\beta_i, X_i)\)	\(\mathbb{E}[\tilde{X}_i'\tilde{X}_i\eta_i]=0\)	None
Asymptotic framework	Fixed-\(T\)	Fixed-\(T\)

	IV	MG
Data requirements	\(T \geq k+2\)	\(T \geq p+k\)
Assumptions on \((\beta_i, X_i)\)	No consistency	None
Asymptotic framework	No consistency	Large-\((N, T)\)

6.2.2 Efficiency

While the mean group estimator provides robustness to heterogeneity and dynamics, it is generally less efficient than alternative estimators when those alternatives are consistent. This loss of efficiency directly follows from the AM-HM inequality and is well-documented (H. Pesaran, Smith, and Im 1996; M. H. Pesaran, Shin, and Smith 1999; Hsiao, Pesaran, and Tahmiscioglu 1999).

However, one might argue that this loss of efficiency is irrelevant, as the conditions for the consistency of alternatives require unrealistic restrictions on the coefficients. Still, there are some ways of obtaining a more efficient robust estimator, such as applying a Mundlak device, see Breitung and Salish (2021).

6.3 Addressing Rank Deficiency

Before moving beyond \(\E[\bbeta_i]\), we return to the assumption that \(\bX_i'\bX_i\) is non-singular for all units \(i\). This requirement might not hold, in particular if \(T\) is only slightly larger than \(p\) and the model includes discrete regressors. In these cases, it is more likely that there exist units with insufficient individual variation.

If such a situation occurs in practice, a common solution is to drop the units with non-invertible \(\bX_i'\bX_i\). The mean group estimator is computed as an average over the remaining units. Formally, the corresponding mean group estimator is defined as \[ \hat{\bbeta}_{MG}^+ = \dfrac{1}{\sum_{i=1}^N \I\curl{\det(\bX_i'\bX_i)>0} } \sum_{i=1}^N \I\curl{\det(\bX_i'\bX_i)>0} \hat{\bbeta}_i. \] Every unit is checked to see if \(\det(\bX_i'\bX_i)>0\). If it is, their \(\hat{\bbeta}_i\) is computed and added to the average.

What does this modified estimator estimate? For simplicity, suppose that \(\E[\bU_i|\bX_i]=0\). Then, as \(N\to\infty\), \(\hat{\bbeta}_{MG}^+\) converges to the average for the units that have enough variation in their data (subject to the above warning regarding existence of moments): \[ \hat{\bbeta}_{MG}^+ \to \E\left[\bbeta_i| \I\curl{\det(\bX_i'\bX_i)>0} \right]. \] This limit may be viewed as an average treatment effect of the treated.

Next Section

In the next section, we move beyond \(\E[\bbeta_i]\) and discuss identification of the variance of \(\bbeta_i\).