Mean Group Estimation

Learning Average Effects in Linear Panel Data Models

Vladislav Morozov

Introduction

Lecture Info

Learning Outcomes

This lecture is about estimating the average of heterogeneous coefficients


By the end, you should be able to

  • Discuss when heterogeneous coefficients may arise
  • State the definition of the mean group estimator
  • Prove its theoretical properties
Imports
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import statsmodels.api as sm
import plotly.express as px 

References

Mean group estimation rarely discussed in textbooks


Just two references:

  • Pesaran (2015)
  • Morozov (2025)

Motivation

Theoretical Motivation: Model and Object of Interest

Consider potential outcome model: \[ Y_{it}^{\bx} = \bx'\bbeta_{\textcolor{teal}{i}} + U_{it} \tag{1}\]

Heterogeneous coefficients realistic:

  • Different people react differently to same change in \(\bx\)
  • Average effect of going from \(\bx_1\) to \(\bx_2\) is \((\bx_2-\bx_1)'\E[\bbeta_i]\)

Want to learn \(\E[\bbeta_i]\)

Issue with FE Estimators and This Lecture

Last time proved that under model (1) FE estimators estimate a weighted average \[ \hat{\bbeta}^{FE} \xrightarrow{p} \E[\bW(\tilde{\bX}_i)\bbeta_i] \] Outside of RCTs usually \(\E[\bW(\tilde{\bX}_i)\bbeta_i]\neq \E[\bbeta_i]\)

  • Weights \(\bW(\cdot)\) “nice”: positive definite and integrate to 1
  • But \(\bW(\cdot)\) — difficult to interpret

This lecture — way to estimate \(\E[\bbeta_i]\) with panel data

Empirical Motivation

Last time discussed relationship between pollution and labor market outcomes using US county-level data

What if pollution effect differs between counties?

  • E.g. due to different population characteristics (age, health, …)
  • Would be an example of model (1). FE estimate from last time would be \(\E[\bW(\tilde{\bX}_i)\bbeta_i]\)
  • What is \(\E[\bbeta_i]\)?

Theory

Construction

Model and Data

Vector form of realized outcomes in model (1): \[ \bY_i = \bX_i\bbeta_i + \bU_i \] \(\bbeta_i\) is a \(p\)-vector. Interested in \(\E[\bbeta_i]\)


Assumption: \(T\geq p\)

At least as many observations as there are coefficients

Individual Estimator

Key component — individual estimators \[ \hat{\bbeta}_i = (\bX_i'\bX_i)^{-1}\bX_i'\bY_i \]

  • \(\hat{\bbeta}_i\) — simply running OLS using the data only on unit \(i\)
  • We assume that \((\bX_i'\bX_i)^{-1}\) exists
    • Assumption of variation in \(\bX_{it}\) for unit \(i\)
    • May be tougher to satisfy if \(T\) is small and \(\bX_{it}\) has discrete regressors
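As a minimal sketch (simulated data; all names here are hypothetical), computing \(\hat{\bbeta}_i\) for a single unit:

rng = np.random.default_rng(0)
T, p = 10, 2                             # T >= p observations for the unit

X_i = rng.normal(size=(T, p))            # unit i's regressors
beta_i = np.array([1.0, -0.5])           # unit i's own coefficient vector
y_i = X_i @ beta_i + rng.normal(size=T)

# OLS on unit i's data alone: (X_i'X_i)^{-1} X_i'y_i
beta_hat_i = np.linalg.solve(X_i.T @ X_i, X_i.T @ y_i)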

Mean Group Estimator


Definition 1 The mean group (MG) estimator is defined as \[ \hat{\bbeta}^{MG} = \dfrac{1}{N}\sum_{i=1}^N \hat{\bbeta}_i \]

Cross-sectional average of individual OLS estimators
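A minimal simulation sketch of the whole procedure (simulated data; all names hypothetical): draw heterogeneous coefficients, run OLS unit by unit, and average:

rng = np.random.default_rng(42)
N, T, p = 500, 10, 2

# Heterogeneous coefficients with E[beta_i] = (1, -0.5)
betas = rng.normal(loc=[1.0, -0.5], scale=0.3, size=(N, p))

individual_estimates = np.empty((N, p))
for i in range(N):
    X_i = rng.normal(size=(T, p))
    y_i = X_i @ betas[i] + rng.normal(size=T)
    individual_estimates[i] = np.linalg.solve(X_i.T @ X_i, X_i.T @ y_i)

# MG estimator: cross-sectional average of the individual estimates
beta_mg = individual_estimates.mean(axis=0)
print(beta_mg)  # close to (1, -0.5) for large N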

MG Estimation with Singular \((\bX_i'\bX_i)\)

  • So far: assumed that all units had invertible \((\bX_i'\bX_i)\)
  • If not true, can average only units which have invertible \((\bX_i'\bX_i)\) (=can compute \(\hat{\bbeta}_i\))

Can define MG estimator over such units as

\[ \small \hat{\bbeta}^{MG, +} = \dfrac{1}{\sum_{i=1}^N \I\curl{\det(\bX_i'\bX_i)>0} } \sum_{i=1}^N \I\curl{\det(\bX_i'\bX_i)>0} \hat{\bbeta}_i \]
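A hedged sketch of \(\hat{\bbeta}^{MG,+}\) in code (hypothetical helper; `X_list` and `y_list` are assumed to hold each unit's regressors and outcomes):

def mg_plus(X_list, y_list):
    """MG estimator over units with invertible X_i'X_i (a sketch)."""
    estimates = []
    for X_i, y_i in zip(X_list, y_list):
        gram = X_i.T @ X_i
        # Indicator I{det(X_i'X_i) > 0}, via a numerical rank check
        if np.linalg.matrix_rank(gram) == gram.shape[0]:
            estimates.append(np.linalg.solve(gram, X_i.T @ y_i))
    return np.mean(estimates, axis=0)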

Asymptotic Properties

Representation of \(\hat{\bbeta}^{MG}\)

Key expansion: \[ \begin{aligned} \hat{\bbeta}^{MG} & = \dfrac{1}{N}\sum_{i=1}^N \bbeta_i + \dfrac{1}{N} \sum_{i=1}^N (\bX_i'\bX_i)^{-1}\bX_i'\bU_i \end{aligned} \]


  • How to prove asymptotic properties?
  • Recall: usually want to impose assumptions on the “primitive” components \((\bbeta_i, \bX_i, \bU_i)\)

Technique: Represent in Terms of One Vector

Represent \(\hat{\bbeta}^{MG}\) as a continuous transformation of a single vector average: \[ \hat{\bbeta}^{MG} = \begin{pmatrix} \bI_p & \bI_p \end{pmatrix} \left[ \dfrac{1}{N} \sum_{i=1}^N \underbrace{\begin{pmatrix} \bbeta_i \\ (\bX_i'\bX_i)^{-1}\bX_i'\bU_i \end{pmatrix}}_{=\bW_i} \right] \]

Can now prove consistency and asymptotic normality results by applying LLN+CLT to \(\bW_i\), then use CMT (exercise)
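Sketch of the consistency step: the LLN gives \[ \dfrac{1}{N}\sum_{i=1}^N \bW_i \xrightarrow{p} \E[\bW_i] = \begin{pmatrix} \E[\bbeta_i] \\ \E[(\bX_i'\bX_i)^{-1}\bX_i'\bU_i] \end{pmatrix} = \begin{pmatrix} \E[\bbeta_i] \\ 0 \end{pmatrix} \] where the lower block vanishes by \(\E[\bU_i|\bX_i]=0\) and the law of iterated expectations; multiplying by \(\begin{pmatrix} \bI_p & \bI_p \end{pmatrix}\) (CMT) yields \(\hat{\bbeta}^{MG} \xrightarrow{p} \E[\bbeta_i]\)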

Consistency of the MG Estimator

Proposition 1 If

  1. Units are IID
  2. \(\E[\bU_i|\bX_i]=0\); \(\E\left[\norm{(\bX_i'\bX_i)^{-1}\bX_i'\bU_i}\right]<\infty\), \(\E[\norm{\bbeta_i}]<\infty\)

Then as \(N\to\infty\), \(T\geq p\) fixed \[ \hat{\bbeta}^{MG} \xrightarrow{p} \E[\bbeta_i] \]

Asymptotic Distribution

Proposition 2 If

  1. Units are IID
  2. \(\E[\bU_i|\bbeta_i, \bX_i]=0\); \(\E\left[\norm{(\bX_i'\bX_i)^{-1}\bX_i'\bU_i}^2\right]<\infty\), \(\E[\norm{\bbeta_i}^2]<\infty\)

Then as \(N\to\infty\), \(T\geq p\) fixed \[\small \sqrt{N}\left(\hat{\bbeta}^{MG} - \E[\bbeta_i] \right) \xrightarrow{d} N\left(0, \var(\bbeta_i) + \var\left( (\bX_i'\bX_i)^{-1}\bX_i'\bU_i \right) \right) \]
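Why the variance is a simple sum: under condition 2 the two blocks of \(\bW_i\) are uncorrelated, since \[ \small \E\left[(\bbeta_i - \E[\bbeta_i])\left((\bX_i'\bX_i)^{-1}\bX_i'\bU_i\right)'\right] = \E\left[(\bbeta_i - \E[\bbeta_i])\,\E[\bU_i|\bbeta_i, \bX_i]'\bX_i(\bX_i'\bX_i)^{-1}\right] = 0 \] so applying the CLT to \(\bW_i\) and the CMT with \(\begin{pmatrix} \bI_p & \bI_p \end{pmatrix}\) gives variance \(\var(\bbeta_i) + \var\left((\bX_i'\bX_i)^{-1}\bX_i'\bU_i\right)\)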

Estimating the Asymptotic Variance

Need \(\widehat{\avar}(\hat{\bbeta}^{MG})\) if we want to do inference on \(\E[\bbeta_i]\)


Correct answer — the simplest one:

  • \(\hat{\bbeta}^{MG}\) — sample average of \(\hat{\bbeta}_i\)
  • Way to estimate variance — sample variance of \(\hat{\bbeta}_i\): \[ \small \begin{aligned} & \dfrac{1}{N}\sum_{i=1}^N(\hat{\bbeta}_i-\hat{\bbeta}^{MG})(\hat{\bbeta}_i-\hat{\bbeta}^{MG})'\xrightarrow{p} \var\left(\hat{\bbeta}_i \right) = \avar\left(\hat{\bbeta}^{MG}\right) \end{aligned} \]
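In code, a minimal sketch of this estimator (reusing the hypothetical `individual_estimates` array from the simulation sketch above):

N = individual_estimates.shape[0]
beta_mg = individual_estimates.mean(axis=0)

# Sample variance of the individual estimates: estimates avar(beta_MG)
deviations = individual_estimates - beta_mg
avar_hat = (deviations.T @ deviations) / N

# Standard errors of the MG estimator itself shrink at rate sqrt(N)
se_mg = np.sqrt(np.diag(avar_hat) / N)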

Empirical Illustration

Context and Setting

Reminder: Empirical Question

Recall: studying relationship between pollution and earnings using approach of Borgschulte, Molitor, and Zou (2024)

  • Data: quarterly county-level data from the USA
  • Exogenous measure of pollution: number of days with wildfire smoke
  • Outcomes: average earnings in county

Data available from the Harvard Dataverse

Reminder: Previous Empirical Strategy

Last time assumed that two random intercepts sufficient to capture important unobserved components:

  • County-season
  • State-year

Unit of analysis effectively county in given season (e.g. King County in summer is a single \(i\))


After that assumed that the effect of pollution was the same across places

Allowing Different Effects

What if pollution effects differ between counties and seasons?

  • Specify model as example of (1) \[ \small Y_{it}^{\bx} = \alpha_i + \beta_i\text{Smoke Days}_{it} + \text{State-Year FE} + U_{it} \] \(i\) — county in given season; \(t\) — years
  • Keep state-year effect to capture “global” effects
  • \(\E[\beta_i]\) — object of interest

Estimation and Results

Estimation Steps

Our approach to estimating \(\E[\beta_i]\) with the mean group estimator:

  1. Eliminate state-year FE
  2. Compute all individual effects
  3. Compute MG estimator


Step (1) not part of MG estimation, included here due to our model

Eliminating State-Year FE

  • How do we eliminate state-year random intercept?
  • One-way within transformation:
    • Average inside each (state-year)
    • Take out that average from \(Y_{it}\) and \(X_{it}\)
Loading, preparing data. Eliminating state-year random intercepts
# Load data
columns_to_load = [
    "countyfip",                # FIPS
    "rfrnc_yr",                 # Year
    "rfrnc_qtroy",              # Quarter 
    "d_pc_qwi_payroll",         # Earnings (annual diff) 
    "hms_deep",                 # Number of smoke days
    "fe_countyqtroy",
    "fe_styr",
    "fe_stqtros",
    "seer_pop",                 # Population 
]
county_df = pd.read_csv(
  "data/county_quarter.tab", 
  sep="\t", 
  usecols=columns_to_load,
)

# Rename columns
column_rename_dict = {
  "countyfip":"fips",
  "rfrnc_yr":"year",
  "rfrnc_qtroy":"quarter",
  "d_pc_qwi_payroll":"diff_payroll", 
  "hms_deep":"smoke_days",
  "fe_countyqtroy":"fe_id_county_quarter",
  "fe_styr":"fe_id_state_year",
  "fe_stqtros":"fe_id_state_quarter",
  "seer_pop":"population",
}
county_df = county_df.rename(columns=column_rename_dict)
 
# Drop missing rows
county_df.dropna(inplace=True)

# Subtract state-year averages
county_df[["diff_payroll_no_state_year", "smoke_days_no_state_year"]] = (
  county_df.groupby("fe_id_state_year")[["diff_payroll", "smoke_days"]]
  .transform(lambda x: x-x.mean())
)

Computing The MG Estimator

Mean group estimation not implemented in standard packages (to the best of my knowledge), but very easy to do by hand

  • With a for-loop
  • By grouping the data based on county-quarter ID

A small function for running individual regressions:

Function for running unit-level OLS
def run_individual_regression(county_quarter_df: pd.DataFrame) -> np.float64:
  """
  Regress 'diff_payroll_no_state_year' on 'smoke_days_no_state_year' (with a
  constant) for a given county-quarter DataFrame and return the coefficient
  on smoke days.

  Parameters
  ----------
  county_quarter_df : pd.DataFrame
    DataFrame containing at least the columns 'diff_payroll_no_state_year'
    and 'smoke_days_no_state_year'.

  Returns
  -------
  np.float64
    Estimated coefficient on 'smoke_days_no_state_year'.

  Raises
  ------
  ValueError
    If a required column is missing or the regression cannot be performed
    (e.g., insufficient data).
  """

  try:
    y = county_quarter_df['diff_payroll_no_state_year']
    X = sm.add_constant(county_quarter_df['smoke_days_no_state_year'])
    model = sm.OLS(y, X)
    results = model.fit()
    coef = results.params['smoke_days_no_state_year']
    return coef
  except KeyError as e:  # Column missing
    raise ValueError(f"Required column missing: {e}")
  except ValueError as e:  # Regression-related issues
    raise ValueError(f"Regression could not be performed: {e}")

Apply OLS to each \(i\):

Running individual estimation
ind_ests = (
  county_df.groupby("fe_id_county_quarter") 
  .apply(run_individual_regression, include_groups=False)
)
ind_ests.name = "ind_estimates"

# Merge back with data
ind_ests = (
  county_df[['fips', 'fe_id_county_quarter', 'quarter']]
    .drop_duplicates()
    .merge(ind_ests, left_on="fe_id_county_quarter", right_index=True)
)

Results

Our mean group estimate in Q3 (~summer):

ind_ests.loc[ind_ests["quarter"]==3, "ind_estimates"].mean()
np.float64(-6.10718118267323)

Negative and not far from FE value

But:

ind_ests.loc[ind_ests["quarter"]==3, "ind_estimates"].std()
np.float64(20.590048177645293)

Large standard deviation! Cannot reject \(\E[\beta_i]=0\): we do not recover the finding of Borgschulte, Molitor, and Zou (2024) in this model
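A minimal sketch of the implied Proposition 2 standard error (objects as defined above):

q3_estimates = ind_ests.loc[ind_ests["quarter"] == 3, "ind_estimates"]
# SE of the MG estimate: sample std of individual estimates over sqrt(N)
mg_se = q3_estimates.std() / np.sqrt(q3_estimates.shape[0])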

Visualizing Density of \(\hat{\beta}_i\)

Visualizing the density of individual estimates
# Set colors 

BG_COLOR = "whitesmoke"
FONT_COLOR = "black"
GEO_COLOR = "rgb(201, 201, 201)"
OCEAN_COLOR = "rgb(136, 136, 136)" 

# Select quarter and extract data
QUARTER = 3
plot_data = ind_ests.loc[ind_ests["quarter"]==QUARTER, "ind_estimates"]
 
fig, ax = plt.subplots(figsize=(14, 6.5))

# Add density
sns.kdeplot(
  plot_data, 
  ax=ax,
  linewidth=3,
  label='Density of $\\hat{\\beta}$',
)

# Add line at 0 and mean
ax.axvline(0, color='gray', linestyle='--', linewidth=0.5, label='')
ax.axvline(plot_data.mean(), linestyle='--', linewidth=0.5, label='$E[\\beta]$')

# Colors
fig.patch.set_facecolor(BG_COLOR)
ax.set_facecolor(BG_COLOR) 
ax.spines["top"].set_color(FONT_COLOR)
ax.spines["bottom"].set_color(FONT_COLOR)
ax.spines["left"].set_color(FONT_COLOR)
ax.spines["right"].set_color(FONT_COLOR)
ax.tick_params(colors=FONT_COLOR)

# Limits, legend, text
ax.set_xlim([-150, 100])
ax.legend()
ax.set_title('Density of Individual Estimators (Q3 Effects)', color=FONT_COLOR, loc='left', weight='bold')
ax.set_xlabel('')
plt.show()

Why Such a Large Standard Deviation?

Density of \(\hat{\beta}_i\) wide, could be due to two reasons:

  • Large \(\var(\beta_i)\). Means \(\E[\beta_i]\) would be difficult to estimate even without estimation noise
  • Large \(\var((\bX_i'\bX_i)^{-1}\bX_i'\bU_i)\) — estimation noise dominates


In the first case, collecting more time periods \(T\) would not help
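One hedged way to probe which channel dominates, assuming the within-unit OLS variance estimates are informative (they need enough observations per unit); `individual_sampling_variance` is a hypothetical helper:

def individual_sampling_variance(df: pd.DataFrame) -> float:
    """Squared OLS standard error of the smoke-days coefficient for one unit."""
    y = df["diff_payroll_no_state_year"]
    X = sm.add_constant(df["smoke_days_no_state_year"])
    res = sm.OLS(y, X).fit()
    return float(res.bse["smoke_days_no_state_year"] ** 2)

# Units with too little data may produce NaN here; a sketch, not production code
noise_vars = county_df.groupby("fe_id_county_quarter").apply(
    individual_sampling_variance, include_groups=False
)

# Total dispersion of estimates = coefficient heterogeneity + noise, so
# var(beta_i) is roughly the total variance minus the average noise
var_beta_estimate = ind_ests["ind_estimates"].var() - noise_vars.mean()

If `var_beta_estimate` is close to zero, estimation noise dominates and a larger \(T\) would help.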

Visualizing Geography of Effects

Geographic distribution of effects
# Load the shapefile
shapefile_path = "data/us_counties/USA_Counties.shp"
gdf = gpd.read_file(shapefile_path)
gdf = gdf.to_crs(epsg=4326) 
gdf["geometry"] = gdf["geometry"].buffer(0)
 
ind_ests["fips"] = ind_ests["fips"].astype(str).str.zfill(5)

merged_data = gdf.merge(ind_ests, left_on="FIPS", right_on="fips", how="right") 
merged_data = merged_data.loc[merged_data["quarter"]==QUARTER,["NAME", "fips", "geometry", "ind_estimates"]]
merged_data = merged_data.set_index('fips')

# Define custom range limits
vmin, vmax = -5, 5  # Trim extreme values while keeping a balance

# Clip data values to avoid excessive white in the middle
merged_data["scaled_value"] = np.clip(merged_data["ind_estimates"], vmin, vmax)  

# Create choropleth map
fig = px.choropleth(merged_data, 
                     geojson=merged_data.geometry,
                     locations=merged_data.index, 
                     color="scaled_value",
                     scope="usa", 
                     color_continuous_scale="PuOr",
                     custom_data=["NAME", "ind_estimates"], 
                     color_continuous_midpoint=0,
                     width=1150, height=550,
)


# Apply light theme settings
fig.update_layout( 
    font_family="Arial",
    plot_bgcolor=BG_COLOR,
    paper_bgcolor=BG_COLOR,
    font=dict(color=FONT_COLOR), 
    title=dict(
        text="<b>Effect of an Additional Day of Wildfire Smoke on Quarterly Earnings (Q3 Effects)</b>",
        x=0.03,
        xanchor="left",
        y=0.97,
        yanchor="top",
        font=dict(color=FONT_COLOR, size=20)
    ),
    margin=dict(l=20, r=20, t=60, b=20),  
    coloraxis_colorbar=dict(
        title=dict(
            text="Estimated effect ($)",
            font=dict(color=FONT_COLOR, size=12),
            side="right"
        ),
        tickfont=dict(color=FONT_COLOR, size=10), 
        len=0.7,
        thickness=20,
        x=1.02,
        yanchor="middle",
        y=0.5
    )
)
 
fig.update_geos(fitbounds="locations", visible=False,
    bgcolor=BG_COLOR,  # Map background
    landcolor=GEO_COLOR,  # Land color
    lakecolor=OCEAN_COLOR,  # Water color
    showocean=True,
    oceancolor=OCEAN_COLOR, )

fig.update_traces(
    hovertemplate="<b>%{customdata[0]}</b><br>Estimated effect: earnings change by %{customdata[1]:.2f} dollars"
)
fig.show()

Recap and Conclusions

Recap


In this lecture we

  1. Proposed the mean group estimator as an approach to estimating \(\E[\bbeta_i]\)
  2. Discussed its asymptotic properties

Next Questions


  • What if we don’t care about causal parameters and only want to predict outcomes?
  • When does strict exogeneity fail? What is the impact of that failure?
  • How to work with panel data in nonlinear settings?

References

Borgschulte, Mark, David Molitor, and Eric Yongchen Zou. 2024. “Air Pollution and the Labor Market: Evidence from Wildfire Smoke.” Review of Economics and Statistics 106 (6): 1558–75. https://doi.org/10.1162/rest_a_01243.
Morozov, Vladislav. 2025. “Econometrics with Unobserved Heterogeneity.” https://doi.org/10.5281/ZENODO.15459848.
Pesaran, M. Hashem. 2015. Time Series and Panel Data Econometrics. Oxford University Press. https://doi.org/10.1093/acprof:oso/9780198736912.001.0001.