Fixed Effect Estimation

Beyond Binary Treatments: Fixed Effects Estimators and their Properties

Vladislav Morozov

Introduction

Lecture Info

Learning Outcomes

This lecture is about handling more general treatments in panel data using “fixed effect/random intercepts” estimators


By the end, you should be able to

  • Describe the fixed effect estimation procedure
  • Establish causal properties of such estimators under homogeneous and heterogeneous effects

References

Textbooks:

  • Chapter 16 in Huntington-Klein (2025)
  • Chapters 13, 14-1, 14-4, 14-5 in Wooldridge (2020)
  • Chapter 17 in Hansen (2022) (except dynamic panels and random effects)

Empirical Motivation

Empirical Question

How strongly does pollution affect labor market outcomes?


  • We know that pollution is bad for health
  • But how does it affect economic activity, particularly earnings and employment?

Challenge: Endogeneity

Cannot just regress labor market outcomes on overall pollution

  • Two-way causality: more economically active places tend to have more pollution
  • Simple regression will suffer from endogeneity


  • Can solve endogeneity with instrumental variables
  • But those are difficult to find

Another Approach

  • Find pollution not driven by (your own) economic activity
  • But some places may be more likely to have this pollution \(\Rightarrow\) this may affect people's decisions to live there


What if we could control for this likelihood?

  • How — topic of lecture
  • Application: how Borgschulte, Molitor, and Zou (2024) solve the issue

Motivation and Questions

Reminder: TWFE

Recall: for difference-in-differences we showed that \[ \small \widehat{ATT}^{DiD} = \hat{\delta} \] where \(\hat{\delta}\) was the OLS estimator in the regression \[\small Y_{i2}- Y_{i1} = \gamma + \delta D_{i2} + U_{i2} \tag{1}\] with \(\delta\) the ATT and \(\gamma\) the average change in outcomes (trend)

Reminder: More General Equation

Equation 1 obtained by differencing the two-way fixed effect equation: \[ Y_{it} = \alpha_i + \gamma_t + \delta D_{it} + U_{it}, \tag{2}\] where \[ \gamma_1 = 0, \quad \gamma_2 = \gamma \] and \(\alpha_i = Y_{i1}^0\) — baseline differences between units

Reminder: Estimation


  • We know how to apply OLS based on Equation 1: just regress \((Y_{i2}-Y_{i1})\) on \((1, D_{i2})\) with OLS
  • Last time we said that we can also apply OLS based on Equation 2 directly (e.g. PanelOLS from linearmodels)
    • Treated \(\alpha_i\) as parameters
    • Got exactly the same results from two approaches
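A minimal numerical check of this claim, assuming simulated two-period data in which nobody is treated in period 1 (all names and parameter values are illustrative). The first-difference regression of Equation 1 and OLS after two-way demeaning of Equation 2 return the same \(\hat{\delta}\).

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500
alpha = rng.normal(size=N)                       # unit effects (baseline outcomes)
D2 = rng.integers(0, 2, size=N).astype(float)    # treatment switches on in period 2 only
D = np.column_stack([np.zeros(N), D2])           # D[i, t], shape (N, 2)
Y = alpha[:, None] + np.array([0.0, 1.5]) + 2.0 * D + rng.normal(size=(N, 2))

# Equation 1: regress Y_i2 - Y_i1 on (1, D_i2)
Z = np.column_stack([np.ones(N), D2])
gamma_fd, delta_fd = np.linalg.lstsq(Z, Y[:, 1] - Y[:, 0], rcond=None)[0]

# Equation 2: two-way within transformation, then OLS without intercept
def two_way_demean(W):
    return W - W.mean(axis=1, keepdims=True) - W.mean(axis=0, keepdims=True) + W.mean()

Yt, Dt = two_way_demean(Y), two_way_demean(D)
delta_twfe = (Dt * Yt).sum() / (Dt * Dt).sum()

print(delta_fd, delta_twfe)   # equal up to floating-point error
```

With \(T=2\), two-way demeaning reduces to half of the cross-sectionally centered first difference, which is why the two slopes coincide.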

Lecture Questions

  1. How to apply OLS on Equation 2?
    • What are the regressors? How to “treat \(\alpha_i\) as parameters”?
    • How does it work in practice?
    • Is the estimator inspired by Equation 2 always equal to the one based on Equation 1?
  2. Can we apply the same approach with general (not just 0/1) treatment? What are the causal properties?

Fixed Effect Estimation

Random Intercept (Fixed Effect) Models

First Goal: Vector-Matrix Representation

First question: “treating \(\alpha_i\) as parameters”?


For now forget about \(D_{it}\) and \(\gamma\) and consider: \[ Y_{it} = \alpha_i + U_{it}, \quad i=1,\dots, N; t=1, \dots, T \tag{3}\] \(\alpha_i\) — individual-specific intercept (“unit fixed effect”). Data assumed balanced (same \(T\) for all units)


Want to represent Equation 3 in vector-matrix form

Vector-Matrix Forms for Panel Data I

Before that: more info on matrix forms for panel data.


Vector form as before: single observation (now fixed \(i\) and \(t\)) with vector of covariates: \[ Y_{it} = \bX_{it}'\bbeta + U_{it} \]

Vector-Matrix Forms for Panel Data II

Two key matrix forms:

  • Individual level. Let \(\bY_i = (Y_{i1}, \dots, Y_{iT})'\), \(\bX_i = (\bX_{i1}, \dots, \bX_{iT})'\), then \[\small \bY_i = \bX_i\bbeta + \bU_i \]

  • Full sample. Let \(\bY = (\bY_1', \dots, \bY_N')'\), \(\bX= (\bX_1', \dots, \bX_N')'\). Then \[ \small \bY = \bX\bbeta + \bU \] What are the dimensions of \(\bY_i, \bX_i, \bY, \bX\)?

Individual Matrix Form with Individual Intercepts


Model (3) in individual matrix form: \[ \bY_i = \mathbf{1}_T\alpha_i + \bU_i \] where \(\mathbf{1}_T\) is the \(T\)-vector of ones


Not that insightful

Full Sample Matrix Form with Individual Intercepts

Model (3) in full sample matrix form \[ \begin{aligned} \bY & = \bF \bLambda + \bU, \\ \bLambda & = (\alpha_1, \dots, \alpha_N)', \\ \bF & = \bI_N \otimes \mathbf{1}_T, \end{aligned} \tag{4}\] where \(\otimes\) is the Kronecker product. Intuition:

  • There are \(N\) regressors, \(i\)th regressor is the dummy of being the \(i\)th unit (0/1 regressor values)
  • \(\bLambda\) — associated parameter vector
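A small sketch of the Kronecker construction in Equation 4, assuming rows are stacked unit by unit, \((i=1,t=1), (i=1,t=2), \dots\), with \(N=3\), \(T=2\) and illustrative values for \(\alpha_i\):

```python
import numpy as np

N, T = 3, 2
# F = I_N ⊗ 1_T: column i is the dummy "observation belongs to unit i"
F = np.kron(np.eye(N), np.ones((T, 1)))     # shape (N*T, N)
print(F)

Lambda = np.array([10.0, 20.0, 30.0])       # illustrative alpha_1, alpha_2, alpha_3
print(F @ Lambda)                           # [10. 10. 20. 20. 30. 30.]: alpha_i repeated T times
```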

More Complex Example: Two-Way Intercept Model

Now consider more general model: \[ Y_{it} = \alpha_i + \gamma_t + U_{it} \] Here want to treat both \(\alpha_i\) and \(\gamma_t\) as parameters


Individual matrix form: \[ \begin{aligned} \bY_i & = \mathbf{1}_T \alpha_i + \bI_T \bgamma + \bU_i\\ \bgamma & = (\gamma_1, \dots, \gamma_T)' \end{aligned} \]

Two-Way Model: Full Sample Matrix Form

Can write \[ \begin{aligned} \bY & = \bF\bLambda + \bU, \\ \bF & = \left(\bI_N \otimes \mathbf{1}_T, \mathbf{1}_N\otimes \bI_T \right)\\ \bLambda & = (\alpha_1, \dots, \alpha_N, \gamma_1, \dots, \gamma_T)' \end{aligned} \]

  • Both \(\alpha_{\cdot}\) and \(\gamma_{\cdot}\) treated as parameters
  • Regressors in \(\bF\): \(N\) dummy regressors from before; \(T\) new dummy regressors, \(t\)th new regressor — indicator of \(t\)th period
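A quick check of the two-way dummy matrix, assuming the same unit-by-unit row ordering. Both blocks of dummies sum to the vector of ones, so \(\bF\) has rank \(N+T-1\) rather than \(N+T\); this is why a normalization such as \(\gamma_1 = 0\) (as in Equation 2) is imposed when the intercepts themselves are estimated.

```python
import numpy as np

N, T = 4, 3
unit_dummies = np.kron(np.eye(N), np.ones((T, 1)))   # I_N ⊗ 1_T, shape (NT, N)
time_dummies = np.kron(np.ones((N, 1)), np.eye(T))   # 1_N ⊗ I_T, shape (NT, T)
F = np.hstack([unit_dummies, time_dummies])          # shape (NT, N + T)

# One column is redundant: rank is N + T - 1
rank_F = np.linalg.matrix_rank(F)
print(F.shape, rank_F)   # (12, 7) 6
```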

Adding Other Covariates

Can write Equation 2 as \[ Y_{it} = \alpha_i + \gamma_t + \bX_{it}'\bbeta + U_{it}, \] for \(\bX_{it} = (D_{it})\) and \(\bbeta = (\delta)\)

More generally, consider any vector \(\bX_{it}\) not just binary treatments


Its matrix form is \[ \bY = \bF\bLambda + \bX\bbeta + \bU \]

Random-Intercept (Fixed Effects) Models

Definition 1 Models of the kind \[ \small \bY = \bF\bLambda +\bX\bbeta + \bU, \tag{5}\] where \(\bF\) is a matrix of 0s and 1s, are called fixed effects or random intercept models

  • Fixed effects and random intercepts — often used interchangeably
  • Random intercepts — less ambiguous

Examples of Model (5)

  • Individual fixed effects/intercepts (one-way): \[ Y_{it} = \alpha_i + \bX_{it}'\bbeta + U_{it} \]
  • Two-way models (time and individual effects): \[ Y_{it} = \alpha_i + \gamma_t + \bX_{it}'\bbeta + U_{it} \]
  • Can include more complicated effects, see empirical illustration (where \(i\) — US counties, \(t\) — quarters; effects — county-season and state-year)

Fixed Effect (Within) Estimators

Estimation Strategies

Suppose \(\E[U_{it}|\bX_i]=0\). How to estimate parameters of Model (5)?


There are two main strategies:

  • Estimate including \(\bF\) and \(\bX\) as regressors — least squares dummy variable (LSDV) estimator
  • Eliminate \(\bF\) first, then estimate: within estimator

LSDV Estimation

LSDV — simply regress \(\bY\) on \((\bF, \bX)\): \[ (\hat{\bLambda}, \hat{\bbeta}^{LSDV}) = \argmin_{\bL, \bb} \norm{\bY - \bF\bL -\bX\bb }_2^2 \]

For example with two-way effects:

\[\small \begin{aligned} & \left(\hat{\alpha}_1, \dots, \hat{\alpha}_N, \hat{\gamma}_1, \dots, \hat{\gamma}_T, \hat{\bbeta}^{LSDV} \right)\\ & = \argmin_{a_1, \dots, a_N, g_1, \dots, g_T, \bb}\sum_{i=1}^N \sum_{t=1}^T \left(Y_{it} - a_i - g_t - \bX_{it}'\bb \right)^{2} \end{aligned} \]

Within Estimation I: One-Way Transformation

First consider the one-way model \(Y_{it} = \alpha_{i} + \bX_{it}'\bbeta + U_{it}\). For \(W_{it} \in \{Y_{it}, \bX_{it}, U_{it}\}\), define the (one-way) within-transformed version of \(W_{it}\) as \[ \small \tilde{W}_{it} = W_{it} - \dfrac{1}{T}\sum_{s=1}^T W_{is} \tag{6}\]

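The within transformation eliminates the fixed effects (=\(\bF\)): \(\alpha_i\) does not vary over \(t\), so it equals its own time average and \(\tilde{\alpha}_i = 0\). This leaves \[ \small \tilde{Y}_{it} = \tilde{\bX}_{it}'\bbeta + \tilde{U}_{it} \]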
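A minimal sketch of Equation 6 on simulated data, assuming a balanced panel stored as an \((N, T)\) array and a scalar regressor correlated with \(\alpha_i\) (parameter values are illustrative). Demeaning removes \(\alpha_i\), and OLS on the transformed data recovers \(\beta\).

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 200, 5
alpha = rng.normal(size=N)                         # unit intercepts alpha_i
X = alpha[:, None] + rng.normal(size=(N, T))       # regressor correlated with alpha_i
U = rng.normal(size=(N, T))
beta = 2.0
Y = alpha[:, None] + beta * X + U                  # one-way model, arrays indexed [i, t]

def within_one_way(W):
    """Equation 6: subtract each unit's time average."""
    return W - W.mean(axis=1, keepdims=True)

# alpha_i is constant over t, so its within-transformed version is zero (up to rounding)
alpha_tilde = within_one_way(np.repeat(alpha[:, None], T, axis=1))
print(np.abs(alpha_tilde).max())

Y_t, X_t = within_one_way(Y), within_one_way(X)
beta_w = (X_t * Y_t).sum() / (X_t * X_t).sum()      # OLS of Y~ on X~ (scalar regressor)
print(beta_w)                                      # close to 2.0
```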

Within Estimation II: Two-Way Transformations

Suppose \(Y_{it} = \alpha_i + \gamma_t + \bX_{it}'\bbeta + U_{it}\). Define (two-way) within-transformed variables as \[ \small \tilde{W}_{it} = W_{it} - \dfrac{1}{T}\sum_{s=1}^T W_{is} - \dfrac{1}{N} \sum_{j=1}^N W_{jt} + \dfrac{1}{NT} \sum_{j=1}^N \sum_{s=1}^T W_{js} \tag{7}\]

Again the transformation eliminates the fixed effects (=\(\bF\)): \[ \small \tilde{Y}_{it} = \tilde{\bX}_{it}'\bbeta + \tilde{U}_{it} \]
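A short check of Equation 7, assuming a balanced panel stored as an \((N, T)\) array with randomly drawn intercepts. For \(\alpha_i + \gamma_t\), the unit average is \(\alpha_i + \bar{\gamma}\), the period average is \(\bar{\alpha} + \gamma_t\), and the grand average is \(\bar{\alpha} + \bar{\gamma}\), so everything cancels.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 6, 4
alpha = rng.normal(size=N)            # unit intercepts alpha_i
gamma = rng.normal(size=T)            # time intercepts gamma_t
FE = alpha[:, None] + gamma[None, :]  # alpha_i + gamma_t, shape (N, T)

def within_two_way(W):
    """Equation 7 for a balanced panel stored as an (N, T) array."""
    return (W
            - W.mean(axis=1, keepdims=True)   # unit average over s
            - W.mean(axis=0, keepdims=True)   # period average over j
            + W.mean())                       # grand average

print(np.abs(within_two_way(FE)).max())   # zero up to floating-point error
```

On an unbalanced panel this closed form no longer removes both sets of intercepts exactly; that case needs the general transformation.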

More General Within Transformation

Consider general model: \[ \small \bY = \bF\bLambda + \bX\bbeta + \bU \]

There exists a linear transformation that eliminates \(\bF\): \[ \small \tilde{\bY} = \tilde{\bX}\bbeta + \tilde{\bU} \]

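Called the FWL (Frisch–Waugh–Lovell) or the (generalized) within transformation: multiply by the residual maker \(\bI_{NT} - \bF(\bF'\bF)^{-}\bF'\), where \((\bF'\bF)^{-}\) is a generalized inverse (needed when \(\bF\) lacks full column rank, as in the two-way case)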
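A sketch of the generalized within transformation as a projection, assuming a balanced two-way design so the result can be compared with Equation 7. The Moore–Penrose pseudoinverse is used as the generalized inverse; building the full \(NT \times NT\) matrix is only feasible for tiny examples like this one.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 5, 3
F = np.hstack([np.kron(np.eye(N), np.ones((T, 1))),    # unit dummies
               np.kron(np.ones((N, 1)), np.eye(T))])   # period dummies
# Residual maker M_F = I - F (F'F)^+ F'; pinv handles the rank deficiency of two-way F
M_F = np.eye(N * T) - F @ np.linalg.pinv(F.T @ F) @ F.T

W = rng.normal(size=(N, T))                 # any variable, stored as (N, T)
w_vec = W.reshape(-1)                       # stacked as (i=1,t=1), (i=1,t=2), ...
tilde_general = (M_F @ w_vec).reshape(N, T)

# For a balanced two-way design this coincides with Equation 7
tilde_eq7 = W - W.mean(axis=1, keepdims=True) - W.mean(axis=0, keepdims=True) + W.mean()
print(np.allclose(tilde_general, tilde_eq7), np.allclose(M_F @ F, 0))   # True True
```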

Within Estimation

Within estimation: just regressing \(\tilde{\bY}\) on \(\tilde{\bX}\) with OLS: \[ \hat{\bbeta}^{W} = \argmin_{\bb} \sum_{i=1}^N \sum_{t=1}^T (\tilde{Y}_{it} - \tilde{\bX}_{it}'\bb)^2 \]

  • \(\tilde{\bX}\) must have full column rank (no collinearity in transformed regressors)
  • Intuition in one-way case (only \(\alpha_i\)): each regressor must vary over time for at least some units

Equivalence of Approaches

Proposition 1 \[ \hat{\bbeta}^{LSDV} = \hat{\bbeta}^{W} \]

  • Both approaches: same estimated values
  • Explains why we got same results from two regression approaches to DiD last time
  • Allows us to use a single name for both estimators, usually fixed effects or random intercept estimators
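A numerical illustration of Proposition 1, assuming simulated two-way data with \(k=2\) regressors correlated with \(\alpha_i\) (values are illustrative). LSDV drops the first period dummy (the normalization \(\gamma_1 = 0\)); the within estimator uses Equation 7.

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, k = 50, 4, 2
alpha = rng.normal(size=N)
gamma = rng.normal(size=T)
X = rng.normal(size=(N, T, k)) + alpha[:, None, None]   # regressors correlated with alpha_i
beta = np.array([1.0, -0.5])
Y = alpha[:, None] + gamma[None, :] + X @ beta + rng.normal(size=(N, T))

# LSDV: regress Y on (unit dummies, period dummies except the first, X)
unit_d = np.kron(np.eye(N), np.ones((T, 1)))
time_d = np.kron(np.ones((N, 1)), np.eye(T))[:, 1:]      # drop gamma_1 (normalization)
Z = np.hstack([unit_d, time_d, X.reshape(N * T, k)])
coef = np.linalg.lstsq(Z, Y.reshape(-1), rcond=None)[0]
beta_lsdv = coef[-k:]

# Within: two-way demean each variable, then OLS without dummies
def within_two_way(W):
    return (W - W.mean(axis=1, keepdims=True) - W.mean(axis=0, keepdims=True)
            + W.mean(axis=(0, 1), keepdims=True))

Xt = within_two_way(X).reshape(N * T, k)
Yt = within_two_way(Y).reshape(-1)
beta_w = np.linalg.lstsq(Xt, Yt, rcond=None)[0]

print(np.allclose(beta_lsdv, beta_w))   # True
```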

Which Approach to Use?

When to use LSDV vs. within estimation?

  • LSDV: only when you care about \(\bLambda\) and its entries have economic meaning. Example: \(\alpha_i\) is the innate skill of worker \(i\) (e.g. de la Roca and Puga 2017)
  • Within: in all other cases

Sometimes it is impossible to compute the LSDV estimator: the number of fixed effects is so large that the dummy matrix \(\bF\) cannot even be stored:

  • Called the “high-dimensional” fixed effect case
  • In practice the within transformation is cleverly done indirectly (Correia 2016)
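The basic idea behind avoiding \(\bF\) can be sketched with the method of alternating projections: repeatedly subtract group means for each set of fixed effects until nothing changes. This is a simplified illustration, not Correia's (2016) exact algorithm (which adds acceleration and other refinements); the function name and arguments are hypothetical, and group ids are assumed to be dense integers \(0, 1, 2, \dots\)

```python
import numpy as np

def demean_alternating(w, groups, tol=1e-10, max_iter=1000):
    """Sweep out several sets of fixed effects from w by alternating projections.

    w      : 1-D array of observations in long format (one row per (i, t))
    groups : list of 1-D integer arrays of dense FE ids, one per FE dimension
    """
    w = np.asarray(w, dtype=float).copy()
    for _ in range(max_iter):
        w_old = w.copy()
        for g in groups:
            sums = np.bincount(g, weights=w)
            counts = np.bincount(g)
            w -= (sums / counts)[g]          # subtract group means for this FE dimension
        if np.max(np.abs(w - w_old)) < tol:
            break
    return w

# Usage on a balanced panel, compared with the closed form in Equation 7
rng = np.random.default_rng(5)
N, T = 30, 6
unit_id = np.repeat(np.arange(N), T)
time_id = np.tile(np.arange(T), N)
y = rng.normal(size=N * T)
y_dm = demean_alternating(y, [unit_id, time_id])

Y = y.reshape(N, T)
y_eq7 = (Y - Y.mean(axis=1, keepdims=True) - Y.mean(axis=0, keepdims=True) + Y.mean()).reshape(-1)
print(np.allclose(y_dm, y_eq7))   # True
```

Only group means are stored, never the dummy matrix, which is what makes this approach workable with many thousands of intercepts.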

Pooled OLS


Another special case of model (5) — pooled OLS:

  • No fixed effects, directly regressing \(Y_{it}\) on \(\bX_{it}\)
  • Fully ignoring the panel structure
  • Involves no transformations and no \(\bF\)
  • Interpretation: the least flexible example of (5)

Implementation in Python

  • linearmodels supports both LSDV and within transformations
    • Defaults to eliminating effects
    • LSDV can be used with PanelOLS.fit(use_lsdv=True)
  • pyfixest was designed for high-dimensional FE estimation (can handle small examples too)
    • LSDV not available (to the best of my knowledge)
    • See empirical application for example usage

Causal Properties with General Treatment

Question: Causal Properties of FE Estimators

So far:

  • We have now described the estimation approach underlying DiD regression estimation
  • Noted that it can handle general treatments \(\bX_{it}\), not just scalar binary \(D_{it}\)

What are the causal properties of such estimators? Under which models do they give meaningful results?

Reflection on our Approach

Note:

  • Approach this problem differently from past lectures:
    • Here first formulate an estimator and only after start understanding causal properties
    • Usually goes the other way: fix causal problem, try to figure out identification and estimation
  • Doing so for historical reasons: these estimators came first, serious causal thinking more recently

Models Considered


Will consider two kinds of models under strict exogeneity

  • Random intercept causal process with same \(\bbeta\) for everyone — reflects estimator structure
  • Model with heterogeneous effects \(\bbeta_i\)

Properties under Random Intercept Model

Causal Framework with Random Intercepts

  • Some vector of treatments \(\bX_{it}\)
  • Potential outcome of unit \(i\) in time \(t\) given by \[ Y^{\bx}_{it} = \alpha_i + \gamma_t + \bx'\bbeta + U_{it} \tag{8}\]
  • Will think about exogeneity conditions later
  • Object of interest — \(\bbeta\) (plays the role of ATE, ATT, …)

For definiteness, we do two-way effects, but can apply same analysis for any configuration of random intercepts, just need to define \(\tilde{Y}_{it}\) appropriately

Sampling Setting

Work in the following setting

  • Units drawn IID
  • \(N\) large, \(T\) fixed
    • More typical kind of panel data (“micro” panel)
    • Contrast with “large” panel data with both \(N\) and \(T\) large

Causal Framework: Discussion of Model (8)

  • Contrast with less flexible causal model discussed before: \[ Y_{it}^{\bx} = \bx'\bbeta + U_{it} \]
  • Interpretations:
    • \(\alpha_i\) — often some characteristic of \(i\) that does not change over \(t\) in sample (e.g. innate intellect)
    • \(\gamma_t\) — shocks that affect everyone equally
    • Similar logic for other types of random intercepts

Estimator for \(\bbeta\)

  • Estimate \(\bbeta\) with the FE estimator, expressed as (Proposition 1) \[ \small \begin{aligned} \hat{\bbeta}^{FE} & = \left(\sum_{i=1}^N \sum_{t=1}^T \tilde{\bX}_{it}\tilde{\bX}_{it}' \right)^{-1}\sum_{i=1}^N \sum_{t=1}^T \tilde{\bX}_{it}\tilde{Y}_{it} \\ & = \left( \sum_{i=1}^N \tilde{\bX}_i'\tilde{\bX}_i \right)^{-1}\sum_{i=1}^N \tilde{\bX}_i'\tilde{\bY}_i \end{aligned} \] with variables transformed as in Equation 7
  • Will discuss nonsingularity assumptions a bit later

Probability Limit of \(\hat{\bbeta}^{FE}\)

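Realized data satisfies \(\tilde{Y}_{it} = \tilde{\bX}_{it}'\bbeta + \tilde{U}_{it}\). Substituting into \(\hat{\bbeta}^{FE}\): \[ \small \hat{\bbeta}^{FE} = \bbeta + \left(\dfrac{1}{N}\sum_{i=1}^N \tilde{\bX}_i'\tilde{\bX}_i \right)^{-1}\dfrac{1}{N}\sum_{i=1}^N \tilde{\bX}_i'\tilde{\bU}_i \]


So by the law of large numbers, as \(N\to\infty\) and \(T\) is fixed: \[ \small \hat{\bbeta}^{FE} \xrightarrow{p} \bbeta + \left( \E[\tilde{\bX}_{i}'\tilde{\bX}_i]\right)^{-1} \E[\tilde{\bX}_{i}'\tilde{\bU}_{i}] \]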

\(T\) fixed: basically treat each unit as a single \(T\)-dimensional observation (as in \(\tilde{\bU}_i\))

Rank (Nonsingularity) Conditions

So far needed to impose nonsingularity conditions

  • Sample: on \(\sum_{i=1}^N \tilde{\bX}_i'\tilde{\bX}_i\)
  • Population: on \(\E[\tilde{\bX}_{i}'\tilde{\bX}_i]\)

What does it require of \(\bX_{it}\)?

  • No collinearity
  • Also variation after the within transformation (one-way: variation over time; two-way: different variation over time for different units; etc.)
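A small illustration of the rank condition under the one-way transformation (6), assuming one regressor that varies over time and one that is fixed within each unit (for example, a time-invariant trait; data are simulated). The time-invariant column is wiped out, so \(\sum_i \tilde{\bX}_i'\tilde{\bX}_i\) is singular.

```python
import numpy as np

rng = np.random.default_rng(6)
N, T = 100, 4
x_varying = rng.normal(size=(N, T))                       # varies over time
x_fixed = np.repeat(rng.normal(size=(N, 1)), T, axis=1)   # constant over t within each unit
X = np.stack([x_varying, x_fixed], axis=2)                # shape (N, T, 2)

X_tilde = X - X.mean(axis=1, keepdims=True)               # one-way within transformation (6)
XtX = sum(X_tilde[i].T @ X_tilde[i] for i in range(N))    # sum_i X~_i' X~_i, shape (2, 2)
print(np.linalg.matrix_rank(XtX))                         # 1: the time-invariant column is (numerically) gone
```

Under the two-way transformation, a regressor that varies over time but identically for all units would be wiped out in the same way.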

Towards Exogeneity Conditions

For consistency want \[ \small \E[\tilde{\bX}_i'\tilde{\bU}_i] = \sum_{t=1}^T \E\left[\tilde{\bX}_{it} \tilde{U}_{it} \right] =0 \]

Sufficient that for all \(t\) \[ \small \E\left[\tilde{\bX}_{it} \tilde{U}_{it} \right] =0 \tag{9}\]

What does this condition require of \(\bX_{it}\) and \(U_{it}\)?

Exogeneity in the One-Way Case

Under one-way transformation (6), Equation 9 becomes \[ \scriptsize \E\left[\tilde{\bX}_{it} \tilde{U}_{it} \right] = \E\left[ \bX_{it}U_{it} - \dfrac{\bX_{it}}{T}\sum_{s=1}^T U_{is} - \dfrac{U_{it}}{T}\sum_{r=1}^T \bX_{ir} + \dfrac{1}{T^2} \sum_{s=1}^T\sum_{r=1}^T \bX_{is} U_{ir}\right] = 0 \]

Here would be sufficient that for all \(t\), \(s\) \[ \small \E[\bX_{it}U_{is}] = 0 \tag{10}\] Intuition: \(\bX_{it}\) and \(U_{is}\) are uncorrelated across all points in time

Strict Exogeneity in the Panel Case

What about beyond one-way effects? Usually impose an assumption that covers all cases, the panel data version of strict exogeneity:

Assumption (strict exogeneity): \[ \E[U_{it} | \bX_{i1}, \dots, \bX_{iT}] =0 \]

Much stronger than just \(\E[U_{it}|\bX_{it}]=0\)
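A simulation sketch of what can go wrong when only contemporaneous exogeneity holds, assuming a feedback design in which today's treatment responds to yesterday's shock, \(X_{it} = \alpha_i + \rho U_{i,t-1} + e_{it}\) (the design and values are illustrative). Here \(\E[U_{it}|\bX_{it}]=0\), but \(U_{i1}\) is correlated with \(\bX_{i2}\), so strict exogeneity fails. With \(T=2\), \(\rho=1\) and unit variances, the one-way FE estimator converges to \(\beta - 1/4\).

```python
import numpy as np

rng = np.random.default_rng(7)
N, T, beta, rho = 100_000, 2, 1.0, 1.0
alpha = rng.normal(size=N)
U = rng.normal(size=(N, T + 1))               # column 0 is a pre-sample shock U_i0
# Feedback: X_it depends on U_i,t-1 (column t-1 of U), Y_it on U_it (column t)
X = alpha[:, None] + rho * U[:, :-1] + rng.normal(size=(N, T))
Y = alpha[:, None] + beta * X + U[:, 1:]

X_t = X - X.mean(axis=1, keepdims=True)       # one-way within transformation
Y_t = Y - Y.mean(axis=1, keepdims=True)
beta_fe = (X_t * Y_t).sum() / (X_t * X_t).sum()
print(beta_fe)   # close to 0.75, not 1.0
```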

Strict Exogeneity Implies Equation 9

Proposition 2 Let \(\E[U_{is} | \bX_{i1}, \dots, \bX_{iT}] =0\) for all \(s\). Then for any within transformation it holds for all \(t\) that \[ \E[\tilde{\bX}_{it}\tilde{U}_{it}] =0 \]

  • Proof by properties of conditional expectations
  • Covers all configurations of random intercepts

Consistency Result

Proposition 3 Let

  1. \((\bX_i, \bU_i)\) be IID and model (8) be true
  2. \(\E[\tilde{\bX}_i'\tilde{\bX}_i]\) exist and be invertible
  3. \(\E[U_{it} | \bX_{i1}, \dots, \bX_{iT}] =0\)

Then as \(N\to\infty\)

\[ \hat{\bbeta}^{FE} \xrightarrow{p} \bbeta \]
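A Monte Carlo sketch of Proposition 3, assuming model (8) with a scalar treatment correlated with both \(\alpha_i\) and \(\gamma_t\), and \(U_{it}\) drawn independently of everything else (so strict exogeneity holds). The FE estimator is close to \(\beta\), while pooled OLS, which ignores the intercepts, is not.

```python
import numpy as np

rng = np.random.default_rng(8)
N, T, beta = 20_000, 4, 2.0
alpha = rng.normal(size=N)
gamma = np.linspace(0.0, 1.5, T)                              # common time shocks gamma_t
X = alpha[:, None] + 0.5 * gamma + rng.normal(size=(N, T))    # treatment correlated with both intercepts
U = rng.normal(size=(N, T))                                   # independent of X: strictly exogenous
Y = alpha[:, None] + gamma[None, :] + beta * X + U            # model (8) with scalar x

def within_two_way(W):
    return W - W.mean(axis=1, keepdims=True) - W.mean(axis=0, keepdims=True) + W.mean()

Xt, Yt = within_two_way(X), within_two_way(Y)
beta_fe = (Xt * Yt).sum() / (Xt * Xt).sum()

# Pooled OLS of Y on (1, X) ignores alpha_i and gamma_t
Z = np.column_stack([np.ones(N * T), X.reshape(-1)])
beta_pooled = np.linalg.lstsq(Z, Y.reshape(-1), rcond=None)[0][1]
print(beta_fe, beta_pooled)   # beta_fe close to 2.0; beta_pooled biased upward
```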

Asymptotic Distribution

Proposition 4 Let

  1. \((\bX_i, \bU_i)\) be IID and model (8) be true
  2. \(\E[\tilde{\bX}_i'\tilde{\bX}_i]\) exist and be invertible; \(\E\left[\norm{\tilde{\bX}_i'\tilde{\bU}_i}^2\right]<\infty\)
  3. \(\E[U_{it} | \bX_{i1}, \dots, \bX_{iT}] =0\)

Then as \(N\to\infty\)

\[ \scriptsize \sqrt{N}\left(\hat{\bbeta}^{FE} - \bbeta\right) \xrightarrow{d} N\left(0, \left(\E[\tilde{\bX}_i'\tilde{\bX}_i] \right)^{-1} \E[\tilde{\bX}_i'\tilde{\bU}_i\tilde{\bU}_i'\tilde{\bX}_i] \left(\E[\tilde{\bX}_i'\tilde{\bX}_i] \right)^{-1}\right) \]

Discussion of Asymptotic Results for \(\hat{\bbeta}^{FE}\)

  • Can do inference and estimate standard errors in a more or less standard way (confidence intervals, hypothesis tests, …)
  • Proof of asymptotic normality — exercise
  • Not examinable technical point: some within transformations (e.g. two-way) can create dependence across \(i\). This dependence disappears as \(N\to\infty\) and does not affect asymptotics
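A sketch of the sample analogue of the sandwich variance in Proposition 4, clustered by unit, assuming the data are already within-transformed and stored as arrays (the function name is hypothetical). Software implementations of clustered standard errors (for example the CRV1 option in pyfixest) typically add finite-sample corrections, which this sketch omits.

```python
import numpy as np

def fe_with_cluster_se(Xt, Yt):
    """FE estimate and sandwich standard errors clustered by unit.

    Xt : within-transformed regressors, shape (N, T, k)
    Yt : within-transformed outcome, shape (N, T)
    """
    A = np.einsum("itk,itl->kl", Xt, Xt)                  # sum_i X~_i' X~_i
    beta = np.linalg.solve(A, np.einsum("itk,it->k", Xt, Yt))
    resid = Yt - Xt @ beta                                # estimated U~_i, shape (N, T)
    scores = np.einsum("itk,it->ik", Xt, resid)           # X~_i' U~_i, one row per unit
    B = scores.T @ scores                                 # sum_i X~_i' U~_i U~_i' X~_i
    A_inv = np.linalg.inv(A)
    V = A_inv @ B @ A_inv                                 # estimated Var(beta_hat)
    return beta, np.sqrt(np.diag(V))

# Usage on simulated one-way data (values are illustrative)
rng = np.random.default_rng(9)
N, T = 2_000, 5
alpha = rng.normal(size=N)
X = rng.normal(size=(N, T, 1)) + alpha[:, None, None]
Y = alpha[:, None] + 1.5 * X[..., 0] + rng.normal(size=(N, T))
Xt = X - X.mean(axis=1, keepdims=True)
Yt = Y - Y.mean(axis=1, keepdims=True)
beta_hat, se = fe_with_cluster_se(Xt, Yt)
print(beta_hat, se)
```

Clustering by unit treats each \(\tilde{\bU}_i\) as one \(T\)-dimensional draw, matching the fixed-\(T\) asymptotics above.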

Properties under Heterogeneous Coefficient Model

Causal Framework with Heterogeneous Coefficients

Consider different potential outcomes setting: \[ \small Y_{it}^{\bx} = \bx'\bbeta_{\textcolor{teal}{i}} + U_{it} \tag{11}\] under strict exogeneity \(\E[U_{it} | \bX_{i1}, \dots, \bX_{iT}] =0\)

  • Nests unit-specific intercepts as a special case (when \(\bx\) includes a constant)
  • Does not nest two-way or other random intercepts with time variation
  • (11) interesting because it allows heterogeneous effects of same change in \(\bX_{it}\)

FE Estimator

Can still use the FE estimator: \[ \hat{\bbeta}^{FE} = \left(\sum_{i=1}^N \tilde{\bX}_i'\tilde{\bX}_i \right)^{-1}\sum_{i=1}^N \tilde{\bX}_i'\tilde{\bY}_i \] Can do any transformation such that \(\E[\tilde{\bX}_i'\tilde{\bX}_i]\) is invertible


What does \(\hat{\bbeta}^{FE}\) do under model (11)?

Expanding Estimator and Taking Limits

Substituting model (11) gets us \[ \small \begin{aligned} \hat{\bbeta}^{FE} & = \left(\sum_{i=1}^N \tilde{\bX}_i'\tilde{\bX}_i \right)^{-1}\sum_{i=1}^N \tilde{\bX}_i'\tilde{\bX}_i\bbeta_i + \left(\sum_{i=1}^N \tilde{\bX}_i'\tilde{\bX}_i \right)^{-1}\sum_{i=1}^N \tilde{\bX}_i'\tilde{\bU}_i\\ & \xrightarrow{p} \E\left[ \bW(\tilde{\bX}_i) \bbeta_i\right] \end{aligned} \] for \(\small\bW(\tilde{\bX}_i) = \left(\E\left[\tilde{\bX}_i'\tilde{\bX}_i\right] \right)^{-1} \tilde{\bX}_i'\tilde{\bX}_i\)

Discussion of \(\hat{\bbeta}^{FE}\) under Heterogeneous Effects

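  • Result: FE estimator estimates a weighted average of \(\bbeta_i\)
    • Weights are matrices with \(\E\left[\bW(\tilde{\bX}_i)\right] = \bI\) (scalar case: nonnegative weights averaging to one)
    • Weights depend on within transformation used
  • In general \(\E\left[ \bW(\tilde{\bX}_i) \bbeta_i\right]\neq \E[\bbeta_i]\) (equality holds, e.g., if \(\bbeta_i\) is independent of \(\tilde{\bX}_i\), as in an RCT)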
  • Careful: \(\E\left[ \bW(\tilde{\bX}_i) \bbeta_i\right]\) and \(\E[\bbeta_i]\) can even have different signs sometimes
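A simulation sketch of the weighting result, assuming a scalar treatment, unit intercepts, and one-way demeaning, with an illustrative design in which units whose treatment varies more over time also have larger effects. In the scalar case \(\bW(\tilde{\bX}_i)\) is proportional to \(\sum_t \tilde{X}_{it}^2\), so high-variation units get more weight and the FE estimate lands well above \(\E[\beta_i]\).

```python
import numpy as np

rng = np.random.default_rng(10)
N, T = 50_000, 4
sd_x = rng.uniform(0.2, 2.0, size=N)             # within-unit volatility of treatment
beta_i = sd_x                                    # effects correlated with that volatility
alpha = rng.normal(size=N)
X = alpha[:, None] + sd_x[:, None] * rng.normal(size=(N, T))
Y = alpha[:, None] + beta_i[:, None] * X + rng.normal(size=(N, T))   # model (11) with x = (1, X)

Xt = X - X.mean(axis=1, keepdims=True)          # one-way within transformation
Yt = Y - Y.mean(axis=1, keepdims=True)
beta_fe = (Xt * Yt).sum() / (Xt * Xt).sum()

w = (Xt ** 2).sum(axis=1)                        # scalar case: weight proportional to X~_i'X~_i
weighted_avg = (w * beta_i).sum() / w.sum()      # sample analogue of E[W(X~_i) beta_i]
print(beta_fe, weighted_avg, beta_i.mean())      # first two close (about 1.5); mean near E[beta_i] = 1.1
```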

Extension: Factor Models

Can generalize random intercept models to the form \[ Y_{it}^{\bx} = \balpha_i'\bgamma_t + \bx'\bbeta_i + U_{it} \]

  • \(\bgamma_t\) — unobserved “factors”, \(\balpha_i\) — factor loadings
  • Allows people to react differently to same shocks in \(\bgamma_t\)
  • See chapter 29 in Pesaran (2015)

Empirical Application

Context and Setting

Empirical Question

Recall empirical question:

How does pollution affect labor market outcomes?


  • Want to answer question without instruments
  • Need some “economic activity-exogenous” measure of pollution
  • Also need to control for “predisposition” to such pollution

Data

Use data from Borgschulte, Molitor, and Zou (2024)

  • Quarterly data for all 3142 counties in the US over 2007-2019 (\(T\approx 48\))
  • Data on average earnings and employment in county
Loading and preparing data
import pandas as pd

# Load data (only the columns used below)
columns_to_load = [
    "countyfip",                # County FIPS code
    "rfrnc_yr",                 # Year
    "rfrnc_qtroy",              # Quarter of year
    "d_pc_qwi_payroll",         # Earnings (annual difference)
    "hms_deep",                 # Number of smoke days
    "fe_countyqtroy",           # County-by-quarter FE identifier
    "fe_styr",                  # State-by-year FE identifier
    "fe_stqtros",               # State-by-quarter identifier (used for clustering)
    "seer_pop",                 # Population
]
county_df = pd.read_csv(
    "data/county_quarter.tab",
    sep="\t",
    usecols=columns_to_load,
)

# Rename columns
column_rename_dict = {
    "countyfip": "fips",
    "rfrnc_yr": "year",
    "rfrnc_qtroy": "quarter",
    "d_pc_qwi_payroll": "diff_payroll",
    "hms_deep": "smoke_days",
    "fe_countyqtroy": "fe_id_county_quarter",
    "fe_styr": "fe_id_state_year",
    "fe_stqtros": "fe_id_state_quarter",
    "seer_pop": "population",
}
county_df = county_df.rename(columns=column_rename_dict)

# Drop rows with missing values, then view the data
county_df = county_df.dropna()
county_df.head(2)
   fips  year  quarter  smoke_days  population  fe_id_state_year  fe_id_county_quarter  fe_id_state_quarter  diff_payroll
4  1001  2007        1         0.0     52405.0                 2                     1                    5     25.800293
5  1001  2007        2         9.0     52405.0                 2                     2                    6     33.243042

Pollution Measure

Pollution — number of smoke days because of wildfires

  • Wildfire smoke can travel far — “exogenous”
  • Some places at great risk of fires or persistent smoke — how to handle impact?

Distribution of Pollution Measure

Specification and Estimation

Key Variables and Homogeneity Assumption

  • Outcome: change in average earnings (employment — exercise)
  • Treatment: number of smoke days in quarter
  • Analysis level: \(i\) — counties, \(t\) — quarters (no aggregation)
  • Assumption: treatment has same effect in all \((i, t)\) and at all levels. Assumed potential outcomes model \[\small (\Delta \text{Earnings}_{it})^{\text{Smoke days}} = \beta\text{Smoke days} + \text{FEs} + U_{it} \]

Random Intercepts

Need to choose random intercepts/FEs so that strict exogeneity holds

Specification of Borgschulte, Molitor, and Zou (2024): include

  • County-season intercepts (capture effect of geography, differing per season)
  • State-year (capture overall economic trends)

About 13000 different random intercepts (a bit more complicated than \(\alpha_i\) and \(\gamma_t\))

Estimation

  • Will use pyfixest this time to estimate
  • feols for fixed effect estimation
  • Regression formula in fml, random intercepts after |
import pyfixest as pf

results = pf.feols(
    fml="diff_payroll ~ smoke_days | fe_id_state_year + fe_id_county_quarter", 
    data=county_df, 
    vcov={"CRV1": "fips + fe_id_state_quarter",}, 
    weights="population",
)

Estimation Results

  • An additional smoke day reduces quarterly earnings by about $5.2 on average; the effect is statistically significant
  • Clustered standard errors (p. 77 in Cunningham 2021)
pf.etable(results)
                          diff_payroll
                                   (1)
coef
  smoke_days                 -5.217***
                               (0.774)
fe
  fe_id_county_quarter               x
  fe_id_state_year                   x
stats
  Observations                  160346
  S.E. type       by: fips+fe_id_state_quarter
  R2                                 -
Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)

Recap and Conclusions

Recap

In this lecture we

  1. Discussed fixed effect estimators for DiD and beyond
    • LSDV
    • Within transformation
  2. Proved causal properties of FE estimators under
    • A random intercept model
    • Model with unit-specific coefficients

Next Questions


  • What if you want to estimate \(\E[\bbeta_i]\)?
  • When does strict exogeneity fail? Can you relax it?
  • What does panel data let you do in nonlinear settings?

References

Baltagi, Badi H. 2021. Econometric Analysis of Panel Data. Springer Texts in Business and Economics. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-030-53953-5.
Borgschulte, Mark, David Molitor, and Eric Yongchen Zou. 2024. “Air Pollution and the Labor Market: Evidence from Wildfire Smoke.” Review of Economics and Statistics 106 (6): 1558–75. https://doi.org/10.1162/rest_a_01243.
Correia, Sergio. 2016. “A Feasible Estimator for Linear Models with Multi-Way Fixed Effects.”
Cunningham, Scott. 2021. Causal Inference: The Mixtape. Yale University Press. https://doi.org/10.2307/j.ctv1c29t27.
de la Roca, Jorge, and Diego Puga. 2017. “Learning By Working in Big Cities.” Review of Economic Studies 84 (1): 106–42. https://doi.org/10.1093/restud/rdw031.
Hansen, Bruce. 2022. Econometrics. Princeton University Press.
Huntington-Klein, Nick. 2025. The Effect: An Introduction to Research Design and Causality. S.l.: Chapman and Hall/CRC.
Pesaran, M. Hashem. 2015. Time Series and Panel Data Econometrics. Oxford University Press. https://doi.org/10.1093/acprof:oso/9780198736912.001.0001.
Wooldridge, Jeffrey M. 2020. Introductory Econometrics: A Modern Approach. Seventh edition. Boston, MA: Cengage.