Beyond Binary Treatments: Fixed Effects Estimators and their Properties
This lecture is about handling more general treatments in panel data using “fixed effect/random intercepts” estimators
By the end, you should be able to
How strongly does pollution affect labor market outcomes?
Cannot just regress labor market outcomes on overall pollution
What if we could control for this likelihood?
Recall: for difference-in-differences we showed that \[ \small \widehat{ATT}^{DiD} = \hat{\delta} \] where \(\hat{\delta}\) was the OLS estimator in the regression \[\small Y_{i2}- Y_{i1} = \gamma + \delta D_{it} + U_{i2} \tag{1}\] where \(\delta\) — the ATT; \(\gamma\) — average change in outcomes (trend)
Equation 1 obtained by differencing the two-way fixed effect equation: \[ Y_{it} = \alpha_i + \gamma_t + \delta D_{it} + U_{it}, \tag{2}\] where \[ \gamma_1 = 0, \quad \gamma_2 = \gamma \] and \(\alpha_i = Y_{i1}^0\) — baseline differences between units
Estimated Equation 2 by treating \(\alpha_i\) and \(\gamma_t\) as parameters, using `PanelOLS` from `linearmodels`
First question: what does “treating \(\alpha_i\) as parameters” mean?
For now forget about \(D_{it}\) and \(\gamma\) and consider: \[ Y_{it} = \alpha_i + U_{it}, \quad i=1,\dots, N; t=1, \dots, T \tag{3}\] \(\alpha_i\) — individual-specific intercept (“unit fixed effect”). Data assumed balanced (same \(T\) for all units)
Want to represent Equation 3 in vector-matrix form
Before that: more info on matrix forms for panel data.
Vector form as before: single observation (now fixed \(i\) and \(t\)) with vector of covariates: \[ Y_{it} = \bX_{it}'\bbeta + U_{it} \]
Two key matrix forms:
Individual level. Let \(\bY_i = (Y_{i1}, \dots, Y_{iT})'\), \(\bX_i = (\bX_{i1}, \dots, \bX_{iT})'\), then \[\small \bY_i = \bX_i\bbeta + \bU_i \]
Full sample. Let \(\bY = (\bY_1', \dots, \bY_N')'\), \(\bX= (\bX_1', \dots, \bX_N')'\). Then \[ \small \bY = \bX\bbeta + \bU \] What are the dimensions of \(\bY_i, \bX_i, \bY, \bX\)?
Model (3) in individual matrix form: \[ \bY_i = \mathbf{1}_T\alpha_i + \bU_i \] where \(\mathbf{1}_T\) — \(T\)-vector of ones
Not that insightful
Model (3) in full sample matrix form \[ \begin{aligned} \bY & = \bF \bLambda + \bU, \\ \bLambda & = (\alpha_1, \dots, \alpha_N)', \\ \bF & = \bI_N \otimes \mathbf{1}_T, \end{aligned} \tag{4}\] where \(\otimes\) is the Kronecker product. Intuition: the \(i\)-th column of \(\bF\) is the dummy for unit \(i\), so \(\bF\bLambda\) stacks each \(\alpha_i\) on top of itself \(T\) times
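A minimal numerical sketch of this construction (hypothetical small \(N\) and \(T\)):

```python
# Sketch: building F = I_N ⊗ 1_T for N = 3 units and T = 2 periods
import numpy as np

N, T = 3, 2
F = np.kron(np.eye(N), np.ones((T, 1)))   # (NT x N) matrix of unit dummies
Lam = np.array([1.0, 2.0, 3.0])           # (alpha_1, alpha_2, alpha_3)

print(F.shape)   # (6, 3)
print(F @ Lam)   # [1. 1. 2. 2. 3. 3.] -- each alpha_i repeated T times
```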
Now consider more general model: \[ Y_{it} = \alpha_i + \gamma_t + U_{it} \] Here want to treat both \(\alpha_i\) and \(\gamma_t\) as parameters
Individual matrix form: \[ \begin{aligned} \bY_i & = \mathbf{1}_T \alpha_i + \bI_T \bgamma + \bU_i\\ \bgamma & = (\gamma_1, \dots, \gamma_T)' \end{aligned} \]
Can write \[ \begin{aligned} \bY & = \bF\bLambda + \bU, \\ \bF & = \left(\bI_N \otimes \mathbf{1}_T, \mathbf{1}_N\otimes \bI_T \right)\\ \bLambda & = (\alpha_1, \dots, \alpha_N, \gamma_1, \dots, \gamma_T)' \end{aligned} \]
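The same idea in code for the two-way case (again hypothetical small \(N\), \(T\)): the first block of columns of \(\bF\) contains the unit dummies, the second the time dummies.

```python
# Sketch: the two-way design matrix F for N = 3, T = 2
import numpy as np

N, T = 3, 2
F = np.hstack([
    np.kron(np.eye(N), np.ones((T, 1))),   # unit dummies, NT x N
    np.kron(np.ones((N, 1)), np.eye(T)),   # time dummies, NT x T
])
print(F.shape)   # (6, 5), i.e. N + T columns
```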
Can write Equation 2 as \[ Y_{it} = \alpha_i + \gamma_t + \bX_{it}'\bbeta + U_{it}, \] for \(\bX_{it} = (D_{it})\) and \(\bbeta = (\delta)\)
More generally, consider any vector \(\bX_{it}\) — not just binary treatments
Its matrix form is \[ \bY = \bF\bLambda + \bX\bbeta + \bU \]
Definition 1 Models of the kind \[ \small \bY = \bF\bLambda +\bX\bbeta + \bU, \tag{5}\] where \(\bF\) is a matrix of 0s and 1s, are called fixed effects or random intercept models
Suppose \(\E[U_{it}|\bX_i]=0\). How to estimate parameters of Model (5)?
There are two main strategies:
LSDV (least squares dummy variable) — simply regress \(\bY\) on \((\bF, \bX)\): \[ (\hat{\bLambda}, \hat{\bbeta}^{LSDV}) = \argmin_{\bL, \bb} \norm{\bY - \bF\bL -\bX\bb }_2^2 \]
For example with two-way effects:
\[\small \begin{aligned} & \left(\hat{\alpha}_1, \dots, \hat{\alpha}_N, \hat{\gamma}_1, \dots, \hat{\gamma}_T, \hat{\bbeta}^{LSDV} \right)\\ & = \argmin_{a_1, \dots, a_N, g_1, \dots, g_T, \bb}\sum_{i=1}^N \sum_{t=1}^T \left(Y_{it} - a_i - g_t - \bX_{it}'\bb \right)^{2} \end{aligned} \]
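A rough sketch of LSDV on a simulated toy panel (the data-generating process and all names here are made up for illustration): build the dummies by hand and run OLS.

```python
# Sketch: LSDV on a simulated toy panel
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
N, T, beta = 200, 5, 1.5
df = pd.DataFrame({
    "unit": np.repeat(np.arange(N), T),
    "time": np.tile(np.arange(T), N),
})
alpha = rng.normal(size=N)[df["unit"].to_numpy()]
gamma = rng.normal(size=T)[df["time"].to_numpy()]
df["x"] = rng.normal(size=N * T) + alpha   # regressor correlated with alpha_i
df["y"] = alpha + gamma + beta * df["x"] + rng.normal(size=N * T)

# Regress y on a constant, unit dummies, time dummies, and x
D = pd.get_dummies(df[["unit", "time"]].astype("category"), drop_first=True)
Z = np.column_stack([np.ones(len(df)), D.to_numpy(dtype=float), df["x"].to_numpy()])
coefs, *_ = np.linalg.lstsq(Z, df["y"].to_numpy(), rcond=None)
print(coefs[-1])   # coefficient on x, close to 1.5
```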
First consider the one-way model \(Y_{it} = \alpha_{i} + \bX_{it}'\bbeta + U_{it}\). For \(W_{it} = Y_{it}, \bX_{it}, U_{it}\), define the (one-way) within-transformed version of \(W_{it}\) as \[ \small \tilde{W}_{it} = W_{it} - \dfrac{1}{T}\sum_{s=1}^T W_{is} \tag{6}\]
The within transformation eliminates the fixed effects (the \(\bF\) part): \[ \small \tilde{Y}_{it} = \tilde{\bX}_{it}'\bbeta + \tilde{U}_{it} \]
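In code, the one-way within transformation is just group-wise demeaning. A sketch on the same kind of simulated toy panel:

```python
# Sketch: one-way within transformation via groupby demeaning
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
N, T = 200, 5
unit = np.repeat(np.arange(N), T)
alpha = rng.normal(size=N)[unit]
x = rng.normal(size=N * T) + alpha
y = alpha + 1.5 * x + rng.normal(size=N * T)
df = pd.DataFrame({"unit": unit, "x": x, "y": y})

# Subtract unit means (equation 6), then OLS on the demeaned variables
x_til = df["x"] - df.groupby("unit")["x"].transform("mean")
y_til = df["y"] - df.groupby("unit")["y"].transform("mean")
print((x_til @ y_til) / (x_til @ x_til))   # within slope, close to 1.5
```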
Suppose \(Y_{it} = \alpha_i + \gamma_t + \bX_{it}'\bbeta + U_{it}\). Define (two-way) within-transformed variables as \[ \small \tilde{W}_{it} = W_{it} - \dfrac{1}{T}\sum_{s=1}^T W_{is} - \dfrac{1}{N} \sum_{j=1}^N W_{jt} + \dfrac{1}{NT} \sum_{j=1}^N \sum_{s=1}^T W_{js} \tag{7}\]
Again the fixed effects (\(\bF\)) are eliminated: \[ \small \tilde{Y}_{it} = \tilde{\bX}_{it}'\bbeta + \tilde{U}_{it} \]
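The two-way version subtracts unit means and time means and adds back the grand mean. A sketch (balanced toy panel, hypothetical DGP):

```python
# Sketch: two-way within transformation (equation 7) on a balanced toy panel
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
N, T = 200, 5
df = pd.DataFrame({
    "unit": np.repeat(np.arange(N), T),
    "time": np.tile(np.arange(T), N),
})
alpha = rng.normal(size=N)[df["unit"].to_numpy()]
gamma = rng.normal(size=T)[df["time"].to_numpy()]
df["x"] = rng.normal(size=N * T) + alpha
df["y"] = alpha + gamma + 1.5 * df["x"] + rng.normal(size=N * T)

def two_way_demean(s: pd.Series) -> pd.Series:
    # W_it - unit mean - time mean + grand mean
    return (s - s.groupby(df["unit"]).transform("mean")
              - s.groupby(df["time"]).transform("mean") + s.mean())

x_til, y_til = two_way_demean(df["x"]), two_way_demean(df["y"])
print((x_til @ y_til) / (x_til @ x_til))   # close to 1.5
```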
Can find the matrix formula in section 3.2 of Baltagi (2021)
Consider general model: \[ \small \bY = \bF\bLambda + \bX\bbeta + \bU \]
There exists a linear transformation that eliminates \(\bF\): \[ \small \tilde{\bY} = \tilde{\bX}\bbeta + \tilde{\bU} \]
Called the FWL or the (generalized) within transformation
Transformations — just an application of the Frisch-Waugh-Lovell (“anatomy of regression”) theorem. See E1a in Wooldridge (2020)
Within estimation: just regressing \(\tilde{\bY}\) on \(\tilde{\bX}\) with OLS: \[ \hat{\bbeta}^{W} = \argmin_{\bb} \sum_{i=1}^N \sum_{t=1}^T (\tilde{Y}_{it} - \tilde{\bX}_{it}'\bb)^2 \]
Proposition 1 \[ \hat{\bbeta}^{LSDV} = \hat{\bbeta}^{W} \]
Not examinable: proof is a consequence of the Frisch-Waugh-Lovell theorem (E1a in Wooldridge 2020)
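A quick numerical check of Proposition 1 on a toy panel (not a proof, just a sanity check): LSDV with explicit dummies and the two-way within estimator return the same coefficient.

```python
# Sketch: checking beta_LSDV == beta_W numerically on a tiny toy panel
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
N, T = 50, 4
df = pd.DataFrame({
    "unit": np.repeat(np.arange(N), T),
    "time": np.tile(np.arange(T), N),
    "x": rng.normal(size=N * T),
})
df["y"] = 1.5 * df["x"] + rng.normal(size=N * T)

# LSDV: constant + unit dummies + time dummies + x
D = pd.get_dummies(df[["unit", "time"]].astype("category"), drop_first=True)
Z = np.column_stack([np.ones(len(df)), D.to_numpy(dtype=float), df["x"].to_numpy()])
beta_lsdv = np.linalg.lstsq(Z, df["y"].to_numpy(), rcond=None)[0][-1]

# Within: two-way demeaning, then OLS slope without intercept
def demean(s: pd.Series) -> pd.Series:
    return (s - s.groupby(df["unit"]).transform("mean")
              - s.groupby(df["time"]).transform("mean") + s.mean())

x_til, y_til = demean(df["x"]), demean(df["y"])
beta_w = (x_til @ y_til) / (x_til @ x_til)
print(np.isclose(beta_lsdv, beta_w))   # True
```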
When to use LSDV vs. within estimation?
Sometimes impossible to compute LSDV estimator: number of fixed effects is too large to even simply store the data matrix:
Another special case of model (5) — pooled OLS:
`linearmodels` supports both LSDV and within transformations (e.g., `PanelOLS.fit(use_lsdv=True)`)
`pyfixest` was designed for high-dimensional FE estimation (can handle small examples too)
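A sketch of both routes on simulated data (the variable names and DGP are hypothetical): `linearmodels` via `PanelOLS` with `use_lsdv=True`, `pyfixest` via `feols` with the fixed effects after `|`.

```python
# Sketch: LSDV and within estimation on a simulated toy panel with both
# linearmodels and pyfixest
import numpy as np
import pandas as pd
from linearmodels.panel import PanelOLS
import pyfixest as pf

rng = np.random.default_rng(0)
N, T = 100, 6
df = pd.DataFrame({
    "unit": np.repeat(np.arange(N), T),
    "time": np.tile(np.arange(T), N),
})
alpha = rng.normal(size=N)[df["unit"].to_numpy()]
gamma = rng.normal(size=T)[df["time"].to_numpy()]
df["x"] = rng.normal(size=N * T) + alpha   # regressor correlated with alpha_i
df["y"] = alpha + gamma + 2.0 * df["x"] + rng.normal(size=N * T)

# linearmodels: entity/time effects in the formula; use_lsdv=True forces LSDV
panel_df = df.set_index(["unit", "time"])
fit_lm = PanelOLS.from_formula(
    "y ~ x + EntityEffects + TimeEffects", data=panel_df
).fit(use_lsdv=True)

# pyfixest: fixed effects go after the | in the formula (within estimation)
fit_pf = pf.feols("y ~ x | unit + time", data=df)

print(fit_lm.params["x"], fit_pf.coef()["x"])   # same estimate (Proposition 1)
```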
So far:
What are the causal properties of such estimators? Under which models do they give meaningful results?
Note:
Will consider two kinds of models under strict exogeneity
For definiteness, we do two-way effects, but can apply same analysis for any configuration of random intercepts, just need to define \(\tilde{Y}_{it}\) appropriately
Work in the following setting
Realized data satisfies \(\tilde{Y}_{it} = \tilde{\bX}_{it}'\bbeta + \tilde{U}_{it}\)
So as \(N\to\infty\) and \(T\) is fixed: \[ \small \hat{\bbeta}^{FE} \xrightarrow{p} \bbeta + \left( \E[\tilde{\bX}_{i}'\tilde{\bX}_i]\right)^{-1} \E[\tilde{\bX}_{i}'\tilde{\bU}_{i}] \]
\(T\) fixed — basically treat each unit as a single \(T\)-dimensional observation (as in \(\tilde{\bU}_i\))
So far we only needed to impose a nonsingularity condition: \(\E[\tilde{\bX}_{i}'\tilde{\bX}_i]\) must be invertible
What does it require of \(\bX_{it}\)?
For consistency want \[ \small \E[\tilde{\bX}_i'\tilde{\bU}_i] = \sum_{t=1}^T \E\left[\tilde{\bX}_{it} \tilde{U}_{it} \right] =0 \]
Sufficient that for all \(t\) \[ \small \E\left[\tilde{\bX}_{it} \tilde{U}_{it} \right] =0 \tag{9}\]
What does this condition require of \(\bX_{it}\) and \(U_{it}\)?
Under one-way transformation (6), Equation 9 becomes \[ \scriptsize \E\left[\tilde{\bX}_{it} \tilde{U}_{it} \right] = \E\left[ \bX_{it}U_{it} - \dfrac{\bX_{it}}{T}\sum_{s=1}^T U_{is} - \dfrac{U_{it}}{T}\sum_{r=1}^T \bX_{ir} + \dfrac{1}{T^2} \sum_{s=1}^T\sum_{r=1}^T \bX_{is} U_{ir}\right] = 0 \]
Here would be sufficient that for all \(t\), \(s\) \[ \small \E[\bX_{it}U_{is}] = 0 \tag{10}\] Intuition: \(\bX_{it}\) and \(U_{is}\) are uncorrelated across all points in time
Problematic direction is usually \(s<t\): predicting future \(\bX_{it}\) from past \(U_{is}\) (your shocks influence your future decisions)
What about beyond one-way effects? Usually impose an assumption that covers all cases — the panel data version of strict exogeneity:
Assumption (strict exogeneity): \[ \E[U_{it} | \bX_{i1}, \dots, \bX_{iT}] =0 \]
Much stronger than just \(\E[U_{it}|\bX_{it}]=0\)
Proposition 2 Let \(\E[U_{is} | \bX_{i1}, \dots, \bX_{iT}] =0\) for all \(s\). Then for any within transformation it holds for all \(t\) that \[ \E\left[\tilde{\bX}_{it} \tilde{U}_{it}\right] =0, \quad \text{and hence} \quad \E[\tilde{\bX}_{i}'\tilde{\bU}_i] =0 \]
See first block for key properties of conditional expectation
Proposition 3 Let (1) the data be IID across \(i\) with \(T\) fixed; (2) strict exogeneity hold: \(\E[U_{it} | \bX_{i1}, \dots, \bX_{iT}] =0\) for all \(t\); (3) \(\E[\tilde{\bX}_i'\tilde{\bX}_i]\) be invertible.
Then as \(N\to\infty\)
\[ \hat{\bbeta}^{FE} \xrightarrow{p} \bbeta \]
Proposition 4 Let the conditions of Proposition 3 hold, along with standard moment conditions.
Then as \(N\to\infty\)
\[ \scriptsize \sqrt{N}\left(\hat{\bbeta}^{FE} - \bbeta\right) \xrightarrow{d} N\left(0, \left(\E[\tilde{\bX}_i'\tilde{\bX}_i] \right)^{-1} \E[\tilde{\bX}_i'\tilde{\bU}_i\tilde{\bU}_i'\tilde{\bX}_i] \left(\E[\tilde{\bX}_i'\tilde{\bX}_i] \right)^{-1}\right) \]
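The middle matrix involves unit-level sums, which is why standard errors for FE estimators are typically clustered by unit. A sketch with linearmodels, reusing the toy `panel_df` built in the earlier example:

```python
# Sketch: unit-clustered (sandwich) standard errors with linearmodels,
# reusing the toy panel_df from the earlier sketch
from linearmodels.panel import PanelOLS

fit = PanelOLS.from_formula(
    "y ~ x + EntityEffects + TimeEffects", data=panel_df
).fit(cov_type="clustered", cluster_entity=True)
print(fit.std_errors)
```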
Consider different potential outcomes setting: \[ \small Y_{it}^{\bx} = \bx'\bbeta_{\textcolor{teal}{i}} + U_{it} \tag{11}\] under strict exogeneity \(\E[U_{it} | \bX_{i1}, \dots, \bX_{iT}] =0\)
Can still use the FE estimator: \[ \hat{\bbeta}^{FE} = \left(\sum_{i=1}^N \tilde{\bX}_i'\tilde{\bX}_i \right)^{-1}\sum_{i=1}^N \tilde{\bX}_i'\tilde{\bY}_i \] Can do any transformation such that \(\E[\tilde{\bX}_i'\tilde{\bX}_i]\) is invertible
What does \(\hat{\bbeta}^{FE}\) do under model (11)?
Substituting model (11) gets us \[ \small \begin{aligned} \hat{\bbeta}^{FE} & = \left(\sum_{i=1}^N \tilde{\bX}_i'\tilde{\bX}_i \right)^{-1}\sum_{i=1}^N \tilde{\bX}_i'\tilde{\bX}_i\bbeta_i + \left(\sum_{i=1}^N \tilde{\bX}_i'\tilde{\bX}_i \right)^{-1}\sum_{i=1}^N \tilde{\bX}_i'\tilde{\bU}_i\\ & \xrightarrow{p} \E\left[ \bW(\tilde{\bX}_i) \bbeta_i\right] \end{aligned} \] for \(\small\bW(\tilde{\bX}_i) = \left(\E\left[\tilde{\bX}_i'\tilde{\bX}_i\right] \right)^{-1} \tilde{\bX}_i'\tilde{\bX}_i\)
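A simulation sketch of this weighted-average result (hypothetical one-regressor DGP, one-way demeaning): units whose \(\tilde{\bX}_i\) varies more receive more weight, so the probability limit differs from the simple average of the \(\bbeta_i\).

```python
# Sketch: with heterogeneous beta_i, the FE estimator converges to a
# weighted average of beta_i with weights proportional to X~_i' X~_i
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)
N, T = 20000, 4
beta_i = rng.uniform(1.0, 3.0, size=N)   # unit-specific slopes
sigma_i = beta_i                         # sd of x made to correlate with beta_i

unit = np.repeat(np.arange(N), T)
x = rng.normal(size=N * T) * sigma_i[unit]
y = beta_i[unit] * x + rng.normal(size=N * T)   # no intercepts, for simplicity
df = pd.DataFrame({"unit": unit, "x": x, "y": y})

x_til = df["x"] - df.groupby("unit")["x"].transform("mean")
y_til = df["y"] - df.groupby("unit")["y"].transform("mean")
beta_fe = (x_til @ y_til) / (x_til @ x_til)

ssx = (x_til ** 2).groupby(df["unit"]).sum()   # X~_i' X~_i for each unit
print(beta_fe)                                 # FE estimate
print(np.average(beta_i, weights=ssx))         # close to beta_fe
print(beta_i.mean())                           # simple average differs
```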
Can generalize random intercept models to the form \[ Y_{it}^{\bx} = \balpha_i'\bgamma_t + \bx'\bbeta_i + U_{it} \]
How does pollution affect labor market outcomes?
Use data from Borgschulte, Molitor, and Zou (2024)
```python
# Load data
import pandas as pd

columns_to_load = [
    "countyfip",          # FIPS
    "rfrnc_yr",           # Year
    "rfrnc_qtroy",        # Quarter
    "d_pc_qwi_payroll",   # Earnings (annual diff)
    "hms_deep",           # Number of smoke days
    "fe_countyqtroy",     # County-by-quarter-of-year FE id
    "fe_styr",            # State-by-year FE id
    "fe_stqtros",         # State-by-quarter FE id
    "seer_pop",           # Population
]
county_df = pd.read_csv(
    "data/county_quarter.tab",
    sep="\t",
    usecols=columns_to_load,
)

# Rename columns
column_rename_dict = {
    "countyfip": "fips",
    "rfrnc_yr": "year",
    "rfrnc_qtroy": "quarter",
    "d_pc_qwi_payroll": "diff_payroll",
    "hms_deep": "smoke_days",
    "fe_countyqtroy": "fe_id_county_quarter",
    "fe_styr": "fe_id_state_year",
    "fe_stqtros": "fe_id_state_quarter",
    "seer_pop": "population",
}
county_df = county_df.rename(columns=column_rename_dict)

# View data
county_df.dropna(inplace=True)
county_df.head(2)
```
|  | fips | year | quarter | smoke_days | population | fe_id_state_year | fe_id_county_quarter | fe_id_state_quarter | diff_payroll |
|---|---|---|---|---|---|---|---|---|---|
| 4 | 1001 | 2007 | 1 | 0.0 | 52405.0 | 2 | 1 | 5 | 25.800293 |
| 5 | 1001 | 2007 | 2 | 9.0 | 52405.0 | 2 | 2 | 6 | 33.243042 |
Pollution — number of smoke days because of wildfires
Can download the data from the Harvard dataverse
Need to choose random intercepts/FEs so that strict exogeneity holds
Specification of Borgschulte, Molitor, and Zou (2024): include county-by-quarter-of-year and state-by-year intercepts (`fe_id_county_quarter`, `fe_id_state_year`)
About 13000 different random intercepts (a bit more complicated than \(\alpha_i\) and \(\gamma_t\))
Use `pyfixest` this time to estimate: `feols` for fixed effect estimation
Model formula goes in `fml`, random intercepts after `|`
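A sketch of what the estimation call might look like. The two fixed effects and the two-way clustering are taken from the output below; the population weighting is an assumption and may not match the exact specification.

```python
import pyfixest as pf

# Random intercepts after |; clustering matches the reported S.E. type below.
# The population weights are an assumption, not necessarily the exact spec.
fit = pf.feols(
    fml="diff_payroll ~ smoke_days | fe_id_county_quarter + fe_id_state_year",
    data=county_df,
    weights="population",
    vcov={"CRV1": "fips+fe_id_state_quarter"},
)
pf.etable([fit])
```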
|  | diff_payroll |
|---|---|
|  | (1) |
| coef |  |
| smoke_days | -5.217*** (0.774) |
| fe |  |
| fe_id_county_quarter | x |
| fe_id_state_year | x |
| stats |  |
| Observations | 160346 |
| S.E. type | by: fips+fe_id_state_quarter |
| R2 | - |

Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)
In this lecture we introduced fixed effects (random intercept) models, discussed LSDV and within estimation and their equivalence, studied consistency and asymptotic normality under strict exogeneity, and applied FE estimation to the effect of wildfire smoke on labor market outcomes
Panel Data: Linear Panel Data Models