4 Interlude: Standard Dynamic Panel Estimators
4.1 Exogeneity and Dynamic Models
4.1.1 Introduction
Throughout the previous section, we assumed that the idiosyncratic term \(u_{it}\) satisfied strict exogeneity with respect to the data: \[ \E[u_{it}|\bX_i] = 0. \tag{4.1}\]
However, strict exogeneity is not an innocent assumption. In particular, it implies that for all pairs of indices \(t, s\) it holds that \[ \E[u_{it}\bx_{is}] =0. \tag{4.2}\]
For \(t<s\), Equation 4.2 means that past shocks are uncorrelated with future values of \(\bx\). In other words, one cannot predict past shocks from future \(\bx\)s.
This requirement might fail if \(\bx\) is dynamic and its evolution is affected by \(u_{it}\). In this section, we will discuss this challenge and some traditional approaches to dealing with it with short panel data.
4.1.2 No Strict Exogeneity for Dynamic Models
A particularly clear example for failure of Equation 4.2 arises when \(\bx_{it}\) includes lagged values of the outcome. As we show now, strict exogeneity cannot hold in this case.
For simplicity, we momentarily forget about the cross-sectional dimension and consider a simple autoregressive model with 1 lag (AR(1)): \[ y_t = \alpha + \lambda y_{t-1} + u_{t}, \] where \(t=1, \dots, T\) and the innovation \(u_t\) satisfies \(\E[u_t]=0\).
To show that strict exogeneity fails, it is sufficient to find a single pair of \((t, s)\) such that Equation 4.2 is not true. Specifically, we will take \(s=t+1\) and show that \(\E[y_{it+1}u_t]\) is not necessarily zero.
To evaluate this expectation, we write the model at \(t+1\) \[ y_{t+1} = \alpha + \lambda y_t + u_{t+1}. \tag{4.3}\] We then substitute the model into \(\E[y_{it+1}u_t]\) \[ \begin{aligned} \E[y_{t+1}u_t] & = \lambda\E[y_t u_t] + \E[u_{t+1}u_t] \\ & = \lambda^2\E[y_{t-1}u_t] + \lambda\E[u_t^2] + \E[u_{t+1}u_t] . \end{aligned} \]
What can we say about the right hand side? In many contexts, we may reasonably assume that
- One cannot predict the shocks of tomorrow from what you have today, so that \[\E[y_{t-1}u_t]=0.\]
- \(u_{t}\) is uncorrelated over time, so that \[\E[u_{t+1}u_t]=0.\]
However, \(\lambda\E[u_t^2]\) is zero only in two unreasonable cases:
- If \(\lambda=0\), and so the model is not actually dynamic
- \(\E[u_t^2]=0\), and so the process is actually fully deterministic.
Both of these options are unpleasant. The first one goes against the idea of allowing any dynamics. The second is extremely unlikely with any kind of social data. We conclude that in dynamic models it is the case \[ \E[y_{it+1}u_t] \neq 0. \tag{4.4}\] A fortiori, strict exogeneity also cannot hold in dynamic models.
4.1.3 Sequential Exogeneity
So what can you do if you want to allow for dynamic models? Since strict exogeneity (4.2) is effectively impossible to satisfy, we need a weaker form of exogeneity.
One popular weaker assumption compatible with dynamic models is sequential exogeneity (or predeterminedness). In context of the simple univariate time series model (4.3), sequential exogeneity may be stated as \[ \E[u_{t}| y_{t-1}, y_{t-2}, \dots] =0. \] Intuitively, sequential exogeneity states that future shocks cannot be predicted using past covariates. However, it does not restrict our ability to predict the past from the future — the context which created problems for strict exogeneity.
More generally, consider a panel data linear model with homogeneous coefficients, a random intercept, and a dynamic process for the outcome: \[ y_{it} = \alpha_i + \sum_{k=1}^K \lambda_k y_{it-k} + \bbeta'\bx_{it} + u_{it}. \tag{4.5}\] In this model, a sequential exogeneity assumption takes form \[ \E[u_{it}|\curl{y_{is-1}, \bx_{is}}_{s\leq t}] =0. \] Model (4.5) is a dynamic counterpart of the static model (3.2) we discussed in the last section.
4.2 Handling Dynamic Panels with a Random Intercept
Before moving on and introducing heterogeneous coefficients in dynamic linear models, we will discuss model (4.5) in more detail. In particular, we will develop the standard workhorse instrumental variable estimators. We will analyze how these estimators fare in the more general model (2.7) in the next section.
4.2.1 Model
For the sake of simplicity, we will discuss a simple version of model (4.5) without extra covariates \(\bx_{it}\) and with only one lag of \(y_{it}\): \[ y_{it} = \alpha_i + \lambda y_{it-1} + u_{it}, \tag{4.6}\] where \(u_{it}\) satisfies sequential exogeneity in the form \(\E[u_{it}|y_{is}, s<t] = 0\). This random intercept dynamic model has extensively considered in the literature (e.g. Anderson and Hsiao 1982; Arellano and Bond 1991; Blundell and Bond 1998).
The key parameter of interest is \(\lambda\). By homogeneity of \(\lambda\), it also plays the role of the average coefficient for model (4.6).
Here, we will primarily focus on the large-\(N\), fixed-\(T\) case, though model (4.5) has also been extensively studied in the large-(\(N, T\)) case (e.g. Alvarez and Arellano 2003).
4.2.2 Endogeneity
To work towards an estimator for \(\lambda\), let us difference Equation 4.5 across \(t\) to eliminate the random intercept \(\alpha_i\). The differenced equation takes form \[ \begin{aligned} \Delta y_{it} & = \lambda \Delta y_{it-1} + \Delta u_{it}, \\ \Delta y_{it} & = y_{it} - y_{it-1}. \end{aligned} \tag{4.7}\] Equation 4.7 seems like a simple regression equation. It is tempting to just apply OLS and regress \(\Delta y_{it}\) on \(\Delta y_{it-1}\).
It turns out that this approach will fail as there is an endogeneity problem in Equation 4.7! Sequential exogeneity is not enough to guarantee that \[ \E[\Delta y_{it-1}\Delta u_{it}] = 0. \]
To see why, we expand \(\E[\Delta y_{it-1}\Delta u_{it}]\) as
\[
\begin{aligned}
& \E\left[(u_{it}-u_{it-1})(y_{it-1}-y_{it-2}) \right] \\
& = \E[ u_{it}y_{it-1} ] + \E[(u_{it-1}-u_{it})y_{it-2}] - \E[u_{it-1}y_{it-1}] \\
& = -\E[u_{it-1} y_{it-1}].
\end{aligned}
\] Sequential exogeneity is sufficient to immediately eliminate the first 2 expectations. For the last expectation remaining, we can substitute \(y_{it-1}\) to see that \[
\begin{aligned}
\E[u_{it-1}y_{it-1}] & = \lambda\E[u_{it-1}y_{it-2}] + \E[u_{it-1}^2]\\
& = \E[u_{it-1}^2]
\end{aligned}
\] As noted above, in general we expect that \(\E[u_{it-1}^2] \neq 0\). Hence we conclude that \[
\E[\Delta y_{it-1}\Delta u_{it}]\neq 0.
\tag{4.8}\] In other words, \(\Delta y_{it-1}\) is an endogenous regressor in the differenced Equation 4.7.
4.3 IV Estimation of Dynamic Panel Models
Given the endogeneity issue in first-differencing, instrumental variable methods offer a potential solution. We only need to find suitable instruments — variables \(z_{it}\) which satisfy relevance and exogeneity conditions: \[ \begin{aligned} \E[z_{it}\Delta y_{it-1}] \neq 0,\\ \E[z_{it}\Delta u_{it}] = 0. \end{aligned} \]
In most contexts, one has to look for external variables that can serve as instruments.
However, Equation 4.7 is a very special case where one can use instruments internal to the model!
4.3.1 The Anderson-Hsiao Estimator
In particular, consider using \(y_{it-2}\) as an instrument.
- Relevance seems to be relatively straightforward, as the target endogenous variable \(\Delta y_{it-1}= y_{it-1}-y_{it-2}\) actually has the instrument inside.
- Exogeneity can be justified by appealing to sequential exogeneity: \[ \E[y_{it-2}\Delta u_{it}] = \E[y_{it-2}u_{it} - y_{it-2}u_{it-1}]=0. \]
Hence \(y_{it-2}\) is a valid instrument!
The resulting IV estimator is known as the Anderson and Hsiao (1982) estimator and given by \[ \hat{\lambda}^{AH} = \dfrac{ \sum_{t=2}^T y_{it-2}\Delta y_{it} }{\sum_{t=2}^T y_{it-2}\Delta y_{it} }. \]
4.3.2 Further Dynamic Panel IV Estimators
Are there more instruments that can use for \(\Delta y_{it-1}\)? Using more instruments will induce overidentification and potentially lead to more precise estimators, solving one of the key drawbacks of the Anderson and Hsiao (1982) estimator (Arellano 1989).
As a partial answer, suppose we consider \(y_{it-k}\) for \(k\geq 2\). By applying sequential exogeneity again we can show that \(y_{it-k}\) is exogenous: \[ \E[y_{it-k}\Delta u_{it}] = \E[y_{it-k}u_{it} - y_{it-k}u_{it-1}]=0. \] Provided \(y_{it-k}\) is correlated with \(\Delta y_{it-1}\) and available in the data, it also becomes a valid instrument. Following this chain of reasoning and using all available lags \(y_{it-k}\) yields the Arellano and Bond (1991) estimator.
Even further instruments have been found for Equation 4.7 by Ahn and Schmidt (1995) and Blundell and Bond (1998), among others.
All of these estimators are implemented in
4.3.3 Properties of Dynamic Panel IV Estimators
Under some standard conditions, the above instrumental variable estimators will all be consistent and asymptotically normal for \(\lambda\) as \(N\to\infty\) and \(T\) is fixed. The matter is rather more delicate if both \(N\) and \(T\) are large, and the above dynamic panel IV estimators offer an interesting and natural example of the many moments bias. See Alvarez and Arellano (2003).
4.4 Extra: Endogeneity and the Within Estimator
One may think that we had the endogeneity problem (4.8) because we took first differences instead of applying the within transformation.
However, the same issue endogeneity issue affects the within estimator. First, for \(T=2\) the above discussion applies directly, as the within transformation is numerically identical to first differencing. Second, more generally, the within estimator will suffer from its own version of endogeneity. To see the issue, note that the within-transformed model (4.6) can be written as (see Equation 3.3): \[ \tilde{y}_{it} = \lambda \tilde{y}_{it-1} + \tilde{u}_{it}. \]
One can now immediately see that \(\tilde{y}_{it-1}\) will also be an endogenous regressor as
\[ \E[\tilde{y}_{it-1}\tilde{u}_{it}] = \E\left[\left(y_{it-1}- T^{-1}\sum_{s=0}^{T-1} y_{is} \right)\left(u_{it} - T^{-1}\sum_{r=1}^{T}u_{ir} \right) \right]. \]
The expectation on the right hand side contains “bad” expectations of the form \(\E[u_{it}y_{is}]\) with \(t<s\) (recall Equation 4.4).
We conclude that for any fixed \(T\) the within estimator will also suffer from endogeneity bias. Nickell (1981) shows that it is of the order \(O(T^{-1})\). Furthermore, this bias is likely to be large in practice (see Kiviet 1995).
Next Section
Next, we examine how the above “standard” dynamic panel IV estimators break down when individual coefficients vary across observations.