Exercises: Asymptotic Inference
Theoretical Exercises
Testing Scalar Restrictions
Let the outcome \(Y_i\), the covariates \(\bX_i\), and an unobserved component \(U_i\) be linked through the linear potential outcomes model \[ Y_i^{\bx} = \bx'\bbeta + U_i. \] Suppose that we observe an IID sample of data on \(Y_i, \bX_i\), that \(\E[U_i|\bX_i]=0\), that \(\E[\bX_i\bX_i']\) is invertible, and that \(\E[U_i^2\bX_i\bX_i']\) has maximal rank.
- Consider the hypotheses \(H_0: \beta_k = c\) and \(H_1: \beta_k\neq c\), where \(\beta_k\) is the \(k\)th coordinate of the \(\bbeta\) vector. Propose a consistent test for \(H_0\) vs \(H_1\) that has asymptotic size \(\alpha\).
- Now let \(\ba\neq 0\) be some known constant vector of the same dimension as \(\bbeta\). Consider the hypotheses \(H_0: \ba'\bbeta = c\) and \(H_1: \ba'\bbeta\neq c\). Propose a consistent \(t\)-test for \(H_0\) vs \(H_1\) that has asymptotic size \(\alpha\).
- Why do we require that \(\ba\neq 0\) in the previous question?
In both cases remember to show that your test is consistent and has the desired asymptotic size.
Click to see the solution
First subquestion: to choose our test statistic, we observe two facts:
- We are dealing with a scalar hypothesis,
- The OLS estimator is consistent and asymptotically normal (why?).
Accordingly, we can use a \(t\)-test. The \(t\)-statistic is given by \[ t = \dfrac{\hat{\beta}_k - c}{ \sqrt{ \widehat{\avar}(\hat{\beta}_k)/N } }, \] where \(\hat{\beta}_k\) is the \(k\)th coordinate of the OLS estimator \(\hat{\bbeta}\), and \(\widehat{\avar}(\hat{\beta}_k)\) is some consistent estimator of \(\avar(\hat{\beta}_k)\) (e.g. the \((k,k)\)th element of the HC0 estimator from the lectures).
Our test is based on the following decision rule. Let \(z_{1-\alpha/2}\) be the \((1-\alpha/2)\)th quantile of the standard normal distribution. Then:
- If \(\abs{t}>z_{1-\alpha/2}\), we reject \(H_0\).
- If \(\abs{t}\leq z_{1-\alpha/2}\), we do not reject \(H_0\).
We now need to show that this test is consistent and has the desired asymptotic size.
Consistency: We need to show that the probability of rejecting \(H_0\) converges to 1 when \(H_0\) is false. Let \(\beta_k\) be the true value of the coefficient of interest, and write \[ t = \dfrac{\hat{\beta}_k - \beta_k}{ \sqrt{ \widehat{\avar}(\hat{\beta}_k)/N } } + \dfrac{ \beta_k- c}{ \sqrt{ \widehat{\avar}(\hat{\beta}_k)/N } }. \] By our asymptotic normality results, the first term converges in distribution to a \(N(0, 1)\) random variable. By our assumptions, \(\widehat{\avar}(\hat{\beta}_k)\xrightarrow{p} \avar(\hat{\beta}_k)\neq 0\). Under the alternative, \(\beta_k\neq c\), and so the second term diverges to \(\pm \infty\). It then follows that \(\abs{t}> z_{1-\alpha/2}\) with probability approaching one for any \(\beta_k\neq c\). In other words, consistency holds.
Asymptotic size: We need to show that the probability of rejecting \(H_0\) converges to \(\alpha\) when \(H_0\) is true. Under \(H_0\) it holds that \(\beta_k=c\), and thus our asymptotic results and Slutsky's theorem imply that \[ t = \dfrac{\hat{\beta}_k - \beta_k}{ \sqrt{ \widehat{\avar}(\hat{\beta}_k)/N } } \xrightarrow{d} N(0, 1). \] By the definition of convergence in distribution, the definition of \(z_{1-\alpha/2}\), and the fact that \(z_{1-\alpha/2} = -z_{\alpha/2}\), it holds that \[ \begin{aligned} & P\left(\text{Reject } H_0|H_0 \right) = P\left(\abs{t}>z_{1-\alpha/2} |H_0\right) \\ & = P\left( \abs{ \dfrac{\hat{\beta}_k-c}{\sqrt{ \widehat{\avar}(\hat{\beta}_k)/N } }}> z_{1-\alpha/2}\Bigg|H_0 \right)\\ & \to \Phi(z_{\alpha/2}) + (1- \Phi(z_{1-\alpha/2})) = \alpha. \end{aligned} \] The test has asymptotic size \(\alpha\).
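To make the construction concrete, the following minimal sketch simulates data from a heteroskedastic linear model and computes the HC0-robust \(t\)-statistic for a true null \(H_0: \beta_1 = 0.5\); it assumes `numpy`, `scipy`, and `statsmodels` are available, and the simulated design is purely illustrative.

```python
import numpy as np
import scipy.stats as st
import statsmodels.api as sm

# Simulate a heteroskedastic linear model
rng = np.random.default_rng(0)
N = 1000
X = rng.normal(size=(N, 2))
U = rng.normal(size=N) * (1 + 0.5 * np.abs(X[:, 0]))  # E[U|X] = 0, heteroskedastic
Y = X @ np.array([0.5, -1.0]) + U

# OLS with the HC0 asymptotic variance estimator
res = sm.OLS(Y, X).fit(cov_type="HC0")

# t-test of H0: beta_1 = 0.5 (Python index k = 0) at level alpha
k, c, alpha = 0, 0.5, 0.05
t_stat = (res.params[k] - c) / res.bse[k]  # bse[k] = sqrt(avar_hat(beta_k_hat)/N)
reject = np.abs(t_stat) > st.norm.ppf(1 - alpha / 2)
print(f"t = {t_stat:.3f}, reject H0: {reject}")
```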
Second subquestion: the question explicitly asks for a \(t\)-test, and so we use the following \(t\)-statistic as the basis for our test: \[ t = \dfrac{\ba'\hat{\bbeta} - c}{ \sqrt{ \widehat{\avar}(\ba'\hat{\bbeta})/N } }. \tag{1}\] The key question is how to construct a suitable estimator \(\widehat{\avar}(\ba'\hat{\bbeta})\) of \(\avar(\ba'\hat{\bbeta})\).
By the continuous mapping theorem it holds that \[ \sqrt{N}(\ba'\hat{\bbeta}- \ba'\bbeta) \xrightarrow{d} N(0, \ba'\avar(\hat{\bbeta})\ba). \] By the continuous mapping theorem again, \[ \ba'\widehat{\avar}(\hat{\bbeta})\ba \xrightarrow{p}\ba'{\avar}(\hat{\bbeta})\ba = \avar(\ba'\hat{\bbeta}). \] Hence, we can use \(\ba'\widehat{\avar}(\hat{\bbeta})\ba\) as \(\widehat{\avar}(\ba'\hat{\bbeta})\) in Equation 1. With this choice, it follows by Slutsky's theorem that \[ \dfrac{\ba'\hat{\bbeta} - \ba'\bbeta}{ \sqrt{ \widehat{\avar}(\ba'\hat{\bbeta})/N } } \xrightarrow{d} N(0, 1). \tag{2}\]
Our decision rule is analogous to the above one:
- If \(\abs{t}>z_{1-\alpha/2}\), we reject \(H_0\).
- If \(\abs{t}\leq z_{1-\alpha/2}\), we do not reject \(H_0\).
Consistency and asymptotic size can be shown entirely analogously to the above case by using Equation 2 (show them regardless to practice!).
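For concreteness, here is a minimal sketch of this test on simulated data, again assuming `numpy`, `scipy`, and `statsmodels`: it builds \(\widehat{\avar}(\ba'\hat{\bbeta}) = \ba'\widehat{\avar}(\hat{\bbeta})\ba\) directly from the HC0 covariance matrix.

```python
import numpy as np
import scipy.stats as st
import statsmodels.api as sm

# Simulated data with true beta = (1, 2, -1), so a'beta = 3 for a = (1, 1, 0)
rng = np.random.default_rng(1)
N = 1000
X = rng.normal(size=(N, 3))
U = rng.normal(size=N) * (1 + np.abs(X[:, 1]))
Y = X @ np.array([1.0, 2.0, -1.0]) + U

res = sm.OLS(Y, X).fit(cov_type="HC0")

# t-test of H0: a'beta = c
a = np.array([1.0, 1.0, 0.0])          # known constant vector, a != 0
c, alpha = 3.0, 0.05
V = res.cov_params()                   # HC0 estimate of avar_hat(beta_hat)/N
t_stat = (a @ res.params - c) / np.sqrt(a @ V @ a)
reject = np.abs(t_stat) > st.norm.ppf(1 - alpha / 2)
print(f"t = {t_stat:.3f}, reject H0: {reject}")
# statsmodels can also run the same test via res.t_test((a.reshape(1, -1), [c]))
```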
Third subquestion: if \(\ba=0\), then \(\ba'\bbeta=0\) regardless of the value of \(\bbeta\), and the null hypothesis reduces to \(H_0: 0=c\). This hypothesis involves no unknown parameters: it is either trivially true or trivially false, depending on \(c\), so there is nothing to test. In addition, \(\avar(\ba'\hat{\bbeta}) = \ba'\avar(\hat{\bbeta})\ba = 0\) in this case, so the denominator of the \(t\)-statistic in Equation 1 converges to zero and the normal approximation breaks down.
Testing Several Linear Restrictions
Let the outcome \(Y_i\), the covariates \(\bX_i\), and an unobserved component \(U_i\) be linked through the linear potential outcomes model \[ Y_i^{\bx} = \bx'\bbeta + U_i. \] Suppose that we observe an IID sample of data on \(Y_i, \bX_i\), that \(\E[U_i|\bX_i]=0\), that \(\E[\bX_i\bX_i']\) is invertible, and that \(\E[U_i^2\bX_i\bX_i']\) has maximal rank.
Let \(\bbeta = (\beta_1, \beta_2, \dots, \beta_p)\) with \(p\geq 4\). Consider the following two hypotheses on \(\bbeta\): \[ H_0: \begin{cases} \beta_1 = 0, \\ \beta_2 - \beta_3 = 1, \\ \beta_2 = 4\beta_4 + 5, \end{cases} \quad H_1: \text{at least one equality in $H_0$ fails} \] Propose a consistent test for \(H_0\) vs. \(H_1\) with asymptotic size \(\alpha\). Show that the test possesses these properties.
Click to see the solution
First, we write the null hypothesis in matrix form. We can do this by stacking the three equations in \(H_0\) into a single vector equation: \[ \begin{aligned} H_0: & \bR\bbeta = \bq, \\ \bR & = \begin{pmatrix} 1 & 0 & 0 & 0 & \cdots\\ 0 & 1 & -1 & 0 & \cdots\\ 0 & 1 & 0 & -4 & \cdots \end{pmatrix}, \quad \bq = \begin{pmatrix} 0\\ 1\\ 5 \end{pmatrix}, \end{aligned} \] where any remaining columns of \(\bR\) (the fifth column onwards, if \(p>4\)) are zero.
We can construct a Wald test for \(H_0\) vs. \(H_1\). The Wald statistic is defined as \[ W = N\left(\bR\hat{\bbeta} - \bq \right)' \left(\bR\widehat{\avar}(\hat{\bbeta})\bR' \right)^{-1} \left( \bR\hat{\bbeta} - \bq \right) \]
We propose the following test. Let \(c_{1-\alpha}\) be the \((1-\alpha)\)th quantile of the \(\chi^2_{3}\) distribution (3 is the number of constraints in \(H_0\)). Then
- If \(W>c_{1-\alpha}\), we reject \(H_0\).
- If \(W\leq c_{1-\alpha}\), we do not reject \(H_0\).
We now need to show that this test is consistent and has the desired asymptotic size.
Asymptotic size: Under \(H_0\) it holds that \(\bR\bbeta=\bq\), and so by our asymptotic results for the OLS estimator and the continuous mapping theorem, under \(H_0\) \[ \sqrt{N}(\bR\hat{\bbeta}-\bq) \xrightarrow{d} N(0, \bR\avar(\hat{\bbeta})\bR'). \] By Slutsky's theorem and the definition of \(\chi^2_{\cdot}\) random variables, it holds under \(H_0\) that \[ W \xrightarrow{d} \chi^2_3. \] By the definition of \(c_{1-\alpha}\) and the definition of convergence in distribution, \[ \begin{aligned} & P\left(\text{Reject } H_0|H_0 \right) = P\left(W>c_{1-\alpha} |H_0\right) \\ & \to P(\chi^2_3> c_{1-\alpha}) = \alpha . \end{aligned} \] The test has asymptotic size \(\alpha\).
Consistency: Under \(H_1\) we have that \(\bR\hat{\bbeta}\xrightarrow{p} \bR\bbeta\neq \bq\). In words, the outer terms in \(W\) converge in probability to the nonzero vector \(\bR\bbeta - \bq\). At the same time, \(\left(\bR\widehat{\avar}(\hat{\bbeta})\bR' \right)^{-1} \xrightarrow{p} \left(\bR\avar(\hat{\bbeta})\bR'\right)^{-1}\), a positive definite matrix. We conclude that \[ \begin{aligned} & \left(\bR\hat{\bbeta} - \bq \right)' \left(\bR\widehat{\avar}(\hat{\bbeta})\bR' \right)^{-1} \left( \bR\hat{\bbeta} - \bq \right) \\ & \xrightarrow{p} \left(\bR\bbeta -\bq\right)' \left(\bR\avar(\hat{\bbeta})\bR'\right)^{-1}(\bR\bbeta-\bq) \\ & >0, \end{aligned} \] where we use the definition of positive definiteness. Finally, recall that \(W\) also includes a factor of \(N\). We conclude that overall \[ W\xrightarrow{p} \infty. \] It follows that the probability of rejecting tends to 1 for any \(\bbeta\) in \(H_1\). In other words, consistency holds.
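The following sketch illustrates the Wald test on simulated data that satisfies \(H_0\), assuming `numpy`, `scipy`, and `statsmodels`; the factor of \(N\) in \(W\) is already absorbed because `cov_params()` returns \(\widehat{\avar}(\hat{\bbeta})/N\).

```python
import numpy as np
import scipy.stats as st
import statsmodels.api as sm

# Simulated data whose true beta satisfies all three restrictions in H0
rng = np.random.default_rng(2)
N, p = 2000, 5
X = rng.normal(size=(N, p))
U = rng.normal(size=N) * (1 + np.abs(X[:, 0]))
beta = np.array([0.0, 2.0, 1.0, -0.75, 0.3])   # beta_1=0, beta_2-beta_3=1, beta_2=4*beta_4+5
Y = X @ beta + U

res = sm.OLS(Y, X).fit(cov_type="HC0")

# Restriction matrix R and vector q from the solution above
R = np.array([[1, 0, 0, 0, 0],
              [0, 1, -1, 0, 0],
              [0, 1, 0, -4, 0]], dtype=float)
q = np.array([0.0, 1.0, 5.0])

diff = R @ res.params - q
V = R @ res.cov_params() @ R.T                 # cov_params() is avar_hat/N, so N is absorbed
W = diff @ np.linalg.solve(V, diff)            # Wald statistic
alpha = 0.05
crit = st.chi2.ppf(1 - alpha, df=R.shape[0])   # chi^2_3 critical value
print(f"W = {W:.3f}, critical value = {crit:.3f}, reject H0: {W > crit}")
# statsmodels can also compute this via res.wald_test((R, q), use_f=False)
```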
Inference on a Nonlinear Function of Parameters
Let the outcome \(Y_i\), the covariates \(\bX_i\), and an unobserved component \(U_i\) be linked through the linear potential outcomes model \[ Y_i^{\bx} = \bx'\bbeta + U_i. \] Suppose that we observe an IID sample of data on \(Y_i, \bX_i\), that \(\E[U_i|\bX_i]=0\), that \(\E[\bX_i\bX_i']\) is invertible, and that \(\E[U_i^2\bX_i\bX_i']\) has maximal rank. Also suppose that \(\bbeta\) has \(p\geq 2\) components, that \(\beta_1>0\) and \(\beta_2>0\), and that you are interested in \[ \gamma = \sqrt{\beta_1\beta_2}. \]
- Construct a confidence interval for \(\gamma\) with asymptotic coverage \((1-\alpha)\).
- Consider \(H_0: \gamma=1\) vs. \(H_1:\gamma\neq 1\). Construct a consistent test for \(H_0\) vs. \(H_1\) with asymptotic size \(\alpha\).
Remember to prove coverage, consistency, and size properties.
Click to see the solution
First subquestion: we can use the delta method to construct a confidence interval for \(\gamma\). The delta method states that if \(\hat{\bbeta}\) is consistent and asymptotically normal, then \(g(\hat{\bbeta})\) is also consistent and asymptotically normal for any continuously differentiable function \(g(\cdot)\).
For our \(\gamma\), we take \(g(\bw) = \sqrt{w_1w_2}\). This \(g(\cdot)\) is continuously differentiable in a neighborhood of \(\bbeta\), since \(\beta_1>0\) and \(\beta_2>0\). The Jacobian \(\bG(\cdot)\) of \(g(\cdot)\) is a \(1\times p\) matrix given by \[ \bG(\bw) = \left( \dfrac{w_2}{2\sqrt{w_1w_2}}, \dfrac{w_1}{2\sqrt{w_1w_2}}, 0, \dots, 0 \right). \] By the assumptions of the problem, \(\bG(\bbeta)\) has maximal rank.
Since \(g(\bbeta) = \gamma\), the delta method tells us that \[ \begin{aligned} \sqrt{N}(g(\hat{\bbeta}) - \gamma) & \xrightarrow{d} N(0, \bG(\bbeta)\avar(\hat{\bbeta})\bG(\bbeta)'). \end{aligned} \tag{3}\]
We can now construct a confidence interval in a standard way by using \(g(\hat{\bbeta})\) as the estimator and Equation 3 as the distributional basis. Consider the following interval: \[ \begin{aligned} S & = \left[g(\hat{\bbeta}) - z_{1-\alpha/2}\sqrt{\dfrac{\widehat{\avar}(g(\hat{\bbeta})) }{N} }, g(\hat{\bbeta}) + z_{1-\alpha/2}\sqrt{\dfrac{\widehat{\avar}(g(\hat{\bbeta})) }{N} } \right], \end{aligned} \] where \(z_{1-\alpha/2}\) is the \((1-\alpha/2)\)th quantile of the standard normal distribution and \[ \widehat{\avar}(g(\hat{\bbeta})) = \bG(\hat{\bbeta})\widehat{\avar}(\hat{\bbeta})\bG(\hat{\bbeta})', \] for some consistent estimator \(\widehat{\avar}(\hat{\bbeta})\). By consistency of \(\hat{\bbeta}\) and the continuous mapping theorem, \(\widehat{\avar}(g(\hat{\bbeta})) \xrightarrow{p} {\avar}(g(\hat{\bbeta})) = \bG(\bbeta)\avar(\hat{\bbeta})\bG(\bbeta)'\).
To compute asymptotic coverage, we first observe that by Slutsky's theorem and Equation 3 it holds that \[ \sqrt{N}\dfrac{g(\hat{\bbeta})- \gamma }{\sqrt{\widehat{\avar}(g(\hat{\bbeta}))} } \xrightarrow{d} N(0, 1). \] Then by the definition of convergence in distribution, \[ \begin{aligned} & P\left(\gamma \in S \right) = P(g(\bbeta)\in S)\\ & = P\left( -z_{1-\alpha/2} \leq \sqrt{N}\dfrac{g(\hat{\bbeta})- g(\bbeta) }{\sqrt{\widehat{\avar}(g(\hat{\bbeta}))} } \leq z_{1-\alpha/2} \right) \\ & \to \Phi(z_{1-\alpha/2}) - \Phi(-z_{1-\alpha/2}) = 1-\alpha. \end{aligned} \] In other words, \(S\) has asymptotic coverage \(1-\alpha\).
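The following sketch computes the delta-method interval on simulated data, assuming `numpy`, `scipy`, and `statsmodels`; the gradient \(\bG(\hat{\bbeta})\) is coded by hand for the two relevant coordinates.

```python
import numpy as np
import scipy.stats as st
import statsmodels.api as sm

# Simulated data with true beta = (1, 4), so the true gamma = sqrt(1*4) = 2
rng = np.random.default_rng(3)
N = 2000
X = rng.normal(size=(N, 2))
U = rng.normal(size=N) * (1 + np.abs(X[:, 0]))
Y = X @ np.array([1.0, 4.0]) + U

res = sm.OLS(Y, X).fit(cov_type="HC0")
b1, b2 = res.params
gamma_hat = np.sqrt(b1 * b2)

# Delta-method standard error: G(beta_hat) * (avar_hat/N) * G(beta_hat)'
G = np.array([b2, b1]) / (2 * np.sqrt(b1 * b2))
se = np.sqrt(G @ res.cov_params() @ G)

alpha = 0.05
z = st.norm.ppf(1 - alpha / 2)
print(f"gamma_hat = {gamma_hat:.3f}, CI = [{gamma_hat - z*se:.3f}, {gamma_hat + z*se:.3f}]")
# A t-test of H0: gamma = 1 then uses the statistic (gamma_hat - 1) / se.
```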
Second subquestion: we are dealing with a scalar transformation, and so we can use a \(t\)-test. Define the following statistic: \[ t = \dfrac{ \sqrt{ \hat{\beta}_1\hat{\beta}_2} - 1}{\sqrt{\widehat{\avar}(g(\hat{\bbeta}))/N} }. \] Here \(\sqrt{ \hat{\beta}_1\hat{\beta}_2 } =g(\hat{\bbeta})\).
Our test proceeds as follows:
- If \(\abs{t}>z_{1-\alpha/2}\), then we reject \(H_0\).
- If \(\abs{t}\leq z_{1-\alpha/2}\), then we do not reject \(H_0\).
Consistency and asymptotic size are established exactly as before. As always, the key and only ingredient is the asymptotic normality result in Equation 3. Please write out the proofs in detail.
Note: if the transformation were vector-valued, we would only be able to use the Wald test. The Wald test can also be used here; as practice, solve this question with a Wald test as in the lectures!
Consistency of the HC0 Asymptotic Variance Estimator
Let the outcome \(Y_i\), the scalar covariate \(X_i\), and an unobserved component \(U_i\) be linked through the linear potential outcomes model \[ Y_i^{x} = \beta x + U_i. \tag{4}\] Suppose that we observe an IID sample of data on \(Y_i, X_i\), that \(\E[U_i|X_i]=0\), that \(\E[X_i^2]\neq 0\), and that \(\E[U_i^2 X_i^2]\) exists. Let \(\hat{\beta}\) be the OLS estimator obtained by regressing \(Y_i\) on \(X_i\).
Recall the HC0 (White 1980) estimator for \(\avar(\hat{\beta})\). In the scalar model (4) it is given by \[ \begin{aligned} \widehat{\avar}(\hat{\beta}) & = \dfrac{ N^{-1} \sum_{i=1}^N \hat{U}_i^2 X_i^2 }{ \left( N^{-1}\sum_{i=1}^N X_i^2 \right)^2 }, \end{aligned} \] where \(\hat{U}_i = Y_i - X_i\hat{\beta}\) are the OLS residuals. Show that \[ \widehat{\avar}(\hat{\beta}) \xrightarrow{p} \avar(\hat{\beta}) \equiv \dfrac{ \E[U_i^2X_i^2] }{\left(\E[X_i^2] \right)^2 }. \] State explicitly any additional moment assumptions you make.
Click to see the solution
First, we substitute our model (4) in place of \(Y_i\) in \(\hat{U}_i\): \[ \begin{aligned} \hat{U}_i^2 & = (Y_i - X_i\hat{\beta})^2 = (X_i(\beta-\hat{\beta}) + U_i)^2\\ & = U_i^2 + 2(\beta-\hat{\beta}) U_iX_i+ (\beta-\hat{\beta})^2 X_i^2. \end{aligned} \]
We then substitute this expression for \(\hat{U}_i^2\) into the numerator of our asymptotic variance estimator: \[ \begin{aligned} & \dfrac{1}{N} \sum_{i=1}^N \hat{U}_i^2 X_i^2 \\ & = \dfrac{1}{N} \sum_{i=1}^N U_i^2X_i^2 + (\beta-\hat{\beta}) \dfrac{2}{N}\sum_{i=1}^N U_i X_i^3 \\ & \quad + (\beta-\hat{\beta})^2 \dfrac{1}{N}\sum_{i=1}^N X_i^4. \end{aligned} \tag{5}\]
To handle the averages in Equation 5, we use the law of large numbers. By assumption, \(\E[U_i^2X_i^2]\) exists. We additionally assume that \(\E[U_iX_i^3]\) and \(\E[X_i^4]\) exist. Then by the law of large numbers it holds that \[ \begin{aligned} \dfrac{1}{N} \sum_{i=1}^N U_i^2X_i^2 & \xrightarrow{p} \E[U_i^2 X_i^2],\\ \dfrac{1}{N}\sum_{i=1}^N U_i X_i^3 & \xrightarrow{p} \E[U_iX_i^3], \\ \dfrac{1}{N}\sum_{i=1}^N X_i^4 & \xrightarrow{p} \E[X_i^4]. \end{aligned} \] At the same time, all the conditions of our consistency result hold, and so \(\hat{\beta}-\beta \xrightarrow{p} 0\). By the continuous mapping theorem \[ \begin{aligned} (\beta-\hat{\beta}) \dfrac{2}{N}\sum_{i=1}^N U_i X_i^3 \xrightarrow{p} 0, \\ (\beta-\hat{\beta})^2 \dfrac{1}{N}\sum_{i=1}^N X_i^4 \xrightarrow{p} 0. \end{aligned} \] By combining the above convergence results, the result on equivalence of separate and joint convergence in probability, and the continuous mapping theorem, we conclude that \[ \dfrac{1}{N} \sum_{i=1}^N \hat{U}_i^2 X_i^2 \xrightarrow{p} \E[U_i^2X_i^2]. \] At the same time, by the law of large numbers, the continuous mapping theorem, and the assumption that \(\E[X_i^2]\neq 0\) it holds that \[ \dfrac{1}{ \left(N^{-1}\sum_{i=1}^N X_i^2\right)^2 } \xrightarrow{p} \dfrac{1}{\left( \E[X_i^2] \right)^2 }. \] Combining the previous two equations with the continuous mapping theorem, we obtain the desired result. In terms of assumptions, we have assumed the existence of \(\E[U_iX_i^3]\) and \(\E[X_i^4]\), in addition to the assumptions of the problem.
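To see the result in action, here is a small simulation sketch (assuming only `numpy`): in the design below \(\E[U_i^2X_i^2] = \E[X_i^4] = 3\) and \(\E[X_i^2] = 1\), so the HC0 estimator should approach \(3\) as \(N\) grows.

```python
import numpy as np

# Compare the HC0 estimator with its population counterpart E[U^2 X^2] / (E[X^2])^2
rng = np.random.default_rng(4)

def hc0(N):
    X = rng.normal(size=N)
    U = rng.normal(size=N) * np.abs(X)      # E[U|X] = 0, E[U^2|X] = X^2 (heteroskedastic)
    Y = 2.0 * X + U                         # true beta = 2
    beta_hat = (X @ Y) / (X @ X)            # OLS without intercept
    U_hat = Y - X * beta_hat
    return np.mean(U_hat**2 * X**2) / np.mean(X**2) ** 2

# Population value: E[U^2 X^2] = E[X^4] = 3 and E[X^2] = 1, so avar = 3
for N in (100, 10_000, 1_000_000):
    print(N, round(hc0(N), 3))
```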
Applied Exercises
Applied exercises are from Wooldridge (2020). In all cases, use asymptotic \(t\)- and Wald tests with robust standard errors:
- C9 in chapter 4,
- C4 and C6 in chapter 5,
- C8 in chapter 7.
For more code examples and discussion, see chapters 4, 5, and 7 in Heiss and Brunner (2024).
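A minimal starting point for the applied exercises, assuming the `wooldridge` data package and `statsmodels` are installed; the dataset and formula below are placeholders to adapt for each exercise.

```python
import wooldridge as woo
import statsmodels.formula.api as smf

# Load a Wooldridge (2020) dataset and fit OLS with HC0-robust standard errors
df = woo.data("wage1")  # placeholder dataset; swap in the one each exercise asks for
res = smf.ols("lwage ~ educ + exper + tenure", data=df).fit(cov_type="HC0")
print(res.summary())

# Robust t-statistics appear in the summary; Wald tests of linear restrictions
# can be run with res.wald_test(...), as in the sketches above.
```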