Evaluating Confidence Intervals

Understanding Actual Coverage and Length Properties

Vladislav Morozov

Introduction

Lecture Info

Learning Outcomes

This lecture is about evaluating confidence intervals


By the end, you should be able to

  • Describe key properties of confidence intervals
  • Write a small simulation organized in several functions
  • Use simulations to evaluate confidence interval coverage and length

References


Motivating Example

Setting and Estimators

Suppose that we observe an IID sample \(X_1, \dots, X_N\), where \[ X_i \sim N(\mu, \sigma^2) \] We estimate \(\mu\) and \(\sigma^2\) with their maximum likelihood estimators

\[ \hat{\mu} = \dfrac{1}{N}\sum_{i=1}^N X_i, \quad \hat{\sigma}^2 = \dfrac{1}{N}\sum_{i=1}^{N}(X_i -\bar{X})^2 \]
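
In NumPy, these estimators are one-liners (a quick standalone check; the sample below is synthetic):

import numpy as np

x = np.random.default_rng(0).normal(0.0, 1.0, 100)  # synthetic sample
mu_hat = x.mean()                                   # MLE of mu: the sample mean
sigma2_hat = np.var(x, ddof=0)                      # MLE of sigma^2: divides by N, not N-1
assert np.isclose(sigma2_hat, ((x - mu_hat) ** 2).mean())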

Towards Confidence Intervals

In practice we usually want to quantify uncertainty using confidence intervals

  • Can construct based on exact sampling distribution of \(\hat{\mu}\) and \(\hat{\sigma}^2\)
  • More often, and more generally, use asymptotic normality approximations, e.g. if \(\mu=0\), then under any distribution of the data with finite fourth moments

\[ \sqrt{N}(\hat{\sigma}^2 - \sigma^2) \xrightarrow{d} N(0, \E[X_i^4]-\sigma^4) \]
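
For normal data specifically, \(\E[X_i^4] = 3\sigma^4\) (with \(\mu = 0\)), so the limiting variance simplifies to \(2\sigma^4\); this is the expression behind the standard error used in the code later in this lecture:

\[ \sqrt{N}(\hat{\sigma}^2 - \sigma^2) \xrightarrow{d} N(0, 2\sigma^4), \qquad \widehat{\text{se}}(\hat{\sigma}^2) = \sqrt{\frac{2\hat{\sigma}^4}{N}} \]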

Motivating Issue: Errors in Mean vs. in Variance

  • Suppose that true values are \(\mu=0\) and \(\sigma^2 = 0.1\)
  • For the mean: you can make the same mistake in both directions: e.g. you can underestimate or overestimate the mean by \(0.5\)
  • You can also overestimate \(\sigma^2\) by 0.5

But you cannot underestimate \(\sigma^2\) by \(0.5\) (which would mean an estimate of \(-0.4\))

\(\Rightarrow\) Distribution of estimation errors \((\hat{\sigma}^2-\sigma^2)\) cannot be symmetric
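
The asymmetry is easy to see numerically (a minimal sketch; the sample size and number of replications here are arbitrary illustrative choices, not part of the lecture's main simulation):

import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
true_var, n_obs, n_reps = 0.1, 20, 100_000

# Estimation errors sigma^2_hat - sigma^2 across replications
errors = np.array([
    np.var(rng.normal(0, np.sqrt(true_var), n_obs)) - true_var
    for _ in range(n_reps)
])
print(skew(errors))  # clearly positive: long right tail, bounded left tail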

Simulation Question of Today

  • Asymmetry of the error distribution \(\Rightarrow\) maybe the asymptotic normal (symmetric) approximation is not good?
  • This effect may be more pronounced when \(\sigma\) is small (less space for underestimation)
  • Does it affect the “quality” of asymptotic confidence intervals for \(\sigma^2\)?

Today: check numerically whether confidence interval properties change as \(\sigma^2\to 0\)

Theory Essentials for Confidence Intervals

Definitions and Examples

Confidence Sets: Definition

Definition 1  

  1. A \((1-\alpha)\times 100\%\) confidence set for \(\theta\) (\(\theta\in\R^p\)) is a random set \(S(X_1, \dots, X_N)\subseteq \R^p\) such that
    \[ P(\theta \in S(X_1, \dots, X_N)) = 1-\alpha \]

  2. \(S(\cdot, \dots, \cdot)\) is an asymptotic \((1-\alpha)\times 100\%\) confidence set for \(\theta\) if \(\lim_{N\to\infty} P(\theta \in S(X_1, \dots, X_N)) = 1-\alpha\)

  3. \(P(\theta \in S(X_1, \dots, X_N))\) is the coverage of \(S\)

Confidence Sets as Set Estimators

Suppose that we are interested in some \(\theta\in \R^p\)

Confidence sets can be viewed as set estimators — they return a whole set of values in \(\R^p\) as a collection of guesses for \(\theta\)


  • Confidence sets — set estimators with coverage guarantees
  • Contrast with point estimators (e.g. OLS, IV), which return only a single point

Familiar Example: CI Based on Asymptotic Normality

Proposition 1 Suppose that \(\sqrt{N}(\hat{\theta} - \theta) \xrightarrow{d} N(0, \avar(\hat{\theta}))\). Let \(\widehat{\avar}(\hat{\theta})\xrightarrow{p} \avar(\hat{\theta})\).

The confidence interval \[ S = \left[ \hat{\theta} - z_{1-\alpha/2} \sqrt{ \frac{\widehat{\avar}(\hat{\theta})}{N} }, \ \hat{\theta} + z_{1-\alpha/2} \sqrt{ \frac{\widehat{\avar}(\hat{\theta})}{N} } \right] \tag{1}\] has asymptotic coverage \((1-\alpha)\times 100\%\), where \(z_{1-\alpha/2}\) is the \((1-\alpha/2)\) quantile of \(N(0, 1)\)
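
Formula (1) translates directly into code (a generic sketch; the name wald_ci and its arguments are illustrative, not part of the simulation module shown later):

import numpy as np
from scipy.stats import norm

def wald_ci(theta_hat: float, avar_hat: float, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Two-sided asymptotic (Wald) CI from an estimate and its estimated asymptotic variance."""
    z_crit = norm.ppf(1 - alpha / 2)            # (1 - alpha/2) standard normal quantile
    half_width = z_crit * np.sqrt(avar_hat / n)
    return theta_hat - half_width, theta_hat + half_width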

Connection to Tests: Test Inversion

There is an equivalent way to construct the confidence interval (1)

  • Recall the \(t\)-statistic for \(H_0: \theta = c\): \(t(c) = \sqrt{N}(\hat{\theta} - c)/\widehat{\avar}(\hat{\theta})^{1/2}\)
  • \(S\) is the set of all \(c\) for which the \(t\)-test does not reject: \(S = \{c: |t(c)| \leq z_{1-\alpha/2}\}\)


This is an example of test inversion — the general equivalence between testing and confidence intervals
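
The equivalence can be checked numerically (a sketch with arbitrary illustrative values; it compares the non-rejection region of the \(t\)-test to the endpoints from (1)):

import numpy as np
from scipy.stats import norm

theta_hat, avar_hat, n, alpha = 0.5, 2.0, 100, 0.05
z_crit = norm.ppf(1 - alpha / 2)

# Non-rejection region of the t-test over a grid of candidate values c
grid = np.linspace(-1.0, 2.0, 100_001)
t_stats = np.sqrt(n) * (theta_hat - grid) / np.sqrt(avar_hat)
not_rejected = grid[np.abs(t_stats) <= z_crit]

# Endpoints agree with the Wald interval (1) up to grid resolution
print(not_rejected.min(), not_rejected.max())
print(theta_hat - z_crit * np.sqrt(avar_hat / n), theta_hat + z_crit * np.sqrt(avar_hat / n))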

Interpretation of Confidence Intervals

Recall interpretation of \(95\%\) coverage:

  • Suppose we draw many samples of the same size
  • In ~95% of the samples, the realized CI will fall “on top” of (i.e., contain) the true parameter value


Notice: interpretation aligns with what we actually do in Monte Carlo

Metrics for Evaluating Confidence Intervals

Metrics for Evaluating Confidence Intervals


There are two key metrics for looking at confidence intervals in simulations

  1. Coverage: the actual probability that the interval contains the true parameter
  2. (Average) length

Generally prefer shorter intervals with coverage that is close to the nominal level

Metric: Actual Coverage

How do you evaluate coverage?


Draw many datasets and in each sample \(s\)

  • Compute CI
  • Check whether the CI includes the true parameter and record inclusion as \(D_s \in \{0, 1\}\) (1 if included)

Report estimated coverage as the average of \(D_s\) over \(s\)

Metric: Average Length

Similar story for length:

Draw many datasets and in each sample \(s\)

  • Compute CI = \([\hat{a}_s, \hat{b}_s]\)
  • Record \(\text{Length}_s = \hat{b}_s - \hat{a}_s\)

Report estimated average of \(\text{Length}_s\) over \(s\)

Can also report other characteristics of length distribution (e.g. quantiles)
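
Both metrics are easy to compute from simulated endpoints (a compact standalone sketch; the lecture's full, organized implementation follows in the next section):

import numpy as np

def summarize_cis(lowers: np.ndarray, uppers: np.ndarray, true_value: float) -> dict:
    """Estimated coverage and length summaries from simulated CI endpoints."""
    covered = (lowers <= true_value) & (true_value <= uppers)  # D_s indicators
    lengths = uppers - lowers                                  # Length_s
    return {
        "coverage": covered.mean(),
        "avg_length": lengths.mean(),
        "length_quantiles": np.quantile(lengths, [0.1, 0.5, 0.9]),
    }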

Simulation Implementation

Completing Simulation Design

Recap: Simulation Setting

Let’s come back to today’s example. We already have

  • DGP of variables (normal with different variances)
  • Estimators: maximum likelihood
  • Confidence interval to evaluate: based on asymptotic normality of \(\hat{\sigma}^2\)

Still need to choose the remaining DGP parameters: the mean \(\mu\), the sample size \(N\), and the values of \(\sigma^2\)

Choosing the Remaining Parameters

Can set \[ \mu = 0 \]

For sample sizes:

  • Consider two: \(N=20\) and \(N=100\)
  • If asymptotic approximation problems exist, they should be less pronounced at the larger sample size

For \(\sigma^2\): logarithmic grid between \(10^{-3}\) and \(10\)

Implementing The Simulation

Approach to Code: Functions

Our simulation

  • Quite small
  • Unlikely to need more DGPs or different estimators

Today we will organize it as a small collection of functions

Contrast with the modular approach of the previous two examples

Code Structure

├── main.py
├── results 
│   └── ...
└── sim_infrastructure
    ├── __init__.py 
    ├── core.py
    └── plotting.py

main.py:

  1. Runs the simulation using functions from sim_infrastructure.core
  2. Makes plots using functions from sim_infrastructure.plotting

core Simulation Module

sim_infrastructure/core.py
"""
Core simulation logic for evaluating confidence intervals for variance.

This module contains the core functions:
- mle_variance: maximum likelihood estimation of variance.
- asymptotic_ci_variance: construction of asymptotic confidence intervals.
- run_simulations: running Monte Carlo simulations for coverage and interval length.
"""

import numpy as np

from scipy.stats import norm

def mle_variance(data: np.ndarray) -> np.floating:
    """Estimate the variance of a normal distribution using MLE."""
    return np.var(data, ddof=0)  # MLE for variance is the sample variance with ddof=0

def asymptotic_ci_variance(data: np.ndarray, alpha: float = 0.05) -> tuple[float, float]:
    """Construct an asymptotic (Wald-type) confidence interval for the variance."""
    n = len(data)
    var_hat = mle_variance(data)
    se = np.sqrt(2 * var_hat**2 / n)  # SE based on avar(sigma^2_hat) = 2 * sigma^4 under normality
    z_crit = norm.ppf(1 - alpha / 2)
    lower = max(0, var_hat - z_crit * se)  # Ensure lower bound is non-negative
    upper = var_hat + z_crit * se
    return lower, upper

def run_simulations(
    true_vars: np.ndarray, 
    n_simulations: int = 1000,
    n_obs_list: list[int] = [20, 100],
    alpha: float = 0.05,
    seed: int = 1,
) -> dict[int, dict[float, dict]]:
    """
    Run Monte Carlo simulations for each true variance and sample size.

    Args:
        true_vars (np.ndarray): true variance values to simulate.
        n_simulations (int): number of Monte Carlo simulations per true variance.
        n_obs_list (list[int]): list of sample sizes to simulate.
        alpha (float): significance level for confidence intervals.
        seed (int): Random seed for reproducibility.

    Returns:
        Nested dictionary: {n_obs: {true_var: {'coverage': ..., 'errors': ..., 'lengths': ...}}}
    """
    rng = np.random.default_rng(seed)
    results = {n_obs: {} for n_obs in n_obs_list}

    # Loop over sample sizes
    for n_obs in n_obs_list:
        # Loop over true variances
        for true_var in true_vars:
            errors = []
            lengths = []
            coverage = []
            for _ in range(n_simulations):
                # Draw data and run estimation
                data = rng.normal(0, np.sqrt(true_var), n_obs)
                var_hat = mle_variance(data)

                # Record estimation error in variance
                errors.append(var_hat - true_var)

                # Construct CI; record its length (scaled by the true
                # variance for comparability) and whether it covers
                ci_lower, ci_upper = asymptotic_ci_variance(data, alpha)
                lengths.append((ci_upper - ci_lower) / true_var)
                coverage.append(ci_lower <= true_var <= ci_upper)
 
            results[n_obs][true_var] = {
                'coverage': np.mean(coverage),
                'errors': np.array(errors),
                'lengths': np.array(lengths)
            }

    return results

main.py

main.py
"""
Entry point for running simulations and generating plots.

Simulation 3: evaluating confidence intervals for variance of the normal
distribution based on the maximum likelihood estimator. Simulations consider
coverage and (scaled) length properties as the true variance decreases to zero.

This script:
- Sets up simulation parameters.
- Runs the core simulation.
- Generates and saves all plots.
"""

from pathlib import Path

import numpy as np

from sim_infrastructure.core import run_simulations
from sim_infrastructure.plotting import plot_coverage_and_lengths, plot_error_kdes

# Simulation parameters: note upper case for constants
TRUE_VARS = np.logspace(-3, 1, 50)
N_OBS_LIST = [20, 100]
N_SIMS = 50000
CI_ALPHA = 0.05
SEED = 1
OUTPUT_DIR = Path() / "results"


def main():
    # Run simulations
    results = run_simulations(TRUE_VARS, N_SIMS, N_OBS_LIST, CI_ALPHA, SEED)

    # Generate plots
    plot_coverage_and_lengths(results, OUTPUT_DIR)
    plot_error_kdes(results, OUTPUT_DIR)


if __name__ == "__main__":
    main()
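
The plotting module is not reproduced on these slides. A minimal sketch of sim_infrastructure/plotting.py consistent with the two function names used above (the figure styling, output file names, and the choice of representative variance are assumptions; the lecture's actual code may differ):

"""Plotting helpers for the CI simulations (illustrative sketch)."""

from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde, norm


def plot_coverage_and_lengths(results: dict, output_dir: Path) -> None:
    """Plot estimated coverage and average scaled length against the true variance."""
    output_dir.mkdir(parents=True, exist_ok=True)
    fig, (ax_cov, ax_len) = plt.subplots(1, 2, figsize=(10, 4))
    for n_obs, by_var in results.items():
        true_vars = sorted(by_var)
        ax_cov.plot(true_vars, [by_var[v]["coverage"] for v in true_vars], label=f"N = {n_obs}")
        ax_len.plot(true_vars, [by_var[v]["lengths"].mean() for v in true_vars], label=f"N = {n_obs}")
    ax_cov.axhline(0.95, linestyle="--", color="gray")  # nominal level for alpha = 0.05
    ax_cov.set(xscale="log", xlabel="True variance", ylabel="Estimated coverage")
    ax_len.set(xscale="log", xlabel="True variance", ylabel="Average length / true variance")
    ax_cov.legend()
    fig.tight_layout()
    fig.savefig(output_dir / "coverage_and_lengths.png", dpi=200)
    plt.close(fig)


def plot_error_kdes(results: dict, output_dir: Path) -> None:
    """Compare the KDE of estimation errors to a mean-zero normal with matching variance."""
    output_dir.mkdir(parents=True, exist_ok=True)
    for n_obs, by_var in results.items():
        # One representative true variance: the grid point closest to 0.1
        true_var = min(by_var, key=lambda v: abs(v - 0.1))
        errors = by_var[true_var]["errors"]
        grid = np.linspace(errors.min(), errors.max(), 400)
        fig, ax = plt.subplots()
        ax.plot(grid, gaussian_kde(errors)(grid), label="Estimation errors (KDE)")
        # Mean-zero normal benchmark: bias shows up as a horizontal shift
        ax.plot(grid, norm.pdf(grid, scale=errors.std()), "--", label="Normal, same variance")
        ax.set(xlabel=r"$\hat{\sigma}^2 - \sigma^2$", ylabel="Density", title=f"N = {n_obs}")
        ax.legend()
        fig.tight_layout()
        fig.savefig(output_dir / f"error_kde_n{n_obs}.png", dpi=200)
        plt.close(fig)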

Core Organization: Discussion

  • Tightly coupled code
    • Sacrifice extensibility
    • But much easier to write, much less infrastructure
  • Fine for our small targeted simulation
  • Inside the simulation: store CI coverage and length, plus the estimation error of the maximum likelihood estimator

Simulation Results

Core Results

Visualizing Core Results

  • Our simulation produces coverage and length for each value of \(\sigma^2\)
  • How to look at these results?


Can visualize them graphically with coverage and length plots

For length: scale by the magnitude of the target value to get a “normalized” view

Coverage Plot

Length Plot

Interpretation of Simulation Results

For the core question:

  • Remarkably stable coverage and length properties
  • No dependence on \(\sigma\)

\(\Rightarrow\) no issue with \(\sigma\approx 0\)


But we do see coverage issues for \(N=20\), despite much longer intervals (around 87% actual coverage at a 95% nominal level). Why?

Why Coverage Issues for Smaller Samples?

Potential Explanations of Bad Coverage for \(N=20\)

Why the coverage issues for \(N=20\)?

  • Normal approximation might be poor (incorrect shape of interval around its center)
  • MLE estimator is biased downward (incorrect center of the interval): \(\E[\hat{\sigma}^2] = \tfrac{N-1}{N}\sigma^2 < \sigma^2\)

Can look at the distribution of estimation errors and compare it to the normal distribution with the same variance

Example Distribution Plot for \(N=20\)

Example Distribution Plot for \(N=100\)

Discussion of the Distributional Approximation

From looking at the two plots:

  • The shape is not perfect for \(N=20\): longer right tail, some asymmetry
  • Bias is present: noticeable shift to the left

But much better approximation for \(N=100\)


\(\Rightarrow\) Coverage issues are due to small sample effects

Recap and Conclusions

Recap


In this lecture we

  • Reviewed key concepts for confidence intervals
  • Identified key metrics: coverage and length
  • Saw a practical small simulation organized around tightly coupled functions

References

Andrews, Donald W. K. 1999. “Estimation When a Parameter Is on a Boundary.” Econometrica 67 (6): 1341–83. https://doi.org/10.1111/1468-0262.00082.
———. 2000. “Inconsistency of the Bootstrap When a Parameter Is on the Boundary of the Parameter Space.” Econometrica 68 (2): 399–405. https://doi.org/10.1111/1468-0262.00114.
———. 2001. “Testing When a Parameter Is on the Boundary of the Maintained Hypothesis.” Econometrica 69 (3): 683–734. https://doi.org/10.1111/1468-0262.00210.
Hansen, Bruce. 2022. Econometrics. Princeton University Press.