OptimalUnitAverager
- class unit_averaging.averager.OptimalUnitAverager(focus_function, ind_estimates, ind_covar_ests, unrestricted_units_bool=None)
Bases: BaseUnitAverager
Optimal weight scheme that minimizes the plug-in Mean Squared Error (MSE).
It supports two regimes: fixed-N and large-N, each with different approaches to weight allocation.
Fixed-N Regime: In the fixed-N regime, the weights of all units vary independently, subject only to the constraints of non-negativity and summing to 1. This is an “agnostic” scheme where the algorithm determines the optimal weight for each unit individually, without any grouping or restrictions. This regime is suitable when the number of units is small or when you want to allow maximum flexibility in weight allocation.
Large-N Regime: In the large-N regime, you can specify some units as “unrestricted” (free) while the remaining units are considered “restricted.” The key idea is that:
- The weights of unrestricted units vary independently.
- All restricted units receive equal weights.
- The algorithm only chooses the weight of the restricted set as a whole.
This approach is particularly useful when you have a large number of restricted units. The average of a large restricted set will closely approximate the true average of the parameters. This allows for more efficient and precise shrinkage, as the algorithm can focus on optimizing the weights of the unrestricted units and the total weight of the restricted set.
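The weight structure of the large-N regime can be illustrated with a minimal sketch. The helper `expand_weights` below is hypothetical and is not part of `unit_averaging`; it only shows how a set of free weights plus a single total weight for the restricted set would map to one weight per unit, assuming the restricted total is split equally.

```python
import numpy as np


def expand_weights(free_weights, restricted_total, unrestricted_mask):
    """Hypothetical helper (not a library function).

    Builds a full weight vector from the weights of the unrestricted
    units and the total weight assigned to the restricted set.
    """
    mask = np.asarray(unrestricted_mask, dtype=bool)
    free_weights = np.asarray(free_weights, dtype=float)
    n_restricted = int((~mask).sum())

    weights = np.empty(mask.size, dtype=float)
    # Unrestricted units keep their individually chosen weights.
    weights[mask] = free_weights
    # Restricted units share the total weight of their set equally.
    if n_restricted > 0:
        weights[~mask] = restricted_total / n_restricted
    return weights


# Two unrestricted units followed by two restricted units.
mask = [True, True, False, False]
full_weights = expand_weights([0.324, 0.0], 0.676, mask)
print(full_weights, full_weights.sum())
```

With these illustrative inputs the two restricted units each receive half of the restricted total, so the optimizer in such a scheme would search over three numbers rather than four.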
- Parameters:
focus_function (BaseFocusFunction) – Focus function expressing the transformation of interest.
ind_estimates (np.ndarray | list | dict[str | int, np.ndarray | list]) – Individual unit estimates. Can be a list, NumPy array, or dictionary. Each unit-specific estimate should be a NumPy array or list. The first dimension of ind_estimates indexes units (rows or dictionary entries).
ind_covar_ests (np.ndarray | list | dict[str | int, np.ndarray | list]) – Individual unit covariance estimates. Can be a list, NumPy array, or dictionary. Each unit-specific covariance estimate should be a NumPy array or list of lists. The first dimension of ind_covar_ests indexes units (rows or dictionary entries).
unrestricted_units_bool (np.ndarray | list | dict[str | int, bool] | None) – Optional. Boolean array indicating which units are unrestricted for weight computations, with True meaning that a unit is unrestricted. If a dictionary, keys should match those in ind_estimates and ind_covar_ests. If None, all units are considered unrestricted. Defaults to None.
- Attributes:
ind_estimates (np.ndarray) – Array of individual unit estimates.
ind_covar_ests (np.ndarray) – Array of individual unit covariance estimates.
unrestricted_units_bool (np.ndarray) – Boolean array indicating which units are unrestricted.
keys (np.ndarray) – Array of keys corresponding to the units. The individual estimates are converted to NumPy arrays internally. If ind_estimates is a dictionary, the keys are preserved in the keys attribute. If ind_estimates is a list or array, keys defaults to numeric indices (0, 1, 2, …).
weights (np.ndarray) – The computed weights for each unit.
estimate (float) – The computed unit averaging estimate.
focus_function (BaseFocusFunction) – Focus function expressing the transformation of interest.
target_id (int | str) – The ID of the target unit. Initialized as None, set by calling fit().
Example
>>> from unit_averaging import OptimalUnitAverager, InlineFocusFunction
>>> import numpy as np
>>> # Define a focus function
>>> focus_function = InlineFocusFunction(
...     lambda x: x[0] * x[1],
...     lambda x: np.array([x[1], x[0]]),
... )
>>> # Define individual unit estimates
>>> ind_estimates = {
...     "unit1": np.array([5, 6]),
...     "unit2": np.array([7, 8]),
...     "unit3": np.array([9, 3]),
...     "unit4": np.array([3, 10]),
... }
>>> # Define individual unit covariance estimates
>>> ind_covar_ests = {
...     "unit1": np.array([[3, 0.25], [0.25, 3]]),
...     "unit2": np.array([[4, 0.5], [0.5, 5]]),
...     "unit3": np.array([[1, -0.25], [-0.25, 1]]),
...     "unit4": np.array([[1, 0.5], [0.5, 1]]),
... }
>>> # Define unrestricted units
>>> unrestricted_units_bool = {
...     "unit1": True,
...     "unit2": True,
...     "unit3": False,
...     "unit4": False,
... }
>>> # Create an OptimalUnitAverager instance
>>> averager = OptimalUnitAverager(
...     focus_function, ind_estimates, ind_covar_ests, unrestricted_units_bool
... )
>>> # Fit the averager to the target unit
>>> averager.fit(target_id="unit1")
>>> print(averager.weights.round(3))  # [0.324 0. 0.338 0.338]
>>> print(averager.estimate)  # 28.99
Methods
- average(focus_function=None)
Perform unit averaging with the fitted weights.
This method computes the unit averaging estimate using the fitted weights. It can accept a different focus function and reuse the fitted weights.
- Parameters:
focus_function (BaseFocusFunction | None) – Focus function to use in computing the averaging estimator. Expresses the parameter of interest. If None, defaults to the focus function used in fitting.
- Returns:
The unit averaging estimate.
- Return type:
float
- Raises:
TypeError – If weights have not been fitted yet by calling fit().
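Reusing fitted weights with a different focus function can be sketched without the library. The snippet below assumes that the averaging estimate is the weighted average of the focus function evaluated at each unit's estimate; that assumption reproduces the value 28.99 printed in the class example up to rounding, but the library's actual implementation is not shown here. The function `weighted_focus_average` is a hypothetical stand-in, not a library function.

```python
import numpy as np

# Unit estimates from the class example, in the order unit1..unit4.
ind_estimates = np.array([[5, 6], [7, 8], [9, 3], [3, 10]], dtype=float)
# Rounded weights as printed in the class example (not recomputed here).
weights = np.array([0.324, 0.0, 0.338, 0.338])


def weighted_focus_average(focus, estimates, weights):
    """Hypothetical stand-in: weighted average of focus values.

    Assumes the estimate is sum_i w_i * focus(theta_i).
    """
    values = np.array([focus(theta) for theta in estimates])
    return float(weights @ values)


# Focus function used in fitting: product of the two coordinates.
product_estimate = weighted_focus_average(
    lambda x: x[0] * x[1], ind_estimates, weights
)
# A different focus function applied with the same weights.
first_coord_estimate = weighted_focus_average(
    lambda x: x[0], ind_estimates, weights
)
print(round(product_estimate, 3), round(first_coord_estimate, 3))
```

The weights are held fixed across both calls; only the transformation applied to each unit's estimate changes.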
- fit(target_id)
Compute the unit averaging weights and the averaging estimator.
- Parameters:
target_id (int | str) – ID of the target unit. This is specified in terms of the keys attribute, which contains either numeric indices (if ind_estimates was an array or list) or dictionary keys (if ind_estimates was a dictionary).
- Raises:
ValueError – If the target unit is not found in the keys.
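The relationship between input type, keys, and target lookup can be sketched as follows. Both helpers, `split_keys` and `find_target`, are hypothetical illustrations of the documented behavior and are not the library's internal code.

```python
import numpy as np


def split_keys(ind_estimates):
    """Hypothetical helper: return (keys, stacked estimates).

    Dictionary inputs keep their keys; lists and arrays get
    numeric indices 0, 1, 2, ...
    """
    if isinstance(ind_estimates, dict):
        keys = np.array(list(ind_estimates.keys()))
        values = np.array(
            [np.asarray(v, dtype=float) for v in ind_estimates.values()]
        )
    else:
        values = np.asarray(ind_estimates, dtype=float)
        keys = np.arange(len(values))
    return keys, values


def find_target(keys, target_id):
    """Hypothetical helper: position of target_id in keys, else ValueError."""
    key_list = keys.tolist()  # plain Python str/int values
    if target_id not in key_list:
        raise ValueError(f"Target unit {target_id!r} not found in keys")
    return key_list.index(target_id)


dict_keys, dict_values = split_keys({"unit1": [5, 6], "unit2": [7, 8]})
list_keys, list_values = split_keys([[5, 6], [7, 8]])

print(find_target(dict_keys, "unit2"))
print(find_target(list_keys, 0))
```

Under this sketch, a target given as `"unit2"` is valid only for dictionary input, while list or array input expects an integer position.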