arviz_stats.loo_influence

arviz_stats.loo_influence(data, var_names=None, group='posterior_predictive', sample_dims=None, log_likelihood_var_name=None, kind='mean', standardize=True, probs=None, log_weights=None, pareto_k=None)

Compute influential observations based on leave-one-out (LOO) expectations.

Computes the influence of each observation by measuring how much a posterior or posterior predictive summary changes when that observation is left out. The summary statistic is selected with kind (mean, median, standard deviation, variance, quantiles or octiles).
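To make the idea concrete, here is a minimal NumPy sketch of a leave-one-out shift in the mean. It is not the library's implementation: it uses raw importance weights (no Pareto smoothing), takes the absolute difference as the shift, and handles a single scalar quantity. The function name loo_mean_shift and its arguments are hypothetical.

```python
import numpy as np


def loo_mean_shift(draws, log_lik, standardize=True):
    """Illustrative LOO shift in the mean of a scalar quantity.

    draws:   shape (S,), draws of the quantity of interest
    log_lik: shape (S, N), pointwise log-likelihood per draw and observation

    Assumption: raw (unsmoothed) importance weights stand in for PSIS weights.
    """
    draws = np.asarray(draws, dtype=float)
    log_lik = np.asarray(log_lik, dtype=float)

    # Leaving out observation i reweights draw s by 1 / p(y_i | theta_s).
    log_w = -log_lik
    log_w -= log_w.max(axis=0, keepdims=True)  # stabilise before exponentiating
    w = np.exp(log_w)
    w /= w.sum(axis=0, keepdims=True)  # normalise per observation

    loo_mean = w.T @ draws  # shape (N,): weighted mean for each left-out observation
    shift = np.abs(loo_mean - draws.mean())
    if standardize:
        shift = shift / draws.std()
    return shift


# Toy usage with synthetic numbers (not real model output).
rng = np.random.default_rng(0)
toy_draws = rng.normal(size=1000)
toy_log_lik = rng.normal(size=(1000, 5))
print(loo_mean_shift(toy_draws, toy_log_lik))
```

Observations whose removal moves the weighted mean furthest from the full-data mean get the largest shift.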

Parameters:
data: DataTree or InferenceData

Must contain the selected group and the log_likelihood group.

var_names: str or list of str, optional

The name(s) of the variable(s) for which to compute the influence.

group: str

Group from which to compute weighted expectations. Defaults to posterior_predictive.

sample_dims: str or sequence of hashable, optional

Defaults to rcParams["data.sample_dims"]

log_likelihood_var_name: str, optional

The name of the variable in the log_likelihood group to use for loo computation. When log_likelihood contains more than one variable and group is posterior, this must be provided.

kind: str, optional

The kind of expectation to compute. Available options are:

  • ‘mean’. Default.

  • ‘median’.

  • ‘sd’.

  • ‘var’.

  • ‘quantile’.

  • ‘octiles’.

standardize: bool

Whether to standardize the computed metric. Uses the standard deviation when kind is ‘mean’ and the median absolute deviation (MAD) when kind is ‘median’. Ignored for the other values of kind.

probs: float or list of float, optional

The quantile(s) to compute when kind is ‘quantile’.

log_weights: xarray.DataArray, optional

Pre-computed smoothed log weights from PSIS. Must be provided together with pareto_k. If not provided, PSIS will be computed internally.

pareto_k: xarray.DataArray, optional

Pre-computed Pareto k-hat diagnostic values. Must be provided together with log_weights.

Returns:
shift: xarray.DataArray or xarray.Dataset

Influence metric for each observation.

khat: xarray.DataArray or xarray.Dataset

Function-specific Pareto k-hat diagnostics for each observation.

Examples

Calculate influential observations based on the posterior median for the parameter mu:

In [1]: from arviz_stats import loo_influence
   ...: from arviz_base import load_arviz_data
   ...: dt = load_arviz_data("centered_eight")
   ...: shift, _ = loo_influence(dt, kind="median", var_names="mu", group="posterior")
   ...: shift
   ...: 
Out[1]: 
<xarray.Dataset> Size: 576B
Dimensions:  (school: 8)
Coordinates:
  * school   (school) <U16 512B 'Choate' 'Deerfield' ... 'Mt. Hermon'
Data variables:
    mu       (school) float64 64B 0.3999 0.1501 0.1494 ... 0.06267 0.5269 0.1137

Calculate influential observations based on three quantiles of the posterior predictive:

In [2]: shift, khat = loo_influence(dt, kind="quantile", probs=[0.25, 0.5, 0.75])
   ...: shift
   ...: 
Out[2]: 
<xarray.DataArray 'obs' (school: 8)> Size: 64B
array([3.26781662, 0.84869115, 0.75617294, 0.43329059, 1.76334321,
       0.34259657, 3.46998725, 0.65380409])
Coordinates:
  * school   (school) <U16 512B 'Choate' 'Deerfield' ... 'Mt. Hermon'
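A common next step is to rank observations by their shift. The sketch below does this with plain NumPy on the values copied from Out[2] above. It works with positional indices rather than school labels, and the variable names are illustrative only.

```python
import numpy as np

# Shift values copied from Out[2]; positions follow the school coordinate order.
shift_values = np.array(
    [3.26781662, 0.84869115, 0.75617294, 0.43329059,
     1.76334321, 0.34259657, 3.46998725, 0.65380409]
)

# Positions sorted from most to least influential.
order = np.argsort(shift_values)[::-1]

for rank, idx in enumerate(order, start=1):
    print(f"{rank}. observation {idx}: shift = {shift_values[idx]:.3f}")
```

The first entries of order point to the observations whose removal changes the selected quantiles the most.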