arviz_stats.loo_moment_match


arviz_stats.loo_moment_match(data, loo_orig, log_prob_upars_fn, log_lik_i_upars_fn, upars=None, var_name=None, reff=None, max_iters=30, k_threshold=None, split=True, cov=True, pointwise=None)

Compute moment matching for problematic observations in PSIS-LOO-CV.

Adjusts the results of a previously computed Pareto smoothed importance sampling leave-one-out cross-validation (PSIS-LOO-CV) object by applying a moment matching algorithm to observations with high Pareto k diagnostic values. The moment matching algorithm iteratively adjusts the posterior draws in the unconstrained parameter space to better approximate the leave-one-out posterior.

The moment matching algorithm is described in [1] and the PSIS-LOO-CV method is described in [2] and [3].

See the EABM chapter on Moment Matching for more details.

Parameters:
data: xarray.DataTree or InferenceData

Input data. It should contain the posterior and the log_likelihood groups.

loo_orig: ELPDData

An existing ELPDData object from a previous loo result. Must contain pointwise Pareto k values (pointwise=True must have been used).

log_prob_upars_fn: callable

Function that computes the log probability density of the full posterior distribution evaluated at unconstrained parameter draws. The function signature is log_prob_upars_fn(upars) where upars is a DataArray of unconstrained parameter draws with dimensions chain, draw, and a parameter dimension. It should return a DataArray with dimensions chain, draw.

log_lik_i_upars_fn: callable

Function that computes the log-likelihood of a single left-out observation evaluated at unconstrained parameter draws. The function signature is log_lik_i_upars_fn(upars, i) where upars is a DataArray of unconstrained parameter draws and i is the integer index of the left-out observation. It should return a DataArray with dimensions chain, draw.
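As a minimal sketch of what these two callables might look like, assume a toy model y_i ~ Normal(mu, sigma) with unconstrained parameters mu and log(sigma). The data y, the priors, the parameter dimension name upars_dim, and the helper _split are all hypothetical choices made for illustration and are not part of arviz_stats. The log posterior is written up to an additive constant, on the assumption that the normalizing constant is not needed.

```python
import math

import numpy as np
import xarray as xr

# Hypothetical observed data for the toy model y_i ~ Normal(mu, sigma).
y = np.array([0.3, -1.2, 2.5, 0.8, 1.1])
LOG_2PI = math.log(2 * math.pi)


def _split(upars):
    """Pick mu and log(sigma) out of the parameter dimension (assumed name: upars_dim)."""
    mu = upars.sel(upars_dim="mu", drop=True)
    log_sigma = upars.sel(upars_dim="log_sigma", drop=True)
    return mu, log_sigma


def log_lik_i_upars_fn(upars, i):
    """Log-likelihood of observation i at each draw; dims (chain, draw)."""
    mu, log_sigma = _split(upars)
    z = (mu - y[i]) / np.exp(log_sigma)
    return -log_sigma - 0.5 * z**2 - 0.5 * LOG_2PI


def log_prob_upars_fn(upars):
    """Unnormalized log posterior on the unconstrained scale; dims (chain, draw)."""
    mu, log_sigma = _split(upars)
    sigma = np.exp(log_sigma)
    # Assumed priors: mu ~ Normal(0, 10), sigma ~ HalfNormal(1).
    # The trailing + log_sigma is the Jacobian of the log transform.
    log_prior = -0.5 * (mu / 10) ** 2 - 0.5 * sigma**2 + log_sigma
    log_lik = sum(log_lik_i_upars_fn(upars, i) for i in range(len(y)))
    return log_prior + log_lik


# Stand-in unconstrained draws with the documented layout: chain, draw, parameter.
rng = np.random.default_rng(0)
upars = xr.DataArray(
    rng.normal(size=(4, 250, 2)),
    dims=("chain", "draw", "upars_dim"),
    coords={"upars_dim": ["mu", "log_sigma"]},
)

log_post = log_prob_upars_fn(upars)
log_lik_0 = log_lik_i_upars_fn(upars, 0)

# With real data these would be passed along the lines of:
# loo_moment_match(data, loo_orig, log_prob_upars_fn, log_lik_i_upars_fn, upars=upars)
```

Both functions return a DataArray with only the chain and draw dimensions, which is the shape the parameter descriptions above ask for.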

upars: xarray.DataArray, optional

Posterior draws transformed to the unconstrained parameter space. Must have chain and draw dimensions, plus one additional dimension containing all parameters. Parameter names can be provided as coordinate values on this dimension. If not provided, the function attempts to use the unconstrained_posterior group from the input data, if available.

var_name: str, optional

The name of the variable in the log_likelihood group that stores the pointwise log-likelihood data to use for the loo computation.

reff: float, optional

Relative MCMC efficiency, ess / n, i.e. the number of effective samples divided by the number of actual samples. Computed from the trace by default.

max_iters: int, default 30

Maximum number of moment matching iterations for each problematic observation.

k_threshold: float, optional

Threshold value for Pareto k values above which moment matching is applied. Defaults to \(\min(1 - 1/\log_{10}(S), 0.7)\), where S is the number of samples.
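The default threshold formula above can be evaluated directly. The following is a small sketch using only the standard library; the function name default_k_threshold is a hypothetical helper for illustration, not an arviz_stats function.

```python
import math


def default_k_threshold(n_samples):
    """Evaluate min(1 - 1/log10(S), 0.7) for S = n_samples (S > 1)."""
    return min(1 - 1 / math.log10(n_samples), 0.7)


small = default_k_threshold(100)    # 1 - 1/2 = 0.5
medium = default_k_threshold(1000)  # 1 - 1/3, about 0.667
large = default_k_threshold(4000)   # capped at 0.7
```

The 0.7 cap takes over once S exceeds roughly 2150 draws (where log10(S) reaches 10/3); below that, the threshold is stricter.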

split: bool, default True

If True, only transform half of the draws and use multiple importance sampling to combine them with untransformed draws.

cov: bool, default True

If True, match the covariance structure during the transformation, in addition to the mean and marginal variances. If False, only match the mean and marginal variances.
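To make "matching the mean and marginal variances" concrete, here is a simplified NumPy sketch of one such affine transformation. It is an illustration under stated assumptions, not the arviz_stats implementation: the helper names, the synthetic draws, and the synthetic log weights are all hypothetical, and finite-sample corrections and the full covariance step used with cov=True are left out.

```python
import numpy as np


def weighted_moments(draws, log_weights):
    """Self-normalized importance-weighted mean and marginal variances."""
    w = np.exp(log_weights - log_weights.max())
    w /= w.sum()
    mean_w = w @ draws
    var_w = w @ (draws - mean_w) ** 2
    return mean_w, var_w


def match_mean_and_variance(draws, log_weights):
    """Shift and scale draws so their plain mean and marginal variances
    equal the importance-weighted ones."""
    mean_w, var_w = weighted_moments(draws, log_weights)
    mean = draws.mean(axis=0)
    scale = np.sqrt(var_w / draws.var(axis=0))
    return (draws - mean) * scale + mean_w


# Synthetic example: 1000 draws of 3 unconstrained parameters and
# hypothetical log importance weights that favor large values of parameter 0.
rng = np.random.default_rng(1)
draws = rng.normal(size=(1000, 3))
log_weights = 0.5 * draws[:, 0]

target_mean, target_var = weighted_moments(draws, log_weights)
matched = match_mean_and_variance(draws, log_weights)
```

After the transformation, the unweighted mean and marginal variances of matched agree with the weighted targets, so the transformed draws sit closer to the distribution the weights point at. Correlations between parameters are left unchanged by this step.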

pointwise: bool, optional

If True, the pointwise predictive accuracy will be returned. Defaults to rcParams["stats.ic_pointwise"]. Moment matching always requires pointwise data from loo_orig. This argument controls whether the returned object includes pointwise data.

Returns:
ELPDData

Object with the following attributes:

  • kind: “loo”

  • elpd: expected log pointwise predictive density

  • se: standard error of the elpd

  • p: effective number of parameters

  • n_samples: number of samples

  • n_data_points: number of data points

  • scale: “log”

  • warning: True if the estimated shape parameter of the Pareto distribution is greater than good_k for any observation.

  • good_k: For a sample size S, the threshold is computed as min(1 - 1/log10(S), 0.7)

  • elpd_i: DataArray with the pointwise predictive accuracy, only if pointwise=True.

  • pareto_k: DataArray with moment-matched Pareto shape values, only if pointwise=True.

  • approx_posterior: False (not used for standard LOO)

  • log_weights: DataArray with smoothed log weights (updated for successfully moment-matched observations).

  • influence_pareto_k: DataArray with original (pre-moment-matching) Pareto shape values, only if pointwise=True.

  • n_eff_i: DataArray with effective sample size per observation, only if pointwise=True.

See also

loo

Standard PSIS-LOO-CV.

reloo

Exact re-fitting for problematic observations.

References

[1]

Paananen, T., Piironen, J., Buerkner, P.-C., Vehtari, A. Implicitly Adaptive Importance Sampling. Statistics and Computing. 31(2) (2021) https://doi.org/10.1007/s11222-020-09982-2 arXiv preprint https://arxiv.org/abs/1906.08850.

[2]

Vehtari et al. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing. 27(5) (2017) https://doi.org/10.1007/s11222-016-9696-4 arXiv preprint https://arxiv.org/abs/1507.04544.

[3]

Vehtari et al. Pareto Smoothed Importance Sampling. Journal of Machine Learning Research. 25(72) (2024) https://jmlr.org/papers/v25/19-556.html arXiv preprint https://arxiv.org/abs/1507.02646.