Statistical utilities

mef_agri.evaluation.stats_utils.py

Utilities for statistical computations.

mef_agri.evaluation.stats_utils.get_gamma_params(std, mean=None, mode=None)

Compute the shape and rate parameters of the gamma distribution from the provided standard deviation and either the mean or the mode.

Parameters:
  • std (float) – standard deviation of the gamma-distribution

  • mean (float, optional) – mean or expectation value of the gamma-distribution, defaults to None

  • mode (float, optional) – mode of the gamma-distribution, defaults to None

Returns:

shape and rate parameter of the gamma distribution

Return type:

tuple[float, float]
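The relations behind such a conversion can be sketched as follows. This is a minimal illustration based on the textbook gamma identities (mean = shape / rate, variance = shape / rate², mode = (shape - 1) / rate for shape ≥ 1), not necessarily the implementation in mef_agri; the function name gamma_shape_rate is hypothetical.

```python
import math


def gamma_shape_rate(std, mean=None, mode=None):
    """Hypothetical helper: return (shape, rate) of a gamma distribution."""
    var = std ** 2
    if mean is not None:
        # mean = shape / rate and var = shape / rate**2
        rate = mean / var
        shape = mean * rate
    elif mode is not None:
        # mode = (shape - 1) / rate and var = shape / rate**2 give the
        # quadratic var * rate**2 - mode * rate - 1 = 0; keep the positive root
        rate = (mode + math.sqrt(mode ** 2 + 4.0 * var)) / (2.0 * var)
        shape = 1.0 + mode * rate
    else:
        raise ValueError("either mean or mode has to be provided")
    return shape, rate


# Both calls describe the same distribution (shape 4, rate 2)
shape_from_mean, rate_from_mean = gamma_shape_rate(1.0, mean=2.0)
shape_from_mode, rate_from_mode = gamma_shape_rate(1.0, mode=1.5)
```

Note that scipy.stats.gamma is parameterized by a scale rather than a rate, so the rate would have to be passed as scale = 1 / rate.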

mef_agri.evaluation.stats_utils.get_beta_params(std, mean=None, mode=None)

Compute the shape parameters of the beta distribution from the provided standard deviation and either the mean or the mode.

Important Notes:

If mode is specified, it is treated as the mean value. The reason is that the mode equation of the beta distribution cannot be solved analytically for the shape parameters, and attempts to solve it numerically (e.g. with scipy.optimize.fsolve) did not yield satisfactory results.

Edge cases (mean values close to 0.0 or 1.0 with a “low” std) lead to non-intuitive distribution shapes or even to negative shape parameters (i.e. the beta distribution is not defined for the provided combination of mean and std).

Parameters:
  • std (float) – standard deviation of the beta-distribution

  • mean (float, optional) – mean or expectation value of the beta-distribution, defaults to None

  • mode (float, optional) – mode of the beta-distribution, defaults to None

Returns:

shape parameters of the beta distribution

Return type:

tuple[float, float]
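The mean-based computation and the edge case mentioned in the notes can be illustrated with the standard method-of-moments relations for the beta distribution. This is a sketch under that assumption, not the package's actual code; the name beta_shapes is hypothetical, and raising an exception for invalid input is a choice made for this example only.

```python
def beta_shapes(std, mean):
    """Hypothetical helper: return the two shape parameters (a, b)."""
    var = std ** 2
    # With mean = a / (a + b) and var = mean * (1 - mean) / (a + b + 1),
    # the sum nu = a + b follows directly from mean and variance.
    nu = mean * (1.0 - mean) / var - 1.0
    if nu <= 0.0:
        # Happens when var >= mean * (1 - mean), e.g. a mean close to 0 or 1
        raise ValueError("no beta distribution exists for this mean and std")
    return mean * nu, (1.0 - mean) * nu


# Symmetric case: mean 0.5 and std 0.25 give a = b = 1.5
a, b = beta_shapes(0.25, 0.5)
```

For a mean of 0.01, a std of 0.1 is already too large: the variance exceeds mean * (1 - mean), which is the situation in which the shape parameters would become negative.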

mef_agri.evaluation.stats_utils.get_truncnorm_params(std, mean, lower_thresh, upper_thresh)

Compute the a and b parameters used by scipy.stats.truncnorm.rvs() from the given standard deviation and mean value and from the lower and upper thresholds of the resulting truncated normal distribution.

Parameters:
  • std (float) – standard deviation of non-truncated normal distribution

  • mean (float) – mean value of the non-truncated normal distribution, which is the mode of the resulting truncated normal distribution (provided it lies between the thresholds)

  • lower_thresh (float) – lower threshold of the resulting truncated normal distribution

  • upper_thresh (float) – upper threshold of the resulting truncated normal distribution

Returns:

a and b parameters used by scipy.stats.truncnorm.rvs()

Return type:

tuple[float, float]
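scipy.stats.truncnorm expects its clipping bounds a and b in units of standard deviations relative to the mean, so the conversion amounts to standardizing the thresholds. The following sketch shows that relation; the function name truncnorm_ab is hypothetical and the package's own implementation may differ in details such as input validation.

```python
def truncnorm_ab(std, mean, lower_thresh, upper_thresh):
    """Hypothetical helper: standardize the thresholds for scipy's truncnorm."""
    a = (lower_thresh - mean) / std
    b = (upper_thresh - mean) / std
    return a, b


# Normal distribution with mean 5 and std 2, truncated to the interval [1, 11]
a, b = truncnorm_ab(2.0, 5.0, 1.0, 11.0)
# The result would then be used as, for example:
#   scipy.stats.truncnorm.rvs(a, b, loc=5.0, scale=2.0, size=100)
```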

mef_agri.evaluation.stats_utils.get_values_probs(value, std, lb, ub)

Creates a categorical distribution (details in RVSampler.get_sampled_values())

Parameters:
  • value (int) – most probable value (i.e. mode)

  • std (int) – number of discrete values on “one” side of value

  • lb (int) – lower bound

  • ub (int) – upper bound

Returns:

two arrays - first one contains the values and the second one the corresponding probabilities

Return type:

tuple[numpy.ndarray, numpy.ndarray]
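As a rough illustration of the idea, the sketch below builds the candidate values from value ± std, truncates them at lb and ub, and assigns probabilities. The weighting scheme (linearly decreasing with the distance from value) is purely an assumption made for this example, since the actual weighting is not described here; the name values_probs_sketch is hypothetical as well.

```python
import numpy as np


def values_probs_sketch(value, std, lb, ub):
    """Hypothetical sketch of a categorical distribution around `value`."""
    # std discrete values on each side of value, truncated at the bounds
    low = max(lb, value - std)
    high = min(ub, value + std)
    values = np.arange(low, high + 1)
    # ASSUMPTION: weights decrease linearly with the distance from value
    weights = (std + 1 - np.abs(values - value)).astype(float)
    probs = weights / weights.sum()
    return values, probs


# The lower bound cuts off the value 1, which value - std would include
values, probs = values_probs_sketch(value=3, std=2, lb=2, ub=10)
```

Arrays of this form can be passed to numpy's Generator.choice(values, p=probs) to draw samples.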

class mef_agri.evaluation.stats_utils.DISTRIBUTIONS

Helper class which holds string values (class variables) to define the distribution type:

  • NORMAL_1D = 'normal'

  • GAMMA_1D = 'gamma'

  • BETA_1D = 'beta'

  • CATEGORICAL_1D = 'categorical'

  • TRUNCNORM_1D = 'truncnorm'

class mef_agri.evaluation.stats_utils.DISTRIBUTION_TYPE

Helper class which holds string values (class variables) to specify if a distribution is continuous or discrete:

  • DISCRETE = 'discrete'

  • CONTINUOUS = 'continuous'

class mef_agri.evaluation.stats_utils.RVSampler

Class used to sample model quantities. It is especially important for multiprocessing, because the random state of the scipy.stats distributions is initialized in each instantiation of this class (i.e. in each core used for evaluation). Otherwise, the samples drawn in each process/core would be identical.
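The per-instance random state can be illustrated with a small stand-in class. This is only a sketch of the principle, assuming NumPy's Generator API; the class SamplerSketch and its method are hypothetical and do not reproduce the actual RVSampler.

```python
import numpy as np


class SamplerSketch:
    """Hypothetical stand-in that owns its own random number generator."""

    def __init__(self):
        # Without a seed, NumPy pulls fresh entropy from the operating system,
        # so every instance (and thus every worker process) gets its own stream.
        self._rng = np.random.default_rng()

    def sample_normal(self, mean, std, size):
        return self._rng.normal(loc=mean, scale=std, size=size)


# Two instances, as they would be created in two separate worker processes
first = SamplerSketch().sample_normal(0.0, 1.0, 5)
second = SamplerSketch().sample_normal(0.0, 1.0, 5)
```

The rvs() methods of the scipy.stats distributions accept a random_state argument, so a generator created this way could be handed to them instead of relying on a global random state that forked processes may share.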

get_sampled_values(value, distr, size)

Get a sample from the distribution specified in the distr dictionary. Each distr dictionary therefore has to contain at least the key distr_id, which specifies the distribution used for sampling. How the provided value is used depends on the chosen distribution.

Normal distribution

value will be used as mean value and distr should contain the following keys

  • 'std' (float) - standard deviation

Gamma distribution

value will be used as mode and the distr should contain the following keys

  • 'std' (float) - standard deviation

Beta distribution

value will be used as mean value and the distr should contain the following keys

  • 'std' (float) - standard deviation

Categorical distribution

value will be used as category/integer/discrete value with the highest probability and distr should contain the following keys

  • 'std' (float) - will be used as number of possible categories/values on “one side” of value

  • 'lb' (float) - lower bound of the possible values (the range is truncated if value - std would fall below this bound)

  • 'ub' (float) - upper bound of the possible values (the range is truncated if value + std would exceed this bound)

Truncated normal distribution

value will be the mean/mode of the resulting truncated normal distribution and distr should contain the following keys

  • 'std' (float) - standard deviation of the original normal distribution

  • 'lb' (float) - lower bound of the truncated normal distribution

  • 'ub' (float) - upper bound of the truncated normal distribution

Parameters:
  • value (float) – reference value for sampled distribution - usage depends on the sampling distribution

  • distr (dict) – dictionary containing the values needed to compute the scipy.stats parameters used to draw the samples

  • size (int) – number of samples drawn from the distribution

Returns:

array containing sampled values (length according to size)

Return type:

numpy.ndarray
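The distr dictionaries described above could look like the following. The sketch assumes that the value of 'distr_id' is one of the strings held by DISTRIBUTIONS; the REQUIRED_KEYS table and the missing_keys helper are hypothetical and only summarize the keys listed for each distribution.

```python
# Keys required in addition to 'distr_id', as listed per distribution above
REQUIRED_KEYS = {
    'normal': {'std'},
    'gamma': {'std'},
    'beta': {'std'},
    'categorical': {'std', 'lb', 'ub'},
    'truncnorm': {'std', 'lb', 'ub'},
}


def missing_keys(distr):
    """Hypothetical check: return the required keys absent from distr."""
    required = REQUIRED_KEYS[distr['distr_id']]
    return sorted(required - set(distr))


normal_distr = {'distr_id': 'normal', 'std': 0.1}
truncnorm_distr = {'distr_id': 'truncnorm', 'std': 0.5, 'lb': 0.0, 'ub': 2.0}
incomplete_distr = {'distr_id': 'categorical', 'std': 2, 'lb': 0}
```

A dictionary such as normal_distr would then be passed as the distr argument, e.g. get_sampled_values(1.0, normal_distr, 100) on an RVSampler instance.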