Statistic utilities
mef_agri.evaluation.stats_utils.py
Utilities for statistical computations.
- mef_agri.evaluation.stats_utils.get_gamma_params(std, mean=None, mode=None)
Compute the shape and rate parameters of the gamma-distribution from the provided standard deviation and either the mean or the mode.
- Parameters:
std (float) – standard deviation of the gamma-distribution
mean (float, optional) – mean or expectation value of the gamma-distribution, defaults to None
mode (float, optional) – mode of the gamma-distribution, defaults to None
- Returns:
shape and rate parameter of the gamma distribution
- Return type:
tuple[float, float]
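The relation between these inputs and the shape/rate parameters can be sketched as follows. This is a minimal illustration based on the standard gamma-distribution identities (mean = shape / rate, variance = shape / rate², mode = (shape - 1) / rate for shape >= 1), not the library's actual implementation. The function name `gamma_params_sketch` is hypothetical, and giving `mean` precedence when both arguments are passed is an assumption of this sketch.

```python
import math


def gamma_params_sketch(std, mean=None, mode=None):
    """Shape and rate of a gamma distribution from std and mean or mode."""
    var = std ** 2
    if mean is not None:
        # mean = shape / rate and var = shape / rate**2
        rate = mean / var
        shape = mean * rate
    elif mode is not None:
        # mode = (shape - 1) / rate and var = shape / rate**2 lead to the
        # quadratic var * rate**2 - mode * rate - 1 = 0; keep the positive root.
        rate = (mode + math.sqrt(mode ** 2 + 4.0 * var)) / (2.0 * var)
        shape = 1.0 + mode * rate
    else:
        raise ValueError("either mean or mode has to be provided")
    return shape, rate


from_mean = gamma_params_sketch(std=2.0, mean=4.0)
from_mode = gamma_params_sketch(std=2.0, mode=3.0)
```

Both calls describe the same distribution (shape 4, rate 1), since a gamma-distribution with these parameters has mean 4, mode 3 and standard deviation 2.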
- mef_agri.evaluation.stats_utils.get_beta_params(std, mean=None, mode=None)
Compute the shape parameters of the beta-distribution from the provided standard deviation and either the mean or the mode.
Important Notes:
If mode is specified, it is treated as the mean value. The reason is that the mode equation of the beta-distribution cannot be solved analytically for the shape parameters. Attempts to solve it numerically (e.g. with scipy.optimize.fsolve) did not yield satisfactory results.
Edge cases (mean values close to 0.0 or 1.0 with “low” std) lead to non-intuitive distribution shapes or even to negative shape parameters (i.e. the beta-distribution is not defined for the provided combination of mean and std).
- Parameters:
std (float) – standard deviation of the beta-distribution
mean (float, optional) – mean or expectation value of the beta-distribution, defaults to None
mode (float, optional) – mode of the beta-distribution, defaults to None
- Returns:
shape parameters of the beta-distribution
- Return type:
tuple[float, float]
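The mean-based computation and the edge cases mentioned in the notes can be illustrated with the usual method-of-moments formulas for the beta-distribution. This is a sketch under that assumption, not necessarily the code used by `get_beta_params`. The name `beta_params_sketch` is hypothetical.

```python
def beta_params_sketch(std, mean):
    """Shape parameters of a beta distribution from std and mean."""
    var = std ** 2
    # Method of moments: nu = alpha + beta = mean * (1 - mean) / var - 1
    nu = mean * (1.0 - mean) / var - 1.0
    alpha = mean * nu
    beta = (1.0 - mean) * nu
    return alpha, beta


# Well-behaved case: symmetric, bell-shaped distribution
regular = beta_params_sketch(std=0.1, mean=0.5)

# Mean close to 0.0: one shape parameter drops below 1, so the density
# piles up at the boundary instead of peaking near the mean
boundary = beta_params_sketch(std=0.05, mean=0.02)

# var >= mean * (1 - mean): both shape parameters become negative,
# i.e. no beta distribution exists for this combination
undefined = beta_params_sketch(std=0.2, mean=0.02)
```

In this formulation, valid shape parameters require `std ** 2 < mean * (1 - mean)`, which becomes restrictive as the mean approaches 0.0 or 1.0.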
- mef_agri.evaluation.stats_utils.get_truncnorm_params(std, mean, lower_thresh, upper_thresh)
Compute the a and b parameters used in scipy.stats.truncnorm.rvs() from the given standard deviation and mean value as well as the lower and upper thresholds of the resulting truncated normal distribution.
- Parameters:
std (float) – standard deviation of the non-truncated normal distribution
mean (float) – mean value of the non-truncated normal distribution, which will be the mode of the resulting truncated normal distribution
lower_thresh (float) – lower threshold of the resulting truncated normal distribution
upper_thresh (float) – upper threshold of the resulting truncated normal distribution
- Returns:
a and b parameters used by scipy.stats.truncnorm.rvs()
- Return type:
tuple[float, float]
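scipy.stats.truncnorm expects its bounds in units of standard deviations relative to the mean, so the conversion presumably amounts to a standardization of the two thresholds. A minimal sketch of that conversion follows. The name `truncnorm_params_sketch` is hypothetical and the code is not taken from the library.

```python
def truncnorm_params_sketch(std, mean, lower_thresh, upper_thresh):
    """Standardize the thresholds as scipy.stats.truncnorm expects them."""
    a = (lower_thresh - mean) / std
    b = (upper_thresh - mean) / std
    return a, b


a, b = truncnorm_params_sketch(std=2.0, mean=1.0, lower_thresh=0.0, upper_thresh=5.0)
```

The resulting values would then be passed on together with the location and scale, e.g. `scipy.stats.truncnorm.rvs(a, b, loc=1.0, scale=2.0, size=10)`.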
- mef_agri.evaluation.stats_utils.get_values_probs(value, std, lb, ub)
Creates a categorical distribution (details in RVSampler.get_sampled_values()).
- Parameters:
value (int) – most probable value (i.e. mode)
std (int) – number of discrete values on “one” side of value
lb (int) – lower bound
ub (int) – upper bound
- Returns:
two arrays - first one contains the values and the second one the corresponding probabilities
- Return type:
tuple[numpy.ndarray, numpy.ndarray]
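How the probabilities are assigned to the individual values is not stated here. The sketch below only illustrates the described inputs and outputs: candidate values span `std` steps on each side of `value`, are clipped to `[lb, ub]`, and `value` receives the highest probability. The linearly decaying weights, the use of plain lists instead of numpy arrays, and the name `values_probs_sketch` are all assumptions made for illustration.

```python
def values_probs_sketch(value, std, lb, ub):
    """Discrete values around `value` with probabilities peaking at `value`."""
    low = max(lb, value - std)    # truncate at the lower bound
    high = min(ub, value + std)   # truncate at the upper bound
    values = list(range(low, high + 1))
    # Assumed weighting: linear decay with the distance from `value`
    weights = [std + 1 - abs(v - value) for v in values]
    total = sum(weights)
    probs = [w / total for w in weights]
    return values, probs


values, probs = values_probs_sketch(value=5, std=2, lb=4, ub=10)
```

With these arguments the lower bound cuts off the value 3, leaving the values 4 to 7 with 5 as the most probable one.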
- class mef_agri.evaluation.stats_utils.DISTRIBUTIONS
Helper class which holds string values (class variables) to define the distribution type:
NORMAL_1D = 'normal'
GAMMA_1D = 'gamma'
BETA_1D = 'beta'
CATEGORICAL_1D = 'categorical'
TRUNCNORM_1D = 'truncnorm'
- class mef_agri.evaluation.stats_utils.DISTRIBUTION_TYPE
Helper class which holds string values (class variables) to specify if a distribution is continuous or discrete:
DISCRETE = 'discrete'
CONTINUOUS = 'continuous'
- class mef_agri.evaluation.stats_utils.RVSampler
Class which is used to sample model quantities. It is especially important in the case of multiprocessing, as the random state of the scipy.stats distributions is initialized on each instantiation of this class (i.e. in each process/core used for evaluation). Otherwise, the samples drawn within each process/core would be identical.
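Why a per-instance random state matters can be shown with a small stand-alone sketch. It uses the standard-library `random` module in place of scipy.stats to stay dependency-free, and the class `SamplerSketch` is a hypothetical stand-in, not the RVSampler implementation.

```python
import random


class SamplerSketch:
    """Hypothetical sampler that owns its random state."""

    def __init__(self, seed=None):
        # seed=None pulls fresh entropy from the operating system, so every
        # instance (e.g. one per worker process) starts from a different state.
        self._rng = random.Random(seed)

    def normal(self, mean, std, size):
        return [self._rng.gauss(mean, std) for _ in range(size)]


# Identical states, as inherited by workers that copy one shared generator,
# reproduce exactly the same samples
shared_a = SamplerSketch(seed=42).normal(0.0, 1.0, 3)
shared_b = SamplerSketch(seed=42).normal(0.0, 1.0, 3)

# States initialized per instance yield independent samples
fresh_a = SamplerSketch().normal(0.0, 1.0, 3)
fresh_b = SamplerSketch().normal(0.0, 1.0, 3)
```

Here `shared_a` and `shared_b` are identical lists, which is the situation the class is meant to avoid across processes.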
- get_sampled_values(value, distr, size)
Get a sample from the distribution specified in the distr dictionary. This dictionary has to contain at least the key distr_id, which specifies the distribution to be used for sampling. Depending on the chosen distribution, the provided value is used as follows.
Normal distribution
value will be used as mean value and distr should contain the following keys
'std' (float) - standard deviation
Gamma distribution
value will be used as mode and distr should contain the following keys
'std' (float) - standard deviation
Beta distribution
value will be used as mean value and distr should contain the following keys
'std' (float) - standard deviation
Categorical distribution
value will be used as the category/integer/discrete value with the highest probability and distr should contain the following keys
'std' (float) - will be used as the number of possible categories/values on “one side” of value
'lb' (float) - lower bound of the possible values (truncates if std would exceed this value)
'ub' (float) - upper bound of the possible values (truncates if std would exceed this value)
Truncated normal distribution
value will be the mean/mode of the resulting truncated normal distribution and distr should contain the following keys
'std' (float) - standard deviation of the original normal distribution
'lb' (float) - lower bound of the truncated normal distribution
'ub' (float) - upper bound of the truncated normal distribution
- Parameters:
value (float) – reference value for the sampled distribution; its usage depends on the sampling distribution
distr (dict) – dictionary containing the values which are necessary to compute the parameters used in scipy.stats to draw samples
size (int) – number of samples drawn from the distribution
- Returns:
array containing sampled values (length according to size)
- Return type:
numpy.ndarray
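The dictionaries described above could look as follows. The key names come from the description, while using the string values of the DISTRIBUTIONS class variables for `distr_id` is an assumption of this sketch. The helper `missing_keys` is hypothetical and only illustrates which keys each distribution needs.

```python
# Required keys per distribution, as listed in the description above
REQUIRED_KEYS = {
    "normal": {"std"},
    "gamma": {"std"},
    "beta": {"std"},
    "categorical": {"std", "lb", "ub"},
    "truncnorm": {"std", "lb", "ub"},
}


def missing_keys(distr):
    """Return the sorted list of required keys that `distr` lacks."""
    required = REQUIRED_KEYS[distr["distr_id"]]
    return sorted(required - set(distr))


normal_info = {"distr_id": "normal", "std": 0.5}
categorical_info = {"distr_id": "categorical", "std": 2, "lb": 0, "ub": 10}
incomplete_info = {"distr_id": "truncnorm", "std": 0.1, "lb": 0.0}

problems = missing_keys(incomplete_info)  # the upper bound is missing
```

A complete dictionary would then be passed as the second argument of `get_sampled_values`, together with the reference value and the number of samples.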