Bonferroni-Dunn’s procedure for the adjusted p-value computation.
p_values: 2-D array or DataFrame containing the p-values obtained from a ranking test.
control: int or string. Index or name of the control algorithm.
APVs: DataFrame containing the adjusted p-values.
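As a rough illustration of the Bonferroni-Dunn adjustment, the sketch below multiplies each p-value by the number of comparisons and caps the result at 1. It is not the library's implementation: the function name `bonferroni_dunn_apv` is hypothetical, and it assumes the p-values of the control-vs-other comparisons have already been extracted into a flat list.

```python
def bonferroni_dunn_apv(p_values):
    """Adjusted p-values for m comparisons: min(1, m * p_i)."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


# Hypothetical unadjusted p-values: a control compared against four algorithms.
raw = [0.001, 0.02, 0.04, 0.30]
adjusted = bonferroni_dunn_apv(raw)
print(adjusted)
```

The adjustment does not depend on the ordering of the p-values, which is why no sorting is needed here.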
Finner’s procedure for the adjusted p-value computation.
p_values: 2-D array or DataFrame containing the p-values obtained from a ranking test.
control: int or string. Index or name of the control algorithm.
APVs: DataFrame containing the adjusted p-values.
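A minimal sketch of Finner's step-down adjustment, assuming the usual formulation: with the m p-values sorted in ascending order, the j-th adjusted value is the running maximum of 1 - (1 - p_j)^(m / j), capped at 1. The name `finner_apv` and the flat-list input are assumptions made for illustration, not the library's API.

```python
def finner_apv(p_values):
    """Finner adjusted p-values, returned in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for j, idx in enumerate(order, start=1):
        candidate = 1.0 - (1.0 - p_values[idx]) ** (m / j)
        running_max = max(running_max, candidate)  # keeps the APVs monotone
        adjusted[idx] = min(1.0, running_max)
    return adjusted


# Hypothetical unadjusted p-values: a control compared against four algorithms.
raw = [0.04, 0.001, 0.30, 0.02]
adjusted = finner_apv(raw)
print(adjusted)
```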
Hochberg’s procedure for the adjusted p-value computation.
p_values: 2-D array or DataFrame containing the p-values obtained from a ranking test.
control: int or string. Index or name of the control algorithm.
APVs: DataFrame containing the adjusted p-values.
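Hochberg's procedure is a step-up method: it starts from the largest p-value and works downward. The sketch below assumes the usual formulation, in which the j-th smallest p-value is multiplied by (m - j + 1) and the adjusted value is the running minimum taken from the largest p-value down. The name `hochberg_apv` and the flat-list input are hypothetical.

```python
def hochberg_apv(p_values):
    """Hochberg adjusted p-values, returned in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest (step-up procedure).
    for pos in range(m - 1, -1, -1):
        idx = order[pos]
        candidate = (m - pos) * p_values[idx]
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical unadjusted p-values for three comparisons against a control.
raw = [0.01, 0.04, 0.045]
adjusted = hochberg_apv(raw)
print(adjusted)
```

For these inputs the middle comparison gets 0.045, whereas a step-down running maximum over the same multipliers would give 0.08, which illustrates why step-up procedures tend to be less conservative.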
Holland’s procedure for the adjusted p-value computation.
p_values: 2-D array or DataFrame containing the p-values obtained from a ranking test.
control: int or string. Index or name of the control algorithm.
APVs: DataFrame containing the adjusted p-values.
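A minimal sketch of Holland's step-down adjustment, assuming the usual formulation: with the m p-values sorted in ascending order, the j-th adjusted value is the running maximum of 1 - (1 - p_j)^(m - j + 1), capped at 1. The name `holland_apv` and the flat-list input are assumptions for illustration only.

```python
def holland_apv(p_values):
    """Holland adjusted p-values, returned in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for pos, idx in enumerate(order):
        # pos is 0-based, so the exponent (m - pos) equals m - j + 1.
        candidate = 1.0 - (1.0 - p_values[idx]) ** (m - pos)
        running_max = max(running_max, candidate)
        adjusted[idx] = min(1.0, running_max)
    return adjusted


# Hypothetical unadjusted p-values: a control compared against four algorithms.
raw = [0.04, 0.001, 0.30, 0.02]
adjusted = holland_apv(raw)
print(adjusted)
```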
Holm’s procedure for the adjusted p-value computation.
p_values: 2-D array or DataFrame containing the p-values obtained from a ranking test.
control: optional int or string. Default None. Index or name of the control algorithm. If control is provided, control vs all comparisons are considered; otherwise all vs all.
APVs: DataFrame containing the adjusted p-values.
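Holm's step-down adjustment can be sketched as follows, assuming the usual formulation: with the m p-values of the comparison family sorted in ascending order, the j-th is multiplied by (m - j + 1), a running maximum keeps the results monotone, and values are capped at 1. The name `holm_apv` is hypothetical, and the input is assumed to be a flat list holding every p-value in the family (control vs all, or all vs all).

```python
def holm_apv(p_values):
    """Holm adjusted p-values, returned in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_max = 0.0
    for pos, idx in enumerate(order):
        # pos is 0-based, so the multiplier (m - pos) equals m - j + 1.
        candidate = (m - pos) * p_values[idx]
        running_max = max(running_max, candidate)
        adjusted[idx] = min(1.0, running_max)
    return adjusted


# Hypothetical unadjusted p-values for a family of four comparisons.
raw = [0.04, 0.001, 0.30, 0.02]
adjusted = holm_apv(raw)
print(adjusted)
```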
Li’s procedure for the adjusted p-value computation.
p_values: 2-D array or DataFrame containing the p-values obtained from a ranking test.
control: optional int or string. Default None. Index or name of the control algorithm. If control is provided, control vs all comparisons are considered; otherwise all vs all.
APVs: DataFrame containing the adjusted p-values.
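A minimal sketch of Li's adjustment, assuming the commonly cited formula in which each p-value is rescaled against the largest one in the family: APV_i = p_i / (p_i + 1 - p_max). The name `li_apv` and the flat-list input are hypothetical, and the sketch assumes the denominator is never zero (that is, no p-value of exactly 0 together with a largest p-value of exactly 1).

```python
def li_apv(p_values):
    """Li adjusted p-values: p_i / (p_i + 1 - p_max)."""
    p_max = max(p_values)
    return [p / (p + 1.0 - p_max) for p in p_values]


# Hypothetical unadjusted p-values for a family of four comparisons.
raw = [0.001, 0.02, 0.04, 0.30]
adjusted = li_apv(raw)
print(adjusted)
```

Under this formula the largest p-value is left unchanged, and the smaller the largest p-value is, the milder the adjustment applied to the others.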
Bayesian version of the sign test.
data – An (n x 2) array or DataFrame containing the results. Each column represents an algorithm and each row a problem.
rope_limits – array_like. Default [-0.01, 0.01]. Limits of the practical equivalence.
prior_strength – positive float. Default 0.5. Value of the prior strength.
prior_place – string {left, rope, right}. Default ‘left’. Place of the pseudo-observation z_0.
sample_size – integer. Default 10000. Total number of random samples generated.
return_sample – boolean. Default False. If true, also return the samples drawn from the Dirichlet process.
List of posterior probabilities: [Pr(algorithm_1 < algorithm_2), Pr(algorithm_1 equiv algorithm_2), Pr(algorithm_1 > algorithm_2)]
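To make the roles of the parameters above more concrete, here is a simplified sketch of a Bayesian sign test. It is an illustration under stated assumptions, not the library's code: the function name `bayesian_sign_sketch` and the `seed` argument are hypothetical, differences are taken as algorithm_1 minus algorithm_2, and the three posterior probabilities are estimated as the fraction of Dirichlet draws in which each region (left of the rope, inside it, right of it) receives the largest weight.

```python
import random


def bayesian_sign_sketch(x, y, rope_limits=(-0.01, 0.01), prior_strength=0.5,
                         prior_place="left", sample_size=10000, seed=0):
    """Return [Pr(x < y), Pr(x equiv y), Pr(x > y)] estimated by sampling."""
    rng = random.Random(seed)
    counts = {"left": 0.0, "rope": 0.0, "right": 0.0}
    for a, b in zip(x, y):
        diff = a - b
        if diff < rope_limits[0]:
            counts["left"] += 1
        elif diff > rope_limits[1]:
            counts["right"] += 1
        else:
            counts["rope"] += 1
    # The pseudo-observation z_0 adds prior_strength to one region.
    counts[prior_place] += prior_strength

    regions = ("left", "rope", "right")
    wins = [0, 0, 0]
    for _ in range(sample_size):
        # Independent Gamma draws are an unnormalised Dirichlet sample;
        # normalising would not change which region is largest.
        draw = [rng.gammavariate(counts[r], 1.0) if counts[r] > 0 else 0.0
                for r in regions]
        wins[draw.index(max(draw))] += 1
    return [w / sample_size for w in wins]


# Hypothetical scores of two algorithms on ten problems.
x = [0.70, 0.71, 0.69, 0.72, 0.68, 0.70, 0.71, 0.69, 0.80, 0.90]
y = [0.80] * 10
probs = bayesian_sign_sketch(x, y)
print(probs)
```

With eight differences to the left of the rope, one inside it, and one to the right, most of the posterior mass should fall on the first entry.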
Bayesian version of the signed rank test.
data – An (n x 2) array or DataFrame containing the results. Each column represents an algorithm and each row a problem.
rope_limits – array_like. Default [-0.01, 0.01]. Limits of the practical equivalence.
prior_strength – positive float. Default 0.5. Value of the prior strength.
prior_place – string {left, rope, right}. Default ‘left’. Place of the pseudo-observation z_0.
sample_size – integer. Default 10000. Total number of random samples generated.
return_sample – boolean. Default False. If true, also return the samples drawn from the Dirichlet process.
List of posterior probabilities: [Pr(algorithm_1 < algorithm_2), Pr(algorithm_1 equiv algorithm_2), Pr(algorithm_1 > algorithm_2)]
CDgraph plots the critical difference graph shown in Janez Demšar’s 2006 work, “Statistical Comparisons of Classifiers over Multiple Data Sets”.
results – A 2-D array containing results from each algorithm. Each row of ‘results’ represents an algorithm, and each column a dataset.
alpha – {0.1, 0.999}. Significance level for the critical difference.
alg_names – Names of the tested algorithms.
Computes Nemenyi’s critical difference:
CD = q_alpha * sqrt(num_alg * (num_alg + 1) / (6 * num_dataset))
where q_alpha is the critical value of the Studentized range statistic divided by sqrt(2).
alpha – {0.1, 0.999}. Significance level.
num_alg – Number of tested algorithms.
num_dataset – Number of problems/datasets where the algorithms have been tested.
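The critical difference formula can be checked by hand with a short sketch. The function name `nemenyi_cd` is hypothetical, and instead of computing q_alpha from the Studentized range distribution, the sketch assumes a small lookup table of the alpha = 0.05 critical values tabulated in Demšar (2006) for 2 to 10 algorithms.

```python
from math import sqrt

# q_alpha for alpha = 0.05 (Studentized range critical value divided by
# sqrt(2)), keyed by the number of algorithms, as tabulated in Demsar (2006).
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
               7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def nemenyi_cd(num_alg, num_dataset, q_table=Q_ALPHA_005):
    """CD = q_alpha * sqrt(num_alg * (num_alg + 1) / (6 * num_dataset))."""
    q_alpha = q_table[num_alg]
    return q_alpha * sqrt(num_alg * (num_alg + 1) / (6.0 * num_dataset))


# Four algorithms compared on fourteen datasets.
cd = nemenyi_cd(num_alg=4, num_dataset=14)
print(round(cd, 3))
```

Two algorithms whose average ranks differ by more than `cd` would be declared significantly different at the chosen level.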
Friedman Aligned Ranks post-hoc test.
data – An (n x k) array or DataFrame containing the results, where k is the number of algorithms. Each column represents an algorithm and each row a problem.
control – optional int or string. Default None. Index or name of the control algorithm. If control is None, all possible comparisons among algorithms are considered.
apv_procedure – optional string. Default None. Name of the procedure for computing adjusted p-values. If apv_procedure is None, adjusted p-values are not computed; otherwise they are computed according to the specified procedure.
For 1 vs all comparisons: {‘Bonferroni’, ‘Holm’, ‘Hochberg’, ‘Holland’, ‘Finner’, ‘Li’}
For all vs all comparisons: {‘Shaffer’, ‘Holm’, ‘Nemenyi’}
Test statistic.
The p-value according to the Studentized range distribution.
Method of aligned ranks for the Friedman test.
.. note:: Null Hypothesis: In a set of k (>=2) treatments (or tested algorithms), all the treatments are equivalent, so their average ranks should be equal.
data – An (n x k) array or DataFrame containing the results, where k is the number of algorithms. Each column represents an algorithm and each row a problem.
The associated p-value.
Friedman’s aligned rank chi-square statistic.
Friedman post-hoc test.
data – An (n x k) array or DataFrame containing the results, where k is the number of algorithms. Each column represents an algorithm and each row a problem.
control – optional int or string. Default None. Index or name of the control algorithm. If control is None, all possible comparisons among algorithms are considered.
apv_procedure – optional string. Default None. Name of the procedure for computing adjusted p-values. If apv_procedure is None, adjusted p-values are not computed; otherwise they are computed according to the specified procedure.
For 1 vs all comparisons: {‘Bonferroni’, ‘Holm’, ‘Hochberg’, ‘Holland’, ‘Finner’, ‘Li’}
For all vs all comparisons: {‘Shaffer’, ‘Holm’, ‘Nemenyi’}
Test statistic.
The p-value according to the Studentized range distribution.
Friedman ranking test.
.. note:: Null Hypothesis: In a set of k (>=2) treatments (or tested algorithms), all the treatments are equivalent, so their average ranks should be equal.
data – An (n x k) array or DataFrame containing the results, where k is the number of algorithms. Each column represents an algorithm and each row a problem.
The associated p-value.
Friedman’s chi-square.
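The chi-square statistic of the Friedman test can be sketched in a few lines. This is an illustration, not the library's implementation: the helper names are hypothetical, ties within a row are assumed to receive the average rank, and only the statistic is computed (the p-value would come from a chi-square distribution with k - 1 degrees of freedom, which is omitted here to keep the sketch free of third-party dependencies).

```python
def average_ranks(row):
    """Rank the values of one row in ascending order, averaging ties."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and row[order[end + 1]] == row[order[pos]]:
            end += 1
        avg = (pos + end) / 2.0 + 1.0  # mean of the 1-based positions pos..end
        for t in range(pos, end + 1):
            ranks[order[t]] = avg
        pos = end + 1
    return ranks


def friedman_statistic(data):
    """Return (chi-square statistic, mean rank of each algorithm)."""
    n, k = len(data), len(data[0])
    rank_sums = [0.0] * k
    for row in data:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    mean_ranks = [s / n for s in rank_sums]
    chi2 = 12.0 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4.0)
    return chi2, mean_ranks


# Hypothetical results: 4 problems (rows) x 3 algorithms (columns).
data = [[0.90, 0.80, 0.70],
        [0.85, 0.80, 0.60],
        [0.95, 0.90, 0.70],
        [0.80, 0.82, 0.65]]
chi2, mean_ranks = friedman_statistic(data)
print(chi2, mean_ranks)
```

Ranking in the opposite direction changes the mean ranks but leaves the statistic unchanged.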
Quade post-hoc test.
data – An (n x k) array or DataFrame containing the results, where k is the number of algorithms. Each column represents an algorithm and each row a problem.
control – optional int or string. Default None. Index or name of the control algorithm. If control is None, all possible comparisons among algorithms are considered.
apv_procedure – optional string. Default None. Name of the procedure for computing adjusted p-values. If apv_procedure is None, adjusted p-values are not computed; otherwise they are computed according to the specified procedure.
For 1 vs all comparisons: {‘Bonferroni’, ‘Holm’, ‘Hochberg’, ‘Holland’, ‘Finner’, ‘Li’}
For all vs all comparisons: {‘Shaffer’, ‘Holm’, ‘Nemenyi’}
Test statistic.
The p-value according to the Studentized range distribution.
Quade test.
.. note:: Null Hypothesis: In a set of k (>=2) treatments (or tested algorithms), all the treatments are equivalent, so their average ranks should be equal.
data – An (n x k) array or DataFrame containing the results, where k is the number of algorithms. Each column represents an algorithm and each row a problem.
The associated p-value from the F-distribution.
Computed F-value.
Computes the rank of the elements in data.
data – 2-D matrix
descending – boolean (default False). If true, rank is sorted in descending order.
ranks, where ranks[i][j] == rank of the i-th row w.r.t the j-th column.
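A small sketch of a ranking helper of this kind follows. It rests on two assumptions that the description above does not confirm: ranks are assigned within each row (one problem at a time), and tied values share the average of the ranks they span. The names `rank_row` and `rank_matrix` are hypothetical.

```python
def rank_row(values, descending=False):
    """Rank one row (1 = first in the chosen order), averaging ties."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=descending)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        avg = (pos + end) / 2.0 + 1.0  # mean of the 1-based positions pos..end
        for t in range(pos, end + 1):
            ranks[order[t]] = avg
        pos = end + 1
    return ranks


def rank_matrix(data, descending=False):
    """Apply rank_row to every row of a 2-D list."""
    return [rank_row(row, descending) for row in data]


# Hypothetical 2 x 4 matrix; the first row contains a tie.
data = [[0.9, 0.8, 0.8, 0.5],
        [0.1, 0.4, 0.3, 0.2]]
ascending = rank_matrix(data)
descending = rank_matrix(data, descending=True)
print(ascending)
print(descending)
```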
Given the results drawn from two algorithms/methods X and Y, the sign test analyses if there is a difference between X and Y.
Note
Null Hypothesis: Pr(X < Y) = 0.5
data – An (n x 2) array or DataFrame containing the results. Each column represents an algorithm and each row a problem.
The associated p-value from the binomial distribution.
Number of successes.
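The sign test can be sketched with the standard library alone. The function name `sign_test_sketch` is hypothetical, and the sketch makes three assumptions not stated above: a success is a problem where X < Y, tied problems are discarded, and the p-value is two-sided under a Binomial(n, 0.5) null distribution.

```python
from math import comb


def sign_test_sketch(x, y):
    """Return (number of successes, two-sided binomial p-value)."""
    successes = sum(1 for a, b in zip(x, y) if a < b)
    failures = sum(1 for a, b in zip(x, y) if a > b)
    n = successes + failures  # ties are dropped
    tail = min(successes, failures)
    # P(B <= tail) for B ~ Binomial(n, 0.5), doubled for a two-sided test.
    p_one_tail = sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return successes, min(1.0, 2.0 * p_one_tail)


# Hypothetical results of two algorithms on ten problems.
x = [0.70, 0.71, 0.69, 0.72, 0.68, 0.70, 0.71, 0.69, 0.75, 0.90]
y = [0.80] * 10
successes, p_value = sign_test_sketch(x, y)
print(successes, p_value)
```

Here X is below Y on nine of the ten problems, so the two-sided p-value is 2 * (1 + 10) / 1024.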