Bayesian Performance Analysis for Algorithm Ranking Comparison

Bibliographic Details
Published in	IEEE Transactions on Evolutionary Computation, Vol. 26, no. 6, p. 1
Main Authors	Rojas-Delgado, Jairo; Ceberio, Josu; Calvo, Borja; Lozano, Jose A.
Format	Journal Article
Language	English
Published	New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.12.2022
Subjects	Algorithms; Bayes methods; Bayesian analysis; Bayesian inference; benchmarking; Data models; evolutionary algorithms; Evolutionary computation; Inference algorithms; Machine learning; Optimization; Performance assessment; Permutations; probabilistic models on permutation spaces; Ranking; Ratings & rankings; Sociology; Source code; Statistical analysis; Statistical tests; Statistics; Uncertainty
ISSN	1089-778X, 1941-0026
DOI	10.1109/TEVC.2022.3208110

Summary:	In the field of optimization and machine learning, the statistical assessment of results has played a key role in conducting algorithmic performance comparisons. Classically, null hypothesis statistical tests have been used. However, recently, alternatives based on Bayesian statistics have shown great potential in complex scenarios, especially when quantifying the uncertainty in the comparison. In this work, we delve deep into the Bayesian statistical assessment of experimental results by proposing a framework for the analysis of several algorithms on several problems/instances. To this end, experimental results are transformed to their corresponding rankings of algorithms, assuming that these rankings have been generated by a probability distribution (defined on permutation spaces). From the set of rankings, we estimate the posterior distribution of the parameters of the studied probability models, and several inferences concerning the analysis of the results are examined. Particularly, we study questions related to the probability of having one algorithm in the first position of the ranking or the probability that two algorithms are in the same relative position in the ranking. Not limited to that, the assumptions, strengths, and weaknesses of the models in each case are studied. To help other researchers to make use of this kind of analysis, we provide a Python package and source code implementation at.
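
As a rough illustration of the workflow the summary describes (results converted to rankings, then a posterior over ranking-related quantities), the sketch below is a deliberately simplified stand-in and not the authors' package or models. It assumes lower scores are better, breaks ties by column order, and swaps the probability models on permutation spaces for a plain Dirichlet-multinomial model over which algorithm is ranked first. All function names and the score table are hypothetical.

```python
import random


def to_rankings(scores, lower_is_better=True):
    """Turn each row of scores (one row per problem instance) into ranks, 1 = best."""
    rankings = []
    for row in scores:
        # Column indices ordered from best to worst score (ties keep column order).
        order = sorted(range(len(row)), key=lambda j: row[j],
                       reverse=not lower_is_better)
        ranks = [0] * len(row)
        for position, j in enumerate(order, start=1):
            ranks[j] = position
        rankings.append(ranks)
    return rankings


def top_rank_posterior(rankings, prior=1.0):
    """Dirichlet posterior parameters for the event 'algorithm j is ranked first'.

    Assumption: a symmetric Dirichlet(prior) prior and independent instances.
    """
    alpha = [prior] * len(rankings[0])
    for ranks in rankings:
        alpha[ranks.index(1)] += 1
    return alpha


def posterior_mean(alpha):
    """Posterior mean probability of each algorithm being ranked first."""
    total = sum(alpha)
    return [a / total for a in alpha]


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


# Hypothetical error values: 4 problem instances (rows) x 3 algorithms (columns).
scores = [
    [0.12, 0.30, 0.25],
    [0.10, 0.09, 0.40],
    [0.20, 0.35, 0.50],
    [0.15, 0.22, 0.18],
]

rankings = to_rankings(scores)          # [[1, 3, 2], [2, 1, 3], [1, 2, 3], [1, 3, 2]]
alpha = top_rank_posterior(rankings)    # [4.0, 2.0, 1.0]
mean = posterior_mean(alpha)

# Monte Carlo 95% credible interval for P(algorithm 0 is ranked first).
rng = random.Random(0)
samples = sorted(sample_dirichlet(alpha, rng)[0] for _ in range(2000))
lo, hi = samples[int(0.025 * len(samples))], samples[int(0.975 * len(samples))]

print("posterior mean P(first):", [round(m, 3) for m in mean])
print("95% credible interval for algorithm 0:", (round(lo, 3), round(hi, 3)))
```

The credible interval is what separates this kind of analysis from a point estimate: with only four instances it stays wide, which quantifies the uncertainty in the comparison. A model defined on full permutations, as in the article, would use the whole ranking rather than only the first position.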