Bayesian Performance Analysis for Algorithm Ranking Comparison

Bibliographic Details
Published in IEEE transactions on evolutionary computation Vol. 26; no. 6; p. 1
Main Authors Rojas-Delgado, Jairo, Ceberio, Josu, Calvo, Borja, Lozano, Jose A.
Format Journal Article
Language English
Published New York IEEE 01.12.2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN 1089-778X
1941-0026
DOI 10.1109/TEVC.2022.3208110


Abstract In the field of optimization and machine learning, the statistical assessment of results has played a key role in conducting algorithmic performance comparisons. Classically, null hypothesis statistical tests have been used. However, recently, alternatives based on Bayesian statistics have shown great potential in complex scenarios, especially when quantifying the uncertainty in the comparison. In this work, we delve deep into the Bayesian statistical assessment of experimental results by proposing a framework for the analysis of several algorithms on several problems/instances. To this end, experimental results are transformed to their corresponding rankings of algorithms, assuming that these rankings have been generated by a probability distribution (defined on permutation spaces). From the set of rankings, we estimate the posterior distribution of the parameters of the studied probability models, and several inferences concerning the analysis of the results are examined. Particularly, we study questions related to the probability of having one algorithm in the first position of the ranking or the probability that two algorithms are in the same relative position in the ranking. Not limited to that, the assumptions, strengths, and weaknesses of the models in each case are studied. To help other researchers to make use of this kind of analysis, we provide a Python package and source code implementation at https://zenodo.org/record/6320599.
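As a rough illustration of the workflow the abstract describes (scores turned into rankings, then a posterior over "which algorithm is ranked first"), the following minimal sketch may help. It is not the authors' package and does not use the permutation models studied in the article; it swaps in a much simpler Dirichlet-multinomial model over first-place counts. The function names `to_rankings` and `posterior_prob_first`, the symmetric prior parameter `alpha`, the tie-breaking rule, and the example scores are all assumptions made here for illustration.

```python
from collections import Counter


def to_rankings(results, minimize=True):
    """Turn a problems x algorithms score table into one ranking per problem.

    Each ranking lists algorithm indices from best to worst.
    Ties are broken by algorithm index (an assumption of this sketch).
    """
    rankings = []
    for scores in results:
        order = sorted(range(len(scores)),
                       key=lambda a: scores[a],
                       reverse=not minimize)
        rankings.append(order)
    return rankings


def posterior_prob_first(rankings, n_algorithms, alpha=1.0):
    """Posterior mean of P(algorithm is ranked first).

    Assumes a symmetric Dirichlet(alpha) prior over the first-place
    probabilities and treats the winner on each problem as one
    multinomial draw (a simplified stand-in, not a permutation model).
    """
    wins = Counter(ranking[0] for ranking in rankings)
    total = len(rankings) + alpha * n_algorithms
    return [(wins[a] + alpha) / total for a in range(n_algorithms)]


# Hypothetical error values: 4 problems (rows) x 3 algorithms (columns),
# lower is better.
results = [
    [0.12, 0.30, 0.25],
    [0.10, 0.08, 0.40],
    [0.05, 0.20, 0.15],
    [0.22, 0.35, 0.21],
]

rankings = to_rankings(results)
probs = posterior_prob_first(rankings, n_algorithms=3)

for algorithm, p in enumerate(probs):
    print(f"algorithm {algorithm}: posterior mean P(first) = {p:.3f}")
```

With `alpha=1.0` the prior is uniform, so an algorithm that never wins still keeps a small non-zero probability of being first. A model defined on full permutations, as in the article, would also use the information in the lower positions of each ranking, which this sketch discards.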
Author Lozano, Jose A.
Calvo, Borja
Ceberio, Josu
Rojas-Delgado, Jairo
Author_xml – sequence: 1
  givenname: Jairo
  orcidid: 0000-0003-1017-703X
  surname: Rojas-Delgado
  fullname: Rojas-Delgado, Jairo
  organization: Basque Center for Applied Mathematics, Bilbao, Spain
– sequence: 2
  givenname: Josu
  orcidid: 0000-0001-7120-6338
  surname: Ceberio
  fullname: Ceberio, Josu
  organization: Intelligent Systems Group, University of the Basque Country UPV/EHU, Donostia, Spain
– sequence: 3
  givenname: Borja
  surname: Calvo
  fullname: Calvo, Borja
  organization: Intelligent Systems Group, University of the Basque Country UPV/EHU, Donostia, Spain
– sequence: 4
  givenname: Jose A.
  orcidid: 0000-0002-4683-8111
  surname: Lozano
  fullname: Lozano, Jose A.
  organization: Basque Center for Applied Mathematics, Bilbao, Spain
CODEN ITEVF5
CitedBy_id crossref_primary_10_1145_3628605
crossref_primary_10_1145_3665650
crossref_primary_10_3390_systems11080389
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
Unpaywall for CDI: Periodical Content
Unpaywall
Discipline Engineering
Statistics
Computer Science
EISSN 1941-0026
EndPage 1
ExternalDocumentID oai:zenodo.org:10600222
10_1109_TEVC_2022_3208110
9896139
Genre orig-research
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 6
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
cc-by
ORCID 0000-0001-7120-6338
0000-0002-4683-8111
0000-0003-1017-703X
PageCount 1
PublicationDate 2022-12-01
PublicationDateYYYYMMDD 2022-12-01
PublicationDate_xml – month: 12
  year: 2022
  text: 2022-12-01
  day: 01
PublicationDecade 2020
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE transactions on evolutionary computation
PublicationTitleAbbrev TEVC
PublicationYear 2022
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 1
SubjectTerms Algorithms
Bayes methods
Bayesian analysis
Bayesian inference
benchmarking
Data models
evolutionary algorithms
Evolutionary computation
Inference algorithms
Machine learning
Optimization
Performance assessment
Permutations
probabilistic models on permutation spaces
Ranking
Ratings & rankings
Sociology
Source code
Statistical analysis
Statistical tests
Statistics
Uncertainty
Title Bayesian Performance Analysis for Algorithm Ranking Comparison
URI https://ieeexplore.ieee.org/document/9896139
https://www.proquest.com/docview/2742703864
https://doi.org/10.1109/TEVC.2022.3208110
UnpaywallVersion submittedVersion
Volume 26