Bayesian Performance Analysis for Algorithm Ranking Comparison
| Published in | IEEE transactions on evolutionary computation Vol. 26; no. 6; p. 1 |
|---|---|
| Main Authors | Rojas-Delgado, Jairo; Ceberio, Josu; Calvo, Borja; Lozano, Jose A. |
| Format | Journal Article |
| Language | English |
| Published | New York: IEEE, 01.12.2022; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| ISSN | 1089-778X (print); 1941-0026 (electronic) |
| DOI | 10.1109/TEVC.2022.3208110 |
Cover
| Abstract | In the field of optimization and machine learning, the statistical assessment of results has played a key role in conducting algorithmic performance comparisons. Classically, null hypothesis statistical tests have been used. However, recently, alternatives based on Bayesian statistics have shown great potential in complex scenarios, especially when quantifying the uncertainty in the comparison. In this work, we delve deep into the Bayesian statistical assessment of experimental results by proposing a framework for the analysis of several algorithms on several problems/instances. To this end, experimental results are transformed to their corresponding rankings of algorithms, assuming that these rankings have been generated by a probability distribution (defined on permutation spaces). From the set of rankings, we estimate the posterior distribution of the parameters of the studied probability models, and several inferences concerning the analysis of the results are examined. Particularly, we study questions related to the probability of having one algorithm in the first position of the ranking or the probability that two algorithms are in the same relative position in the ranking. Not limited to that, the assumptions, strengths, and weaknesses of the models in each case are studied. To help other researchers to make use of this kind of analysis, we provide a Python package and source code implementation at https://zenodo.org/record/6320599. |
|---|---|
| Author | Lozano, Jose A. Calvo, Borja Ceberio, Josu Rojas-Delgado, Jairo |
| Author affiliations | 1. Rojas-Delgado, Jairo (ORCID: 0000-0003-1017-703X), Basque Center for Applied Mathematics, Bilbao, Spain; 2. Ceberio, Josu (ORCID: 0000-0001-7120-6338), Intelligent Systems Group, University of the Basque Country UPV/EHU, Donostia, Spain; 3. Calvo, Borja, Intelligent Systems Group, University of the Basque Country UPV/EHU, Donostia, Spain; 4. Lozano, Jose A. (ORCID: 0000-0002-4683-8111), Basque Center for Applied Mathematics, Bilbao, Spain |
| CODEN | ITEVF5 |
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022 |
| Discipline | Engineering Statistics Computer Science |
| Genre | orig-research |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 6 |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 cc-by |
| PublicationTitleAbbrev | TEVC |
| SubjectTerms | Algorithms Bayes methods Bayesian analysis Bayesian inference benchmarking Data models evolutionary algorithms Evolutionary computation Inference algorithms Machine learning Optimization Performance assessment Permutations probabilistic models on permutation spaces Ranking Ratings & rankings Sociology Source code Statistical analysis Statistical tests Statistics Uncertainty |
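
The abstract describes turning experimental results into rankings of algorithms and then asking, among other things, how probable it is that a given algorithm occupies the first position. The following is a minimal illustrative sketch of that idea only. It assumes a much simpler Dirichlet-multinomial model over which algorithm is ranked first, in place of the probability models on permutation spaces that the article studies, and it is not the API of the authors' Python package. The function names `scores_to_rankings` and `posterior_first_place`, the uniform prior, and the example scores are all hypothetical.

```python
import random
from collections import Counter


def scores_to_rankings(scores, lower_is_better=True):
    """Turn a table of scores (one row per problem instance, one column per
    algorithm) into rankings: tuples of algorithm indices, best first.
    Ties are broken by algorithm index, which is a simplifying assumption."""
    sign = 1.0 if lower_is_better else -1.0
    rankings = []
    for row in scores:
        order = sorted(range(len(row)), key=lambda a: (sign * row[a], a))
        rankings.append(tuple(order))
    return rankings


def posterior_first_place(rankings, n_algorithms, prior=1.0, n_samples=5000, seed=0):
    """Dirichlet-multinomial stand-in (an assumption, not the article's models).

    theta[a] is the unknown probability that algorithm a is ranked first on a
    random instance. With a symmetric Dirichlet(prior) prior, the posterior is
    Dirichlet(prior + first-place counts).

    Returns:
      posterior_mean        : E[theta[a]] for each algorithm
      prob_most_often_first : Monte Carlo estimate of P(theta[a] is the largest)
    """
    rng = random.Random(seed)
    wins = Counter(r[0] for r in rankings)  # how often each algorithm ranks first
    alpha = [prior + wins[a] for a in range(n_algorithms)]
    total = sum(alpha)
    posterior_mean = [a / total for a in alpha]

    top_counts = [0] * n_algorithms
    for _ in range(n_samples):
        # A Dirichlet draw is a vector of normalized Gamma draws; normalizing
        # does not change which component is largest, so it is skipped here.
        draws = [rng.gammavariate(a, 1.0) for a in alpha]
        top_counts[max(range(n_algorithms), key=draws.__getitem__)] += 1
    prob_most_often_first = [c / n_samples for c in top_counts]
    return posterior_mean, prob_most_often_first


# Hypothetical error values: 4 problem instances (rows) x 3 algorithms (columns).
example_scores = [
    [0.10, 0.50, 0.90],
    [0.20, 0.40, 0.30],
    [0.30, 0.10, 0.20],
    [0.05, 0.60, 0.70],
]
example_rankings = scores_to_rankings(example_scores)
posterior_mean, prob_most_often_first = posterior_first_place(
    example_rankings, n_algorithms=3
)
print("rankings:", example_rankings)
print("posterior mean of P(first):", posterior_mean)
print("P(algorithm is most often first):", prob_most_often_first)
```

Unlike a null hypothesis test, the output here is a pair of probability vectors, so the uncertainty in the comparison is reported directly. A model defined on full permutations, as in the article, would additionally use the information in positions below the first.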