Two-Pass Softmax Algorithm
| Published in | 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 386-395 |
|---|---|
| Main Authors | Dukhan, Marat; Ablavatski, Artsiom (Google Research, Mountain View, CA, USA) |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 01.05.2020 |
| EISBN | 9781728174457 |
| Subjects | Approximation algorithms; Bandwidth; Computational modeling; Machine learning; Machine learning algorithms; Probability distribution; Program processors; SIMD; softargmax; softmax |
| Online Access | https://ieeexplore.ieee.org/document/9150394 |
| DOI | 10.1109/IPDPSW50202.2020.00074 |
| Abstract | The softmax (also called softargmax) function is widely used in machine learning models to normalize real-valued scores into a probability distribution. To avoid floating-point overflow, the softmax function is conventionally implemented in three passes: the first pass to compute the normalization constant, and two other passes to compute outputs from normalized inputs. We analyze two variants of the Three-Pass algorithm and demonstrate that in a well-optimized implementation on HPC-class processors, the performance of all three passes is limited by memory bandwidth. We then present a novel algorithm for softmax computation in just two passes. The proposed Two-Pass algorithm avoids both numerical overflow and the extra normalization pass by employing an exotic representation for intermediate values, where each value is represented as a pair of floating-point numbers: one representing the "mantissa" and another representing the "exponent". Performance evaluation demonstrates that on out-of-cache inputs on an Intel Skylake-X processor, the new Two-Pass algorithm outperforms the traditional Three-Pass algorithm by up to 28% in the AVX512 implementation, and by up to 18% in the AVX2 implementation. The proposed Two-Pass algorithm also outperforms the traditional Three-Pass algorithm on Intel Broadwell and AMD Zen 2 processors. |
|---|---|
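The abstract describes both algorithms in enough detail to sketch them. Below is a minimal scalar C illustration, assuming the pair representation exp(x) = m * 2^e with e rounded to an integer; the function names, constants, and accumulation scheme are illustrative assumptions for this sketch, not the paper's vectorized AVX2/AVX512 implementation.

```c
#include <math.h>
#include <stddef.h>

/* Conventional Three-Pass softmax: pass 1 finds the maximum input,
 * pass 2 computes the normalization constant from shifted exponents,
 * pass 3 scales the outputs. */
static void softmax_three_pass(const float *x, float *y, size_t n) {
    float max = -INFINITY;
    for (size_t i = 0; i < n; i++)        /* pass 1: running maximum */
        if (x[i] > max) max = x[i];
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {      /* pass 2: normalization constant */
        y[i] = expf(x[i] - max);
        sum += y[i];
    }
    for (size_t i = 0; i < n; i++)        /* pass 3: normalize */
        y[i] /= sum;
}

/* Two-Pass softmax sketch: each exp(x[i]) is held as a pair (m, e) with
 * exp(x[i]) == m * 2^e and e integral, so neither number overflows for
 * finite inputs. The running sum is kept in the same representation. */
static void softmax_two_pass(const float *x, float *y, size_t n) {
    const float LOG2E = 1.44269504f;      /* log2(e) */
    const float LN2   = 0.69314718f;      /* ln(2)   */
    float sum_m = 0.0f;                   /* "mantissa" of the running sum */
    float sum_e = -INFINITY;              /* "exponent" of the running sum */
    for (size_t i = 0; i < n; i++) {      /* pass 1: accumulate the sum */
        float e = nearbyintf(x[i] * LOG2E);
        float m = expf(x[i] - e * LN2);   /* reduced argument stays near 0 */
        if (e > sum_e) {                  /* rescale to the larger exponent */
            sum_m = sum_m * exp2f(sum_e - e) + m;
            sum_e = e;
        } else {
            sum_m += m * exp2f(e - sum_e);
        }
    }
    for (size_t i = 0; i < n; i++) {      /* pass 2: recompute and scale */
        float e = nearbyintf(x[i] * LOG2E);
        float m = expf(x[i] - e * LN2);
        y[i] = (m / sum_m) * exp2f(e - sum_e);
    }
}
```

The point of the pair representation is that the first pass can accumulate the normalization sum without knowing the maximum input in advance, so the separate maximum-finding pass of the Three-Pass algorithm disappears. That removes one full sweep over the data, which is consistent with the abstract's observation that on out-of-cache inputs each pass is limited by memory bandwidth.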