Two-Pass Softmax Algorithm
| Published in | 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 386-395 |
|---|---|
| Main Authors | Dukhan, Marat; Ablavatski, Artsiom (Google Research, Mountain View, CA, USA) |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 01.05.2020 |
| EISBN | 9781728174457 |
| Subjects | Approximation algorithms; Bandwidth; Computational modeling; Machine learning; Machine learning algorithms; Probability distribution; Program processors; SIMD; softargmax; softmax |
| Online Access | https://ieeexplore.ieee.org/document/9150394 |
| DOI | 10.1109/IPDPSW50202.2020.00074 |
| Abstract | The softmax (also called softargmax) function is widely used in machine learning models to normalize real-valued scores into a probability distribution. To avoid floating-point overflow, the softmax function is conventionally implemented in three passes: the first pass to compute the normalization constant, and two other passes to compute outputs from normalized inputs. We analyze two variants of the Three-Pass algorithm and demonstrate that in a well-optimized implementation on HPC-class processors, the performance of all three passes is limited by memory bandwidth. We then present a novel algorithm for softmax computation in just two passes. The proposed Two-Pass algorithm avoids both numerical overflow and the extra normalization pass by employing an exotic representation for intermediate values, where each value is represented as a pair of floating-point numbers: one representing the "mantissa" and another representing the "exponent". Performance evaluation demonstrates that on out-of-cache inputs on an Intel Skylake-X processor, the new Two-Pass algorithm outperforms the traditional Three-Pass algorithm by up to 28% in the AVX512 implementation, and by up to 18% in the AVX2 implementation. The proposed Two-Pass algorithm also outperforms the traditional Three-Pass algorithm on Intel Broadwell and AMD Zen 2 processors. |
|---|---|
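The abstract describes both algorithms in enough detail to sketch them. Below is a minimal scalar C illustration, assuming the pair representation exp(x) = m * 2^e with e rounded to an integer; the function names, constants, and accumulation scheme are illustrative assumptions for this sketch, not the paper's vectorized AVX2/AVX512 implementation.

```c
#include <math.h>
#include <stddef.h>

/* Conventional Three-Pass softmax: pass 1 finds the maximum input,
 * pass 2 computes the normalization constant from shifted exponents,
 * pass 3 scales the outputs. */
static void softmax_three_pass(const float *x, float *y, size_t n) {
    float max = -INFINITY;
    for (size_t i = 0; i < n; i++)        /* pass 1: running maximum */
        if (x[i] > max) max = x[i];
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {      /* pass 2: normalization constant */
        y[i] = expf(x[i] - max);
        sum += y[i];
    }
    for (size_t i = 0; i < n; i++)        /* pass 3: normalize */
        y[i] /= sum;
}

/* Two-Pass softmax sketch: each exp(x[i]) is held as a pair (m, e) with
 * exp(x[i]) == m * 2^e and e integral, so neither number overflows for
 * finite inputs. The running sum is kept in the same representation. */
static void softmax_two_pass(const float *x, float *y, size_t n) {
    const float LOG2E = 1.44269504f;      /* log2(e) */
    const float LN2   = 0.69314718f;      /* ln(2)   */
    float sum_m = 0.0f;                   /* "mantissa" of the running sum */
    float sum_e = -INFINITY;              /* "exponent" of the running sum */
    for (size_t i = 0; i < n; i++) {      /* pass 1: accumulate the sum */
        float e = nearbyintf(x[i] * LOG2E);
        float m = expf(x[i] - e * LN2);   /* reduced argument stays near 0 */
        if (e > sum_e) {                  /* rescale to the larger exponent */
            sum_m = sum_m * exp2f(sum_e - e) + m;
            sum_e = e;
        } else {
            sum_m += m * exp2f(e - sum_e);
        }
    }
    for (size_t i = 0; i < n; i++) {      /* pass 2: recompute and scale */
        float e = nearbyintf(x[i] * LOG2E);
        float m = expf(x[i] - e * LN2);
        y[i] = (m / sum_m) * exp2f(e - sum_e);
    }
}
```

The point of the pair representation is that the first pass can accumulate the normalization sum without knowing the maximum input in advance, so the separate maximum-finding pass of the Three-Pass algorithm disappears. That removes one full sweep over the data, which is consistent with the abstract's observation that on out-of-cache inputs each pass is limited by memory bandwidth.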