Transcoding unicode characters with AVX‐512 instructions

Bibliographic Details
Published in Software, practice & experience Vol. 53; no. 12; pp. 2430-2462
Main Authors Clausecker, Robert; Lemire, Daniel
Format Journal Article
Language English
Published Bognor Regis Wiley Subscription Services, Inc 01.12.2023
ISSN 0038-0644
1097-024X
DOI 10.1002/spe.3261

Abstract Intel includes in its recent processors a powerful set of instructions capable of processing 512‐bit registers with a single instruction (AVX‐512). Some of these instructions have no equivalent in earlier instruction sets. We leverage these instructions to efficiently transcode strings between the most common formats: UTF‐8 and UTF‐16. With our novel algorithms, we are often twice as fast as the previous best solutions. For example, we transcode Chinese text from UTF‐8 to UTF‐16 at more than 5 GiB/s using fewer than 2 CPU instructions per character. To ensure reproducibility, we make our software freely available as an open‐source library. Our library is part of the popular Node.js JavaScript runtime.
Author Clausecker, Robert
Lemire, Daniel
Author_xml – sequence: 1
  givenname: Robert
  surname: Clausecker
  fullname: Clausecker, Robert
  organization: Zuse Institute Berlin Germany
– sequence: 2
  givenname: Daniel
  surname: Lemire
  fullname: Lemire, Daniel
  organization: DOT‐Lab Research Center Université du Québec (TELUQ) Montréal Canada
Cites_doi 10.1002/spe.3036
10.17487/rfc2781
10.1145/3524059.3532396
10.17487/rfc3629
10.1145/3297858.3304062
10.1145/1345206.1345222
10.1002/spe.2920
10.1145/3458336.3465293
ContentType Journal Article
Copyright 2023. This article is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Discipline Computer Science
EISSN 1097-024X
EndPage 2462
ExternalDocumentID 10.1002/spe.3261
10_1002_spe_3261
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 12
Language English
License cc-by
OpenAccessLink https://proxy.k.utb.cz/login?url=https://onlinelibrary.wiley.com/doi/pdfdirect/10.1002/spe.3261
PageCount 33
PublicationCentury 2000
PublicationDate 2023-12-00
20231201
PublicationDateYYYYMMDD 2023-12-01
PublicationDate_xml – month: 12
  year: 2023
  text: 2023-12-00
PublicationDecade 2020
PublicationPlace Bognor Regis
PublicationPlace_xml – name: Bognor Regis
PublicationSubtitle Practice & Experience
PublicationTitle Software, practice & experience
PublicationYear 2023
Publisher Wiley Subscription Services, Inc
Publisher_xml – name: Wiley Subscription Services, Inc
StartPage 2430
SubjectTerms Algorithms
Libraries
Title Transcoding unicode characters with AVX‐512 instructions
URI https://www.proquest.com/docview/2886131386
https://onlinelibrary.wiley.com/doi/pdfdirect/10.1002/spe.3261
UnpaywallVersion publishedVersion
Volume 53