TERSE/PROLIX (TRPX) – a new algorithm for fast and lossless compression and decompression of diffraction and cryo-EM data

Bibliographic Details
Published in Acta crystallographica. Section A, Foundations and advances Vol. 79; no. 6; pp. 536-541
Main Authors Matinyan, Senik; Abrahams, Jan Pieter
Format Journal Article
Language English
Published International Union of Crystallography 01.11.2023
ISSN 2053-2733
DOI 10.1107/S205327332300760X

Abstract High-throughput data collection in crystallography poses significant challenges in handling massive amounts of data. Here, TERSE/PROLIX (or TRPX for short) is presented, a novel lossless compression algorithm specifically designed for diffraction data. The algorithm is compared with established lossless compression algorithms implemented in gzip, bzip2, CBF (crystallographic binary file), Zstandard (zstd), LZ4 and HDF5 with gzip, LZF and bitshuffle+LZ4 filters, in terms of compression efficiency and speed, using continuous-rotation electron diffraction data of an inorganic compound and raw cryo-EM data. The results show that TRPX significantly outperforms all these algorithms in terms of speed and compression rate. It was 60 times faster than bzip2 (which achieved a similar compression rate), and more than 3 times faster than LZ4, which was the runner-up in terms of speed, but had a much worse compression rate. TRPX files are byte-order independent and upon compilation the algorithm occupies very little memory. It can therefore be readily implemented in hardware. By providing a tailored solution for diffraction and raw cryo-EM data, TRPX facilitates more efficient data analysis and interpretation while mitigating storage and transmission concerns. The C++20 compression/decompression code, custom TIFF library and an ImageJ/Fiji Java plugin for reading TRPX files are open-sourced on GitHub under the permissive MIT license.
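The abstract ranks compressors by two quantities: compressed size relative to raw size, and time taken. As a rough illustration of how such a comparison can be set up, the sketch below times two of the baselines named above (gzip and bzip2) using only the Python standard library. It does not implement TRPX and does not reproduce the paper's benchmark. The synthetic frame, its pixel-value distribution, the 16-bit pixel depth and the helper names `synthetic_frame` and `benchmark` are all assumptions made for this example.

```python
import bz2
import gzip
import random
import time
from array import array


def synthetic_frame(width=512, height=512, seed=0):
    """Return a fake frame as raw bytes, one unsigned short per pixel.

    Assumption: mostly zeros plus a few low counts, loosely imitating a
    sparse diffraction image. Real detector data will behave differently.
    """
    rng = random.Random(seed)
    pixels = rng.choices([0, 1, 2, 3, 50], weights=[80, 12, 5, 2, 1],
                         k=width * height)
    return array("H", pixels).tobytes()


def benchmark(name, compress, decompress, raw):
    """Compress once, check the round trip is lossless, report size and time."""
    start = time.perf_counter()
    packed = compress(raw)
    elapsed = time.perf_counter() - start
    if decompress(packed) != raw:
        raise ValueError(f"{name} round trip was not lossless")
    return {"name": name, "ratio": len(packed) / len(raw), "seconds": elapsed}


frame = synthetic_frame()
results = [
    benchmark("gzip", gzip.compress, gzip.decompress, frame),
    benchmark("bzip2", bz2.compress, bz2.decompress, frame),
]
for r in results:
    print(f"{r['name']:6s} compressed/raw = {r['ratio']:.3f}  "
          f"time = {r['seconds'] * 1000:.1f} ms")
```

A single timed run on one synthetic frame is noisy. A fair comparison would repeat the measurement over many real frames, which is what the paper's own benchmark is described as doing.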
This article presents a fast and lossless algorithm for compressing diffraction data, achieving up to 85% reduction in file size while processing up to 2000 frames per second at 512 × 512 pixels. This breakthrough in compression technology is a significant step towards more efficient analysis and storage of large diffraction data sets.
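To put the quoted figures in perspective, the short calculation below converts them into data rates. The frame rate, frame size and 85% reduction come from the synopsis. The 16-bit pixel depth is an assumption that the record does not state, so the byte rates are illustrative only.

```python
# Figures quoted in the synopsis.
FRAMES_PER_SECOND = 2000
FRAME_WIDTH = FRAME_HEIGHT = 512
SIZE_REDUCTION_PERCENT = 85

# Assumption (not stated in the record): each pixel is a 16-bit integer.
BYTES_PER_PIXEL = 2

pixels_per_second = FRAMES_PER_SECOND * FRAME_WIDTH * FRAME_HEIGHT
raw_bytes_per_second = pixels_per_second * BYTES_PER_PIXEL
# An 85% reduction leaves 15% of the raw volume.
compressed_bytes_per_second = (
    raw_bytes_per_second * (100 - SIZE_REDUCTION_PERCENT) // 100
)

print(f"{pixels_per_second:,} pixels/s")
print(f"{raw_bytes_per_second:,} raw bytes/s")
print(f"{compressed_bytes_per_second:,} compressed bytes/s")
```

Under that assumption, about 1.05 GB of raw data per second would shrink to roughly 157 MB per second in the best case quoted.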
Author Matinyan, Senik
Abrahams, Jan Pieter
ContentType Journal Article
Copyright Open access. Matinyan and Abrahams 2023
Discipline Engineering
DocumentTitleAlternate Algorithm for diffraction and cryo-EM data compression
EISSN 2053-2733
EndPage 541
ExternalDocumentID 10.1107/s205327332300760x
PMC10626653
10_1107_S205327332300760X
GrantInformation_xml – fundername: Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung
  grantid: 205320_201012
– fundername: HORIZON EUROPE Marie Sklodowska-Curie Actions
  grantid: 956099
ISSN 2053-2733
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 6
Language English
License This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.
cc-by
OpenAccessLink https://proxy.k.utb.cz/login?url=https://journals.iucr.org/a/issues/2023/06/00/lu5031/lu5031.pdf
PMID 37743849
PQID 2868669043
PQPubID 23479
PageCount 6
PublicationCentury 2000
PublicationDate 2023-11-01
PublicationDateYYYYMMDD 2023-11-01
PublicationDecade 2020
PublicationTitle Acta crystallographica. Section A, Foundations and advances
PublicationYear 2023
Publisher International Union of Crystallography
StartPage 536
SubjectTerms Research Papers
Title TERSE/PROLIX (TRPX) – a new algorithm for fast and lossless compression and decompression of diffraction and cryo-EM data
URI https://www.proquest.com/docview/2868669043
https://pubmed.ncbi.nlm.nih.gov/PMC10626653
https://journals.iucr.org/a/issues/2023/06/00/lu5031/lu5031.pdf
UnpaywallVersion publishedVersion
Volume 79