TERSE/PROLIX (TRPX) – a new algorithm for fast and lossless compression and decompression of diffraction and cryo-EM data
High-throughput data collection in crystallography poses significant challenges in handling massive amounts of data. Here, TERSE/PROLIX (or TRPX for short) is presented, a novel lossless compression algorithm specifically designed for diffraction data. The algorithm is compared with established loss...
| Published in | Acta crystallographica. Section A, Foundations and advances Vol. 79; no. 6; pp. 536 - 541 |
|---|---|
| Main Authors | Matinyan, Senik; Abrahams, Jan Pieter |
| Format | Journal Article |
| Language | English |
| Published | International Union of Crystallography, 01.11.2023 |
| Subjects | |
| ISSN | 2053-2733 |
| DOI | 10.1107/S205327332300760X |
| Abstract | High-throughput data collection in crystallography poses significant challenges in handling massive amounts of data. Here, TERSE/PROLIX (or TRPX for short) is presented, a novel lossless compression algorithm specifically designed for diffraction data. The algorithm is compared with established lossless compression algorithms implemented in gzip, bzip2, CBF (crystallographic binary file), Zstandard (zstd), LZ4 and HDF5 with gzip, LZF and bitshuffle+LZ4 filters, in terms of compression efficiency and speed, using continuous-rotation electron diffraction data of an inorganic compound and raw cryo-EM data. The results show that TRPX significantly outperforms all these algorithms in terms of speed and compression rate. It was 60 times faster than bzip2 (which achieved a similar compression rate), and more than 3 times faster than LZ4, which was the runner-up in terms of speed, but had a much worse compression rate. TRPX files are byte-order independent and upon compilation the algorithm occupies very little memory. It can therefore be readily implemented in hardware. By providing a tailored solution for diffraction and raw cryo-EM data, TRPX facilitates more efficient data analysis and interpretation while mitigating storage and transmission concerns. The C++20 compression/decompression code, custom TIFF library and an ImageJ/Fiji Java plugin for reading TRPX files are open-sourced on GitHub under the permissive MIT license. |
|---|---|
| AbstractList | This article presents a fast and lossless algorithm for compressing diffraction data, achieving up to 85% reduction in file size while processing up to 2000 512 × 512 frames s⁻¹. This breakthrough in compression technology is a significant step towards more efficient analysis and storage of large diffraction data sets. |
| Author | Matinyan, Senik Abrahams, Jan Pieter |
| ContentType | Journal Article |
| Copyright | Open access. Matinyan and Abrahams 2023 |
| Discipline | Engineering |
| DocumentTitleAlternate | Algorithm for diffraction and cryo-EM data compression |
| EISSN | 2053-2733 |
| EndPage | 541 |
| PMCID | PMC10626653 |
| GrantInformation | Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (grant 205320_201012); HORIZON EUROPE Marie Sklodowska-Curie Actions (grant 956099) |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 6 |
| Language | English |
| License | This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited. cc-by |
| OpenAccessLink | https://proxy.k.utb.cz/login?url=https://journals.iucr.org/a/issues/2023/06/00/lu5031/lu5031.pdf |
| PMID | 37743849 |
| PageCount | 6 |
| PublicationDate | 2023-11-01 |
| PublicationTitle | Acta crystallographica. Section A, Foundations and advances |
| PublicationYear | 2023 |
| Publisher | International Union of Crystallography |
| StartPage | 536 |
| SubjectTerms | Research Papers |
| Title | TERSE/PROLIX (TRPX) – a new algorithm for fast and lossless compression and decompression of diffraction and cryo-EM data |
| URI | https://www.proquest.com/docview/2868669043 https://pubmed.ncbi.nlm.nih.gov/PMC10626653 https://journals.iucr.org/a/issues/2023/06/00/lu5031/lu5031.pdf |
| UnpaywallVersion | publishedVersion |
| Volume | 79 |