TERSE/PROLIX (TRPX) – a new algorithm for fast and lossless compression and decompression of diffraction and cryo-EM data

Bibliographic Details
Published in: Acta Crystallographica. Section A, Foundations and Advances, Vol. 79, no. 6, pp. 536–541
Main Authors: Matinyan, Senik; Abrahams, Jan Pieter
Format: Journal Article
Language: English
Published: International Union of Crystallography, 01.11.2023
ISSN: 2053-2733
DOI: 10.1107/S205327332300760X

Summary: High-throughput data collection in crystallography poses significant challenges in handling massive amounts of data. Here, TERSE/PROLIX (TRPX for short), a novel lossless compression algorithm specifically designed for diffraction data, is presented. The algorithm is compared with established lossless compression algorithms implemented in gzip, bzip2, CBF (crystallographic binary file), Zstandard (zstd), LZ4 and HDF5 with gzip, LZF and bitshuffle + LZ4 filters, in terms of compression efficiency and speed, using continuous-rotation electron diffraction data of an inorganic compound and raw cryo-EM data. The results show that TRPX significantly outperforms all these algorithms in terms of both speed and compression rate. It was 60 times faster than bzip2 (which achieved a similar compression rate), and more than 3 times faster than LZ4, which was the runner-up in terms of speed but had a much worse compression rate. TRPX files are byte-order independent, and once compiled the algorithm occupies very little memory, so it can readily be implemented in hardware. By providing a tailored solution for diffraction and raw cryo-EM data, TRPX facilitates more efficient data analysis and interpretation while mitigating storage and transmission concerns. The C++20 compression/decompression code, a custom TIFF library and an ImageJ/Fiji Java plugin for reading TRPX files are open-sourced on GitHub under the permissive MIT license.
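The summary states that TRPX files are byte-order independent. One common way to get this property is to fix the bit order of the file format explicitly and emit it one byte at a time using shifts and masks, instead of copying multi-byte integers straight from memory. The sketch below illustrates that generic technique only; it is not the TRPX encoder or its file layout, and the class names `BitWriter` and `BitReader` are hypothetical. It is written in C++ to match the published code, although it needs nothing beyond C++11.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not TRPX's actual format): packs values of a chosen
// bit width MSB-first into a byte buffer using only shifts and masks, so the
// resulting bytes are identical on little- and big-endian hosts.
class BitWriter {
public:
    void write(std::uint32_t value, unsigned nbits) {
        assert(nbits >= 1 && nbits <= 32);
        for (unsigned i = nbits; i-- > 0;) {
            const unsigned bit = (value >> i) & 1u;   // most significant bit first
            if (bitPos_ == 0) bytes_.push_back(0);    // start a fresh output byte
            bytes_.back() |= static_cast<std::uint8_t>(bit << (7 - bitPos_));
            bitPos_ = (bitPos_ + 1) % 8;
        }
    }
    const std::vector<std::uint8_t>& bytes() const { return bytes_; }

private:
    std::vector<std::uint8_t> bytes_;
    unsigned bitPos_ = 0;  // next free bit in the last byte (0 = MSB)
};

// Reads values back in the same MSB-first order the writer used.
class BitReader {
public:
    explicit BitReader(const std::vector<std::uint8_t>& bytes) : bytes_(bytes) {}
    std::uint32_t read(unsigned nbits) {
        assert(nbits >= 1 && nbits <= 32);
        std::uint32_t value = 0;
        for (unsigned i = 0; i < nbits; ++i) {
            const std::size_t byteIdx = pos_ / 8;
            assert(byteIdx < bytes_.size());
            const unsigned bit = (bytes_[byteIdx] >> (7 - pos_ % 8)) & 1u;
            value = (value << 1) | bit;
            ++pos_;
        }
        return value;
    }

private:
    const std::vector<std::uint8_t>& bytes_;  // buffer must outlive the reader
    std::size_t pos_ = 0;                      // absolute bit position
};
```

For example, writing 5 in 3 bits, 0xABC in 12 bits and 1 in 1 bit yields exactly the two bytes 0xB5 0x79 on any host. Because no value is ever reinterpreted through a pointer to a multi-byte type, the host's endianness never enters the encoding; the same property also makes bit-packed formats straightforward to map onto simple hardware shift registers.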