TERSE/PROLIX (TRPX) – a new algorithm for fast and lossless compression and decompression of diffraction and cryo-EM data
| Published in | Acta crystallographica. Section A, Foundations and advances Vol. 79; no. 6; pp. 536 - 541 |
|---|---|
| Main Authors | , |
| Format | Journal Article |
| Language | English |
| Published | International Union of Crystallography, 01.11.2023 |
| Subjects | |
| ISSN | 2053-2733 |
| DOI | 10.1107/S205327332300760X |
| Summary: | High-throughput data collection in crystallography poses significant challenges in handling massive amounts of data. Here, TERSE/PROLIX (or TRPX for short) is presented, a novel lossless compression algorithm specifically designed for diffraction data. The algorithm is compared with established lossless compression algorithms implemented in gzip, bzip2, CBF (crystallographic binary file), Zstandard (zstd), LZ4 and HDF5 with gzip, LZF and bitshuffle+LZ4 filters, in terms of compression efficiency and speed, using continuous-rotation electron diffraction data of an inorganic compound and raw cryo-EM data. The results show that TRPX significantly outperforms all these algorithms in terms of speed and compression rate. It was 60 times faster than bzip2 (which achieved a similar compression rate), and more than 3 times faster than LZ4, which was the runner-up in terms of speed, but had a much worse compression rate. TRPX files are byte-order independent and upon compilation the algorithm occupies very little memory. It can therefore be readily implemented in hardware. By providing a tailored solution for diffraction and raw cryo-EM data, TRPX facilitates more efficient data analysis and interpretation while mitigating storage and transmission concerns. The C++20 compression/decompression code, custom TIFF library and an ImageJ/Fiji Java plugin for reading TRPX files are open-sourced on GitHub under the permissive MIT license. |
|---|---|
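
The comparison of compression rate and speed described in the summary can be illustrated with a minimal sketch. It does not use TRPX or the data from the paper: it times two Python standard-library codecs (zlib, which implements the DEFLATE method used by gzip, and bz2) on a synthetic 16-bit frame and checks that each round trip is lossless. The helper names `synthetic_frame` and `benchmark`, the frame size and the pixel-value distribution are assumptions made purely for illustration.

```python
import bz2
import random
import time
import zlib


def synthetic_frame(n_pixels=65536, seed=0):
    """Return n_pixels 16-bit little-endian values.

    Assumed distribution (not from the paper): about 99% of pixels hold
    small counts (0-3) and about 1% hold strong peaks below 60000.
    """
    rng = random.Random(seed)
    out = bytearray()
    for _ in range(n_pixels):
        if rng.random() < 0.99:
            value = rng.randrange(4)
        else:
            value = rng.randrange(60000)
        out += value.to_bytes(2, "little")
    return bytes(out)


def benchmark(name, compress, decompress, raw):
    """Time one compression pass and verify that decompression is lossless."""
    start = time.perf_counter()
    packed = compress(raw)
    elapsed = time.perf_counter() - start
    assert decompress(packed) == raw  # lossless round trip
    return {"codec": name, "ratio": len(raw) / len(packed), "seconds": elapsed}


raw = synthetic_frame()
results = [
    benchmark("zlib (DEFLATE, as in gzip)", zlib.compress, zlib.decompress, raw),
    benchmark("bz2", bz2.compress, bz2.decompress, raw),
]
for r in results:
    print(f"{r['codec']}: ratio {r['ratio']:.2f}, {r['seconds'] * 1e3:.1f} ms")
```

Timings from a single pass on synthetic data are only indicative; a fair comparison would repeat each measurement and use real detector frames, as the summary describes.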