Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization

Bibliographic Details
Published in	Proceedings - IEEE International Parallel and Distributed Processing Symposium, pp. 1129-1139
Main Authors	Dingwen Tao, Sheng Di, Zizhong Chen, Franck Cappello
Format	Conference Proceeding
Language	English
Published	IEEE 01.05.2017
Subjects	Adaptation models; Compression algorithms; Data models; Encoding; Measurement; Predictive models; Quantization (signal)
ISSN	1530-2075
DOI	10.1109/IPDPS.2017.115

Abstract	Today's HPC applications are producing extremely large amounts of data, such that data storage and analysis are becoming more challenging for scientific research. In this work, we design a new error-controlled lossy compression algorithm for large-scale scientific data. Our key contribution is significantly improving the prediction hitting rate (or prediction accuracy) for each data point based on its nearby data values along multiple dimensions. We derive a series of multilayer prediction formulas and their unified formula in the context of data compression. One serious challenge is that the data prediction has to be performed based on the preceding decompressed values during the compression in order to guarantee the error bounds, which may degrade the prediction accuracy in turn. We explore the best layer for the prediction by considering the impact of compression errors on the prediction accuracy. Moreover, we propose an adaptive error-controlled quantization encoder, which can further improve the prediction hitting rate considerably. The data size can be reduced significantly after performing the variable-length encoding because of the uneven distribution produced by our quantization encoder. We evaluate the new compressor on production scientific data sets and compare it with many other state-of-the-art compressors: GZIP, FPZIP, ZFP, SZ-1.1, and ISABELA. Experiments show that our compressor is the best in class, especially with regard to compression factors (or bit-rates) and compression errors (including RMSE, NRMSE, and PSNR). Our solution is better than the second-best solution by more than a 2x increase in the compression factor and 3.8x reduction in the normalized root mean squared error on average, with reasonable error bounds and user-desired bit-rates.
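
The abstract's central idea (predict each point from preceding decompressed neighbors, then quantize the prediction error under a bound) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the one-layer 2D neighbor predictor, the fixed number of quantization intervals (`num_bins`; the paper describes an adaptive encoder), the zero padding at the grid border, and all function names are assumptions made for illustration. The variable-length (entropy) coding stage is omitted.

```python
import math


def predict(recon, i, j):
    """One-layer neighbor prediction from already-reconstructed values.

    Out-of-range neighbors are treated as 0.0 (an assumption of this sketch).
    """
    up = recon[i - 1][j] if i > 0 else 0.0
    left = recon[i][j - 1] if j > 0 else 0.0
    diag = recon[i - 1][j - 1] if i > 0 and j > 0 else 0.0
    return up + left - diag


def compress_2d(data, error_bound, num_bins=255):
    """Return (codes, exact): quantization codes plus verbatim fallback values."""
    rows, cols = len(data), len(data[0])
    radius = num_bins // 2
    # recon mirrors what the decompressor will see, so both sides predict alike.
    recon = [[0.0] * cols for _ in range(rows)]
    codes = [[0] * cols for _ in range(rows)]  # code 0 marks an unpredictable point
    exact = []
    for i in range(rows):
        for j in range(cols):
            pred = predict(recon, i, j)
            # Each interval has width 2 * error_bound, centered on the prediction.
            q = round((data[i][j] - pred) / (2 * error_bound))
            value = pred + 2 * error_bound * q
            if abs(q) <= radius and abs(value - data[i][j]) <= error_bound:
                codes[i][j] = q + radius + 1
                recon[i][j] = value
            else:
                exact.append(data[i][j])
                recon[i][j] = data[i][j]
    return codes, exact


def decompress_2d(codes, exact, error_bound, num_bins=255):
    """Rebuild the grid by replaying the same predictions on reconstructed values."""
    rows, cols = len(codes), len(codes[0])
    radius = num_bins // 2
    recon = [[0.0] * cols for _ in range(rows)]
    stored = iter(exact)
    for i in range(rows):
        for j in range(cols):
            if codes[i][j] == 0:
                recon[i][j] = next(stored)
            else:
                q = codes[i][j] - radius - 1
                recon[i][j] = predict(recon, i, j) + 2 * error_bound * q
    return recon
```

On smooth data most points fall into the few intervals nearest the prediction, so the codes are unevenly distributed; that skew is what a subsequent variable-length encoder would exploit. Calling `decompress_2d(*compress_2d(data, eb), eb)` should return values within `eb` of the originals, because any point that cannot meet the bound is stored verbatim.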
Affiliations	Dingwen Tao (Univ. of California, Riverside, Riverside, CA, USA; dtao001@cs.ucr.edu); Sheng Di (Argonne Nat. Lab., Argonne, IL, USA; sdi1@anl.gov); Zizhong Chen (Univ. of California, Riverside, Riverside, CA, USA; chen@cs.ucr.edu); Franck Cappello (Argonne Nat. Lab., Argonne, IL, USA; cappello@anl.gov)
CODEN	IEEPAD
Discipline	Computer Science
EISBN	9781538639146 1538639149
EISSN	1530-2075
PageCount	11
URI	https://ieeexplore.ieee.org/document/7967203