Significantly Improving Lossy Compression for Scientific Data Sets Based on Multidimensional Prediction and Error-Controlled Quantization

Bibliographic Details
Published in Proceedings - IEEE International Parallel and Distributed Processing Symposium pp. 1129-1139
Main Authors Dingwen Tao, Sheng Di, Zizhong Chen, Franck Cappello
Format Conference Proceeding
Language English
Published IEEE 01.05.2017
ISSN 1530-2075
DOI 10.1109/IPDPS.2017.115

Abstract Today's HPC applications are producing extremely large amounts of data, such that data storage and analysis are becoming more challenging for scientific research. In this work, we design a new error-controlled lossy compression algorithm for large-scale scientific data. Our key contribution is significantly improving the prediction hitting rate (or prediction accuracy) for each data point based on its nearby data values along multiple dimensions. We derive a series of multilayer prediction formulas and their unified formula in the context of data compression. One serious challenge is that the data prediction has to be performed based on the preceding decompressed values during the compression in order to guarantee the error bounds, which may degrade the prediction accuracy in turn. We explore the best layer for the prediction by considering the impact of compression errors on the prediction accuracy. Moreover, we propose an adaptive error-controlled quantization encoder, which can further improve the prediction hitting rate considerably. The data size can be reduced significantly after performing the variable-length encoding because of the uneven distribution produced by our quantization encoder. We evaluate the new compressor on production scientific data sets and compare it with many other state-of-the-art compressors: GZIP, FPZIP, ZFP, SZ-1.1, and ISABELA. Experiments show that our compressor is the best in class, especially with regard to compression factors (or bit-rates) and compression errors (including RMSE, NRMSE, and PSNR). Our solution is better than the second-best solution by more than a 2x increase in the compression factor and 3.8x reduction in the normalized root mean squared error on average, with reasonable error bounds and user-desired bit-rates.
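As an illustration of the idea summarized in the abstract, the following Python sketch predicts each point of a 2D array from its already reconstructed neighbours and quantizes the prediction residual into bins of width 2*eb, so that every reconstructed value stays within the error bound eb. This is a minimal sketch under stated assumptions, not the authors' implementation: the one-layer predictor (left + up - diagonal), the function names compress, decompress and hit_rate, and the radius parameter are all chosen here for illustration, and the variable-length encoding stage is omitted.

```python
def _predict(recon, i, j):
    """One-layer 2D prediction from already reconstructed neighbours.

    Assumption for illustration: missing neighbours at the borders count as 0.
    """
    up = recon[i - 1][j] if i > 0 else 0.0
    left = recon[i][j - 1] if j > 0 else 0.0
    diag = recon[i - 1][j - 1] if i > 0 and j > 0 else 0.0
    return up + left - diag


def compress(data, eb, radius=128):
    """Return (codes, outliers); code 0 marks an unpredictable point."""
    rows, cols = len(data), len(data[0])
    recon = [[0.0] * cols for _ in range(rows)]
    codes, outliers = [], []
    for i in range(rows):
        for j in range(cols):
            x = data[i][j]
            # Predict from reconstructed values, exactly as the decompressor will.
            pred = _predict(recon, i, j)
            q = round((x - pred) / (2.0 * eb))  # index of the 2*eb-wide bin
            value = pred + 2.0 * eb * q
            if abs(q) < radius and abs(value - x) <= eb:
                codes.append(q + radius)        # codes 1 .. 2*radius - 1
                recon[i][j] = value
            else:
                codes.append(0)                 # prediction miss: keep x exactly
                outliers.append(x)
                recon[i][j] = x
    return codes, outliers


def decompress(codes, outliers, rows, cols, eb, radius=128):
    """Rebuild the array by replaying the same predictions in the same order."""
    recon = [[0.0] * cols for _ in range(rows)]
    next_code, next_outlier = iter(codes), iter(outliers)
    for i in range(rows):
        for j in range(cols):
            code = next(next_code)
            if code == 0:
                recon[i][j] = next(next_outlier)
            else:
                recon[i][j] = _predict(recon, i, j) + 2.0 * eb * (code - radius)
    return recon


def hit_rate(codes):
    """Fraction of points whose residual fell inside the quantization range."""
    return 1.0 - codes.count(0) / len(codes)
```

For a smooth field, most codes cluster around the value of radius; that uneven distribution is what a variable-length encoder could then exploit.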
Authors
Dingwen Tao (dtao001@cs.ucr.edu), Univ. of California, Riverside, Riverside, CA, USA
Sheng Di (sdi1@anl.gov), Argonne Nat. Lab., Argonne, IL, USA
Zizhong Chen (chen@cs.ucr.edu), Univ. of California, Riverside, Riverside, CA, USA
Franck Cappello (cappello@anl.gov), Argonne Nat. Lab., Argonne, IL, USA
CODEN IEEPAD
Discipline Computer Science
EISBN 9781538639146
1538639149
EISSN 1530-2075
Genre orig-research
PageCount 11
Subjects Adaptation models; Compression algorithms; Data models; Encoding; Measurement; Predictive models; Quantization (signal)
URI https://ieeexplore.ieee.org/document/7967203