DUHI: Dynamically updated hash index clustering method for DNA storage

Bibliographic Details
Published in: Computers in Biology and Medicine, Vol. 164, p. 107244
Main Authors: Wang, Penghao; Cao, Ben; Ma, Tao; Wang, Bin; Zhang, Qiang; Zheng, Pan
Format: Journal Article
Language: English
Published: United States: Elsevier Ltd (Elsevier Limited), 01.09.2023
ISSN: 0010-4825, 1879-0534
DOI: 10.1016/j.compbiomed.2023.107244

Summary: The exponential growth of global data leads to the problem of insufficient data storage capacity. DNA storage can be an ideal storage method due to its high storage density and long storage lifetime. However, the DNA storage process is subject to unavoidable errors that can increase cluster redundancy during data reading, which in turn affects the accuracy of the data reads. This paper proposes a dynamically updated hash index (DUHI) clustering method for DNA storage, which clusters sequences by constructing a dynamic core index set and using hash lookup. The proposed clustering method is analyzed in terms of overall reliability evaluation and visualization evaluation. The results show that the DUHI clustering method can reduce the redundancy of more than 10% of the sequences within the cluster and increase the reconstruction rate of the sequences to more than 99%. Therefore, our method solves the high redundancy problem after DNA sequence clustering, improves the accuracy of data reading, and promotes the development of DNA storage.
Highlights:
• During clustering in DNA storage, base errors may occur, resulting in incorrect index construction for clustering.
• The DUHI clustering method continuously updates the index, enabling effective clustering of DNA sequences.
• Reliability measures such as the Jaccard-Purity coefficient and the reconstruction rate indicate reduced redundancy within clusters.
• Implementing the DUHI clustering method helps ensure the accuracy of DNA storage during data reading.
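The summary says only that DUHI clusters reads "by constructing a dynamic core index set and using hash lookup"; it gives no further detail. The Python sketch below illustrates that general idea under our own assumptions and is not the authors' implementation: each cluster keeps a core sequence whose k-mers are stored in a hash index, a new read is matched to candidate clusters by k-mer lookup and verified with edit distance, and every `refresh_every` members the core is replaced by a majority-vote consensus and re-indexed. The class `HashIndexClusterer` and the parameters `k`, `max_dist`, and `refresh_every` are hypothetical names chosen for illustration.

```python
from collections import Counter, defaultdict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a single rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution or match
        prev = cur
    return prev[-1]


def kmers(seq: str, k: int) -> set:
    """All overlapping substrings of length k (used as hash keys)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def consensus(reads: list) -> str:
    """Position-wise majority vote over the reads sharing the most common length."""
    modal_len = Counter(len(r) for r in reads).most_common(1)[0][0]
    same_len = [r for r in reads if len(r) == modal_len]
    return "".join(
        Counter(r[i] for r in same_len).most_common(1)[0][0] for i in range(modal_len)
    )


class HashIndexClusterer:
    """Hypothetical greedy clusterer with a k-mer hash index over cluster cores."""

    def __init__(self, k: int = 8, max_dist: int = 4, refresh_every: int = 5):
        self.k = k                          # k-mer length used as hash key (assumed)
        self.max_dist = max_dist            # edit-distance threshold for joining (assumed)
        self.refresh_every = refresh_every  # rebuild a core after this many members (assumed)
        self.cores = []                     # current core (representative) per cluster
        self.members = []                   # reads assigned to each cluster
        self.index = defaultdict(set)       # k-mer -> ids of clusters whose core contains it

    def _index_core(self, cid: int, add: bool = True) -> None:
        for km in kmers(self.cores[cid], self.k):
            if add:
                self.index[km].add(cid)
            else:
                self.index[km].discard(cid)

    def _refresh(self, cid: int) -> None:
        # Dynamic update: swap the core for the members' consensus and re-index it.
        self._index_core(cid, add=False)
        self.cores[cid] = consensus(self.members[cid])
        self._index_core(cid)

    def add(self, read: str) -> int:
        # 1. Hash lookup: count the k-mers the read shares with each indexed core.
        votes = Counter()
        for km in kmers(read, self.k):
            votes.update(self.index.get(km, ()))
        # 2. Verify the top-voted candidates with edit distance.
        for cid, _ in votes.most_common(3):
            if edit_distance(read, self.cores[cid]) <= self.max_dist:
                self.members[cid].append(read)
                if len(self.members[cid]) % self.refresh_every == 0:
                    self._refresh(cid)
                return cid
        # 3. No match: the read founds a new cluster and becomes its core.
        cid = len(self.cores)
        self.cores.append(read)
        self.members.append([read])
        self._index_core(cid)
        return cid
```

In this sketch the hash lookup narrows each read to a few candidate clusters, so edit distance is not computed against every cluster, and the periodic consensus refresh lets a cluster whose first read contained a base error converge to an error-free core. This is one plausible reading of "dynamically updated" and may differ from how DUHI actually updates its index.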