Integrated Error Correction to Enhance Efficiency of Digital Data Storage Based on DNA Nanostructures

Bibliographic Details
Published in ACS Nano Vol. 19; no. 32; pp. 29543-29553
Main Authors Mao, Cuiping, Zheng, Shuo, Huang, Zhihao, Wang, Dou, Zhuang, Yufan, Zhang, Jiangjiang, Wang, Rui, Jiang, Xingyu
Format Journal Article
Language English
Published United States: American Chemical Society, 19.08.2025
ISSN 1936-0851
1936-086X
DOI 10.1021/acsnano.5c08183

Summary: Synthetic DNA is a durable, high-density information storage platform based on DNA nanostructures. However, errors during DNA reading pose challenges to data integrity. Conventional error-correcting codes add redundancy during encoding to ensure data integrity, thereby reducing storage density and increasing costs. Here, we present an integrated error correction (IEC) algorithm that synergistically combines three enhanced mechanisms: the “head–tail” region Levenshtein distance for error-tolerant clustering (10× faster); sliding window-optimized Hamming distance for error detection and correction of insertions and deletions without length constraints; and score-weighted majority voting for optimal sequence selection (2% higher accuracy), collectively enhancing storage density and decoding efficiency. We confirmed the effectiveness of IEC by recovering medical data encoded in DNA with errors. With IEC, we can simultaneously correct insertion, deletion, and substitution errors with a redundancy rate of 2.4%, while the current minimum redundancy rate is 7%. We thus achieved a logical density of 1.4 bits per nucleotide. Additionally, IEC ensures optimal fidelity during decoding, closely matching the encoded sequences, resulting in a reduction of the number of sequences by 3 orders of magnitude, minimizing computational overhead and runtime complexities, and enhancing decoding efficiency.
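Illustration: the summary names two mechanisms that are easy to picture in code, clustering reads by the Levenshtein distance of their "head" and "tail" regions, and score-weighted majority voting. The Python sketch below is only a minimal illustration of those general ideas and is not the authors' IEC implementation. The function names, the region length k, the threshold max_dist, the greedy clustering strategy, the per-position form of the vote, and the example weights are all assumptions made for this sketch.

```python
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def head_tail_distance(a: str, b: str, k: int = 8) -> int:
    """Compare only the first and last k nucleotides (k is an assumed value)."""
    return levenshtein(a[:k], b[:k]) + levenshtein(a[-k:], b[-k:])


def cluster_reads(reads, k: int = 8, max_dist: int = 2):
    """Greedy clustering (an assumed strategy): a read joins the first cluster
    whose first member is within max_dist, otherwise it starts a new cluster."""
    clusters = []
    for read in reads:
        for cluster in clusters:
            if head_tail_distance(read, cluster[0], k) <= max_dist:
                cluster.append(read)
                break
        else:
            clusters.append([read])
    return clusters


def weighted_consensus(reads, weights) -> str:
    """Per-position weighted vote over equal-length reads.

    Assumes insertions and deletions were already corrected, so that all
    reads in the cluster have the same length."""
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("reads must have equal length before voting")
    consensus = []
    for pos in range(length):
        tally = defaultdict(float)
        for read, weight in zip(reads, weights):
            tally[read[pos]] += weight
        consensus.append(max(tally, key=tally.get))
    return "".join(consensus)


# Toy reads: three noisy copies of one strand and two copies of another.
reads = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "TTGGCCAATT", "TTGGCCAATT"]
clusters = cluster_reads(reads, k=4, max_dist=1)
# Hypothetical quality scores: the third read is trusted less.
consensus = weighted_consensus(clusters[0], [1.0, 1.0, 0.5])
```

Restricting the edit-distance computation to two short regions keeps the cost of each comparison independent of the full strand length, which is one plausible reading of why a head-tail comparison can speed up clustering.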