Integrated Error Correction to Enhance Efficiency of Digital Data Storage Based on DNA Nanostructures

Bibliographic Details
Published in	ACS Nano, Vol. 19, no. 32, pp. 29543-29553
Main Authors	Mao, Cuiping; Zheng, Shuo; Huang, Zhihao; Wang, Dou; Zhuang, Yufan; Zhang, Jiangjiang; Wang, Rui; Jiang, Xingyu
Format	Journal Article
Language	English
Published	United States: American Chemical Society, 19.08.2025
Subjects	Algorithms; DNA - chemistry; DNA - genetics; Information Storage and Retrieval - methods; Nanostructures - chemistry; High-Density Data Storage; Error Correction; Health And Medical Data; Synergistic Mechanism; DNA Storage
ISSN	1936-0851; 1936-086X
DOI	10.1021/acsnano.5c08183

Summary:	Synthetic DNA is a durable, high-density information storage platform based on DNA nanostructures. However, errors during DNA reading pose challenges to data integrity. Conventional error-correcting codes add redundancy during encoding to ensure data integrity, thereby reducing storage density and increasing costs. Here, we present an integrated error correction (IEC) algorithm that synergistically combines three enhanced mechanisms: the “head–tail” region Levenshtein distance for error-tolerant clustering (10× faster); sliding window-optimized Hamming distance for error detection and correction of insertions and deletions without length constraints; and score-weighted majority voting for optimal sequence selection (2% higher accuracy), collectively enhancing storage density and decoding efficiency. We confirmed the effectiveness of IEC by recovering medical data encoded in DNA with errors. With IEC, we can simultaneously correct insertion, deletion, and substitution errors with a redundancy rate of 2.4%, while the current minimum redundancy rate is 7%. We thus achieved a logical density of 1.4 bits per nucleotide. Additionally, IEC ensures optimal fidelity during decoding, closely matching the encoded sequences, resulting in a reduction of the number of sequences by 3 orders of magnitude, minimizing computational overhead and runtime complexities, and enhancing decoding efficiency.
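
The summary names its mechanisms without giving their details, so the following is only a minimal sketch of two of the ideas: comparing reads by Levenshtein distance over their head and tail regions, and a score-weighted majority vote. Every function name, the region length `k`, the threshold `max_dist`, the greedy clustering strategy, and the per-position form of the vote are assumptions made for illustration and are not taken from the article. The sliding-window Hamming step for insertions and deletions is not sketched here, and the vote below assumes the reads in a cluster already have equal length.

```python
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def head_tail_distance(a: str, b: str, k: int = 8) -> int:
    """Compare only the first and last k bases (hypothetical region length)."""
    return levenshtein(a[:k], b[:k]) + levenshtein(a[-k:], b[-k:])


def cluster_reads(reads, k: int = 8, max_dist: int = 2):
    """Greedy clustering: a read joins the first cluster whose first member
    is within max_dist on the head/tail comparison, else starts a new one."""
    clusters = []
    for read in reads:
        for cluster in clusters:
            if head_tail_distance(read, cluster[0], k) <= max_dist:
                cluster.append(read)
                break
        else:
            clusters.append([read])
    return clusters


def weighted_vote(reads, weights) -> str:
    """Per-position vote over equal-length reads, each weighted by its score."""
    consensus = []
    for pos in range(len(reads[0])):
        tally = defaultdict(float)
        for read, weight in zip(reads, weights):
            tally[read[pos]] += weight
        consensus.append(max(tally, key=tally.get))
    return "".join(consensus)
```

Restricting the edit-distance computation to two short regions keeps the cost per comparison independent of read length, which is one plausible reading of why a head-tail comparison could speed up clustering; the article itself should be consulted for the actual design.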