Integrated Error Correction to Enhance Efficiency of Digital Data Storage Based on DNA Nanostructures
| Published in | ACS nano Vol. 19; no. 32; pp. 29543 - 29553 |
|---|---|
| Main Authors | , , , , , , , |
| Format | Journal Article |
| Language | English |
| Published | United States: American Chemical Society, 19.08.2025 |
| ISSN | 1936-0851, 1936-086X |
| DOI | 10.1021/acsnano.5c08183 |
| Summary: | Synthetic DNA is a durable, high-density information storage platform based on DNA nanostructures. However, errors during DNA reading pose challenges to data integrity. Conventional error-correcting codes add redundancy during encoding to ensure data integrity, thereby reducing storage density and increasing costs. Here, we present an integrated error correction (IEC) algorithm that synergistically combines three enhanced mechanisms: the “head–tail” region Levenshtein distance for error-tolerant clustering (10× faster); sliding window-optimized Hamming distance for error detection and correction of insertions and deletions without length constraints; and score-weighted majority voting for optimal sequence selection (2% higher accuracy), collectively enhancing storage density and decoding efficiency. We confirmed the effectiveness of IEC by recovering medical data encoded in DNA with errors. With IEC, we can simultaneously correct insertion, deletion, and substitution errors with a redundancy rate of 2.4%, while the current minimum redundancy rate is 7%. We thus achieved a logical density of 1.4 bits per nucleotide. Additionally, IEC ensures optimal fidelity during decoding, closely matching the encoded sequences, resulting in a reduction of the number of sequences by 3 orders of magnitude, minimizing computational overhead and runtime complexities, and enhancing decoding efficiency. |
|---|---|
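The summary names two mechanisms that are easy to picture in code: clustering reads by the Levenshtein distance of their "head–tail" regions, and score-weighted majority voting to select a sequence. The Python sketch below is only an illustration of those general ideas, not the authors' IEC implementation. The function names, the region length `k`, the clustering `threshold`, the greedy clustering strategy, and the per-read `scores` are all assumptions made for this example; the sliding-window Hamming step is not covered.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def head_tail_distance(a: str, b: str, k: int = 8) -> int:
    """Compare only the first and last k nucleotides (k is an assumed value)."""
    return levenshtein(a[:k], b[:k]) + levenshtein(a[-k:], b[-k:])


def cluster_reads(reads, k: int = 8, threshold: int = 3):
    """Greedy clustering: a read joins the first cluster whose representative
    (the cluster's first read) is within `threshold`; otherwise it starts
    a new cluster. Both the strategy and the threshold are assumptions."""
    clusters = []
    for read in reads:
        for cluster in clusters:
            if head_tail_distance(read, cluster[0], k) <= threshold:
                cluster.append(read)
                break
        else:
            clusters.append([read])
    return clusters


def weighted_vote(candidates, scores):
    """Return the candidate sequence with the highest total score.
    Identical sequences pool their scores; ties go to the first seen."""
    totals = {}
    for seq, score in zip(candidates, scores):
        totals[seq] = totals.get(seq, 0.0) + score
    return max(totals, key=totals.get)
```

Restricting the distance computation to short head and tail regions keeps each comparison at a fixed cost regardless of read length, which is one plausible reason a head–tail scheme could speed up clustering; the summary itself reports only the 10× figure, not how it arises.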