Achievable Rates of Nanopore-Based DNA Storage
          | Published in | IEEE journal on selected areas in information theory Vol. 6; pp. 261 - 269 | 
|---|---|
| Main Authors | McBain, Brendon; Viterbo, Emanuele | 
| Format | Journal Article | 
| Language | English | 
| Published | Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2025 | 
| Subjects | |
| Online Access | Get full text | 
| ISSN | 2641-8770 | 
| DOI | 10.1109/JSAIT.2025.3598756 | 
| Abstract | This paper studies achievable rates of nanopore-based DNA storage when nanopore signals are decoded using a tractable channel model that does not rely on a basecalling algorithm. Specifically, the noisy nanopore channel (NNC) with the Scrappie pore model generates average output levels via i.i.d. geometric sample duplications corrupted by i.i.d. Gaussian noise (NNC-Scrappie). Simplified message passing algorithms are derived for efficient soft decoding of nanopore signals using NNC-Scrappie. Previously, evaluation of this channel model was limited by the lack of DNA storage datasets that include nanopore signals. This limitation is addressed by deriving an achievable rate based on the dynamic time-warping (DTW) algorithm that can be applied to genomic sequencing datasets, subject to constraints that make the resulting rate applicable to DNA storage. Using a publicly available dataset from Oxford Nanopore Technologies (ONT), it is demonstrated that coding over multiple DNA strands of 100 bases in length and decoding with the NNC-Scrappie decoder can achieve rates of at least 0.64 to 1.18 bits per base, depending on the channel quality of the nanopore that is chosen in the sequencing device per channel use, and 0.96 bits per base on average assuming uniformly chosen nanopores. These rates are pessimistic since they only apply to single reads and do not include calibration of the pore model to specific nanopores. | 
    
|---|---|
    
| Author | Viterbo, Emanuele; McBain, Brendon | 
    
| Author_xml | – sequence: 1 givenname: Brendon orcidid: 0000-0002-1073-2948 surname: McBain fullname: McBain, Brendon email: brendon.mcbain@monash.edu organization: ECSE Department, Monash University, Melbourne, VIC, Australia – sequence: 2 givenname: Emanuele orcidid: 0000-0002-5861-2873 surname: Viterbo fullname: Viterbo, Emanuele email: emanuele.viterbo@monash.edu organization: ECSE Department, Monash University, Melbourne, VIC, Australia  | 
    
| CODEN | IJSTL5 | 
    
| ContentType | Journal Article | 
    
| Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2025 | 
    
| Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2025 | 
    
| DBID | 97E RIA RIE AAYXX CITATION 7SC 7SP 8FD JQ2 L7M L~C L~D  | 
    
| DOI | 10.1109/JSAIT.2025.3598756 | 
    
| DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library (IEL) CrossRef Computer and Information Systems Abstracts Electronics & Communications Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts  Academic Computer and Information Systems Abstracts Professional  | 
    
| DatabaseTitle | CrossRef Technology Research Database Computer and Information Systems Abstracts – Academic Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Professional  | 
    
| DatabaseTitleList | Technology Research Database | 
    
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher  | 
    
| DeliveryMethod | fulltext_linktorsrc | 
    
| Discipline | Computer Science | 
    
| EISSN | 2641-8770 | 
    
| EndPage | 269 | 
    
| ExternalDocumentID | 10_1109_JSAIT_2025_3598756 11124570  | 
    
| Genre | orig-research | 
    
| GroupedDBID | 0R~ 97E AAJGR AASAJ AAWTH ABAZT ABJNI ABQJQ ABVLG AGQYO AHBIQ AKJIK AKQYR ALMA_UNASSIGNED_HOLDINGS ATWAV BEFXN BFFAM BGNUA BKEBE BPEOZ EBS EJD IFIPE JAVBF OCL RIA RIE AAYXX CITATION 7SC 7SP 8FD JQ2 L7M L~C L~D  | 
    
| ID | FETCH-LOGICAL-c249t-548728cc0db3da43d2b3100305f4e91186b437623492fe5764d90992fbe6f8583 | 
    
| IEDL.DBID | RIE | 
    
| ISSN | 2641-8770 | 
    
| IngestDate | Wed Oct 08 14:10:38 EDT 2025 Wed Oct 01 05:35:32 EDT 2025 Wed Sep 10 07:40:58 EDT 2025  | 
    
| IsPeerReviewed | true | 
    
| IsScholarly | true | 
    
| Language | English | 
    
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037  | 
    
| LinkModel | DirectLink | 
    
| MergedId | FETCHMERGED-LOGICAL-c249t-548728cc0db3da43d2b3100305f4e91186b437623492fe5764d90992fbe6f8583 | 
    
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14  | 
    
| ORCID | 0000-0002-1073-2948 0000-0002-5861-2873  | 
    
| PQID | 3247482941 | 
    
| PQPubID | 5075791 | 
    
| PageCount | 9 | 
    
| ParticipantIDs | proquest_journals_3247482941 ieee_primary_11124570 crossref_primary_10_1109_JSAIT_2025_3598756  | 
    
| ProviderPackageCode | CITATION AAYXX  | 
    
| PublicationCentury | 2000 | 
    
| PublicationDate | 20250000 2025-00-00 20250101  | 
    
| PublicationDateYYYYMMDD | 2025-01-01 | 
    
| PublicationDate_xml | – year: 2025 text: 20250000  | 
    
| PublicationDecade | 2020 | 
    
| PublicationPlace | Piscataway | 
    
| PublicationPlace_xml | – name: Piscataway | 
    
| PublicationTitle | IEEE journal on selected areas in information theory | 
    
| PublicationTitleAbbrev | JSAIT | 
    
| PublicationYear | 2025 | 
    
| Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE)  | 
    
| Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)  | 
    
| SSID | ssj0002214787 | 
    
| Score | 2.2816415 | 
    
| SourceID | proquest crossref ieee  | 
    
| SourceType | Aggregation Database Index Database Publisher  | 
    
| StartPage | 261 | 
    
| SubjectTerms | Accuracy; Algorithms; Backtracking; Channel models; Datasets; Decoding; DNA; Encoding; Information rates; Mathematical models; Message passing; Nanobioscience; Nanopores; Noise; Noise measurement; Random noise; Sequential analysis; Vectors | 
    
| Title | Achievable Rates of Nanopore-Based DNA Storage | 
    
| URI | https://ieeexplore.ieee.org/document/11124570 https://www.proquest.com/docview/3247482941  | 
    
| Volume | 6 | 
    
| hasFullText | 1 | 
    
| inHoldings | 1 | 
    
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVIEE databaseName: IEEE Electronic Library (IEL) customDbUrl: eissn: 2641-8770 dateEnd: 99991231 omitProxy: false ssIdentifier: ssj0002214787 issn: 2641-8770 databaseCode: RIE dateStart: 20200101 isFulltext: true titleUrlDefault: https://ieeexplore.ieee.org/ providerName: IEEE  | 
    
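
The channel described in the abstract (average output levels, i.i.d. geometric sample duplications, i.i.d. Gaussian noise) and the role of a DTW-style alignment can be illustrated with a small Python sketch. Everything specific here is an assumption made for illustration: the k-mer length `K = 3`, the randomly generated table in `toy_pore_model` (which is not the real Scrappie pore model), the parameters `p` and `sigma`, and the simplified alignment in `dtw_cost`, which only allows each level to be repeated over consecutive samples. It is not the paper's decoder and does not reproduce its achievable-rate computation.

```python
import math
import random
from itertools import product

K = 3  # assumed k-mer length for this toy model (not taken from the paper)


def toy_pore_model(rng):
    """Hypothetical k-mer -> mean level table; NOT the real Scrappie model."""
    return {"".join(kmer): rng.uniform(-2.0, 2.0)
            for kmer in product("ACGT", repeat=K)}


def ideal_levels(bases, model):
    """Noise-free level for each k-mer window of the strand."""
    return [model[bases[i:i + K]] for i in range(len(bases) - K + 1)]


def geometric(rng, p):
    """Sample count per level, supported on 1, 2, 3, ... with mean 1/p."""
    count = 1
    while rng.random() > p:
        count += 1
    return count


def simulate_channel(levels, rng, p=0.3, sigma=0.2):
    """Repeat each level a geometric number of times, then add Gaussian noise."""
    samples = []
    for level in levels:
        for _ in range(geometric(rng, p)):
            samples.append(level + rng.gauss(0.0, sigma))
    return samples


def dtw_cost(levels, samples):
    """Minimum squared error when each level covers >= 1 consecutive samples."""
    n, m = len(levels), len(samples)
    prev = [math.inf] * (m + 1)
    prev[0] = 0.0  # zero levels aligned with zero samples
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)
        for j in range(i, m + 1):
            step = (samples[j - 1] - levels[i - 1]) ** 2
            # the sample either extends level i (cur[j-1]) or starts it (prev[j-1])
            cur[j] = step + min(cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]


if __name__ == "__main__":
    rng = random.Random(0)
    model = toy_pore_model(rng)
    strand = "".join(rng.choice("ACGT") for _ in range(100))
    decoy = "".join(rng.choice("ACGT") for _ in range(100))
    samples = simulate_channel(ideal_levels(strand, model), rng)
    print("cost against true strand: ", dtw_cost(ideal_levels(strand, model), samples))
    print("cost against decoy strand:", dtw_cost(ideal_levels(decoy, model), samples))
```

In this toy setting the alignment cost against the transmitted strand would typically be much smaller than against an unrelated strand, which is the intuition behind scoring candidate sequences directly on raw signal samples rather than on basecalled text.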