Evaluating the effectiveness of ensemble voting in improving the accuracy of consensus signals produced by various DTWA algorithms from step-current signals generated during nanopore sequencing
| Published in | PLoS computational biology Vol. 17; no. 9; p. e1009350 |
|---|---|
| Main Authors | , , , |
| Format | Journal Article |
| Language | English |
| Published | United States: Public Library of Science (PLoS), 01.09.2021 |
| ISSN | 1553-734X, 1553-7358 |
| DOI | 10.1371/journal.pcbi.1009350 |
| Summary: | Nanopore sequencing device analysis systems simultaneously generate multiple picoamperage current signals representing the passage of DNA or RNA nucleotides ratcheted through a biomolecule nanopore array by motor proteins. Squiggles are a noisy and time-distorted representation of an underlying nucleotide sequence, “gold standard model”, due to experimental and algorithmic artefacts. Other research fields use dynamic time warped-space averaging (DTWA) algorithms to produce a consensus signal from multiple time-warped sources while preserving key features distorted by standard, linear-averaging approaches. We compared the ability of DTW Barycentre averaging (DBA), minimize mean (MM) and stochastic sub-gradient descent (SSG) DTWA algorithms to generate a consensus signal from squiggle-space ensembles of RNA molecules Enolase, Sequin R1-71-1 and Sequin R2-55-3 without knowledge of their associated gold standard model. We propose techniques to identify the leader and distorted squiggle features prior to DTWA consensus generation. New visualization and warping-path metrics are introduced to compare consensus signals and the best estimate of the “true” consensus, the study’s gold standard model. The DBA consensus was the best match to the gold standard for both Sequin studies but was outperformed in the Enolase study. Given an underlying common characteristic across a squiggle ensemble, we objectively evaluate a novel “voting scheme” that improves the local similarity between the consensus signal and a given fraction of the squiggle ensemble. While the gold standard is not used during voting, the increase in the match of the final voted-on consensus to the underlying Enolase and Sequin gold standard sequences provides an indirect success measure for the proposed voting procedure in two ways: First is the decreased least squares warped distance between the final consensus and the gold model, and second, the voting generates a final consensus length closer to the known underlying RNA biomolecule length. The results suggest considerable potential in marrying squiggle analysis and voted-on DTWA consensus signals to provide low-noise, low-distortion signals. This will lead to improved accuracy in detecting nucleotides and their deviation model due to chemical modifications (a.k.a. epigenetic information). The proposed combination of ensemble voting and DTWA has application in other research fields involving time-distorted, high entropy signals. |
|---|---|
| Bibliography: | RC and MK also contributed equally to this work. The authors have declared that no competing interests exist. |
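
The summary describes DTW Barycentre averaging (DBA) as one way to build a consensus from time-warped signals. The following is a minimal illustrative sketch of that general idea, not the authors' implementation: the function names `dtw_path`, `dba_update` and `dba` are hypothetical, the squared-error local cost is an assumption, and taking the first signal as the initial "leader" is a placeholder for the leader-selection techniques the article proposes. The MM and SSG algorithms and the voting scheme are not shown.

```python
from math import inf


def dtw_path(a, b):
    """Return (accumulated squared-error DTW cost, warping path) for a and b."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover the aligned (index_a, index_b) pairs.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),  # diagonal step
            (cost[i - 1][j], i - 1, j),          # advance in a only
            (cost[i][j - 1], i, j - 1),          # advance in b only
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


def dba_update(consensus, ensemble):
    """One DBA iteration: align each signal to the consensus, then average."""
    buckets = [[] for _ in consensus]
    for signal in ensemble:
        _, path = dtw_path(consensus, signal)
        for ci, si in path:
            # Collect every sample that DTW maps onto consensus position ci.
            buckets[ci].append(signal[si])
    return [sum(b) / len(b) for b in buckets]


def dba(ensemble, iterations=10):
    """Iteratively refine a consensus signal from a non-empty ensemble."""
    consensus = list(ensemble[0])  # assumption: first signal acts as the leader
    for _ in range(iterations):
        consensus = dba_update(consensus, ensemble)
    return consensus


# Toy ensemble (made-up values): three time-warped copies of one step pattern.
ensemble = [
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 0],
]
consensus = dba(ensemble)
```

In this sketch the consensus keeps the length of the initial leader, which is one reason the choice of leader matters. A plain sample-by-sample mean of the same toy ensemble would be undefined for unequal lengths and would blur the step edges even after padding.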