DNApi: A De Novo Adapter Prediction Algorithm for Small RNA Sequencing Data

Bibliographic Details
Published in: PLoS ONE, Vol. 11, no. 10, p. e0164228
Main Authors: Tsuji, Junko; Weng, Zhiping
Format: Journal Article
Language: English
Published: United States: Public Library of Science (PLoS), 13 October 2016
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0164228

More Information
Summary: With the rapid accumulation of publicly available small RNA sequencing datasets, third-party meta-analysis across many datasets is becoming increasingly powerful. Although removing the 3´ adapter is an essential step in small RNA sequencing analysis, the adapter sequence is not always available in the metadata, and it can be erroneous even when it is available. In this study, we developed DNApi, a lightweight Python software package that predicts the 3´ adapter sequence de novo and provides the user with cleansed small RNA sequences ready for downstream analysis. Tested on 539 publicly available small RNA libraries accompanied by 3´ adapter sequences in their metadata, DNApi shows near-perfect accuracy (98.5%) with fast runtime (~2.85 seconds per library) and efficient memory usage (~43 MB on average). In addition to 3´ adapter prediction, it is also important to classify whether the input small RNA libraries have already been processed, i.e. whether the 3´ adapters have been removed. Given another batch of 192 publicly available processed libraries, DNApi correctly judged all of them to be "ready-to-map" small RNA sequences. DNApi is compatible with Python 2 and 3, and is available at https://github.com/jnktsj/DNApi. The 731 small RNA libraries used for DNApi evaluation (539 unprocessed and 192 processed) were from human tissues and were carefully and manually collected. This study also provides readers with the curated datasets, which can be integrated into their own studies.
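For readers unfamiliar with the step the summary calls essential, the following is a minimal sketch of 3´ adapter removal once an adapter sequence is known. It is not DNApi's implementation and does not use DNApi's API; the function name `trim_3p_adapter`, the `min_overlap` parameter, and the example sequences are all assumptions made for illustration, and only exact matches are handled.

```python
def trim_3p_adapter(read, adapter, min_overlap=5):
    """Return `read` with a 3' adapter removed (hypothetical helper, exact matching only)."""
    # Case 1: the full adapter occurs in the read; keep everything before it.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends with a prefix of the adapter, because the
    # sequencer stopped before reading the whole adapter. Try the longest
    # possible overlap first and stop at `min_overlap` to limit chance matches.
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:len(read) - overlap]
    # No adapter found: the read is returned unchanged.
    return read


# Example sequences chosen for illustration only.
adapter = "TGGAATTCTCGGGTGCCAAGG"
insert = "TGAGGTAGTAGGTTGTATAGTT"

full = trim_3p_adapter(insert + adapter, adapter)         # adapter fully present
partial = trim_3p_adapter(insert + adapter[:8], adapter)  # adapter cut off by read length
clean = trim_3p_adapter(insert, adapter)                  # no adapter in the read
```

A library in which almost no read changes under such trimming would be a candidate for the already processed, "ready-to-map" case that the summary describes, although how DNApi itself makes that call is not stated in this record.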
Competing Interests: The authors have declared that no competing interests exist.
Conceptualization: JT ZW. Data curation: JT. Formal analysis: JT. Funding acquisition: ZW. Investigation: JT. Methodology: JT. Project administration: ZW. Resources: ZW. Software: JT. Supervision: ZW. Validation: JT. Visualization: JT ZW. Writing – original draft: JT ZW. Writing – review & editing: JT ZW.