m6Aminer: Predicting the m6Am Sites on mRNA by Fusing Multiple Sequence-Derived Features into a CatBoost-Based Classifier

As one of the most important post-transcriptional modifications, m6Am plays a fairly important role in conferring mRNA stability and in the progression of cancers. The accurate identification of the m6Am sites is critical for explaining its biological significance and developing its application in t...

Full description

Saved in:
Bibliographic Details
Published inInternational journal of molecular sciences Vol. 24; no. 9; p. 7878
Main Authors Liu, Ze, Lan, Pengfei, Liu, Ting, Liu, Xudong, Liu, Tao
Format Journal Article
LanguageEnglish
Published Switzerland MDPI AG 26.04.2023
MDPI
Subjects
Online AccessGet full text
ISSN1422-0067
1661-6596
1422-0067
DOI10.3390/ijms24097878

Cover

More Information
Summary:As one of the most important post-transcriptional modifications, m6Am plays a fairly important role in conferring mRNA stability and in the progression of cancers. The accurate identification of the m6Am sites is critical for explaining its biological significance and developing its application in the medical field. However, conventional experimental approaches are time-consuming and expensive, making them unsuitable for the large-scale identification of the m6Am sites. To address this challenge, we exploit a CatBoost-based method, m6Aminer, to identify the m6Am sites on mRNA. For feature extraction, nine different feature-encoding schemes (pseudo electron–ion interaction potential, hash decimal conversion method, dinucleotide binary encoding, nucleotide chemical properties, pseudo k-tuple composition, dinucleotide numerical mapping, K monomeric units, series correlation pseudo trinucleotide composition, and K-spaced nucleotide pair frequency) were utilized to form the initial feature space. To obtain the optimized feature subset, the ExtraTreesClassifier algorithm was adopted to perform feature importance ranking, and the top 300 features were selected as the optimal feature subset. With different performance assessment methods, 10-fold cross-validation and independent test, m6Aminer achieved average AUC of 0.913 and 0.754, demonstrating a competitive performance with the state-of-the-art models m6AmPred (0.905 and 0.735) and DLm6Am (0.897 and 0.730). The prediction model developed in this study can be used to identify the m6Am sites in the whole transcriptome, laying a foundation for the functional research of m6Am.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
These authors contributed equally to this work.
ISSN:1422-0067
1661-6596
1422-0067
DOI:10.3390/ijms24097878