mb-PHENIX: diffusion and supervised uniform manifold approximation for denoising microbiota data

Bibliographic Details
Published in Bioinformatics (Oxford, England) Vol. 39; no. 12
Main Authors Padron-Manrique, Cristian, Vázquez-Jiménez, Aarón, Esquivel-Hernandez, Diego Armando, Martinez Lopez, Yoscelina Estrella, Neri-Rosario, Daniel, Sánchez-Castañeda, Jean Paul, Giron-Villalobos, David, Resendis-Antonio, Osbaldo
Format Journal Article
Language English
Published England Oxford Publishing Limited (England) 01.12.2023
ISSN 1367-4811, 1367-4803
DOI 10.1093/bioinformatics/btad706

Summary: Microbiota data face challenges arising from technical noise and the curse of dimensionality, which affect the reliability of scientific findings. Furthermore, abundance matrices exhibit a zero-inflated distribution due to biological and technical influences. Consequently, there is a growing demand for advanced algorithms that can effectively recover missing taxa while preserving the structure of the data. We present mb-PHENIX, an open-source algorithm developed in Python that recovers taxa abundances from noisy and sparse microbiota data. Our method infers the missing information in the count matrix (in 16S and shotgun microbiota studies) by applying imputation via diffusion, using a supervised Uniform Manifold Approximation and Projection (sUMAP) space as initialization. This hybrid machine learning approach denoises microbiota data and reveals differentially abundant microbes among study groups where traditional abundance analysis fails. The mb-PHENIX algorithm is available at https://github.com/resendislab/mb-PHENIX. An easy-to-use implementation is available on Google Colab (see GitHub).
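The idea of "imputation via diffusion over an sUMAP space" can be illustrated with a minimal, MAGIC-style sketch. This is not the mb-PHENIX code: the function name `diffusion_impute`, the plain adaptive Gaussian kernel (a simplification of the kernels used in published diffusion-imputation methods), and the defaults for `k` (neighbour used to set each sample's bandwidth) and `t` (diffusion time) are all hypothetical. The sketch assumes the low-dimensional embedding is already computed; a supervised UMAP embedding could, for example, be obtained with the umap-learn package via `umap.UMAP(n_components=2).fit_transform(X, y=group_labels)`.

```python
import numpy as np


def diffusion_impute(counts, embedding, k=5, t=3, eps=1e-12):
    """Hypothetical sketch of diffusion-based imputation (not mb-PHENIX itself).

    counts:    (n_samples, n_taxa) abundance matrix.
    embedding: (n_samples, d) coordinates used to build the sample graph,
               assumed here to come from a supervised UMAP.
    k:         rank of the neighbour whose distance sets each sample's kernel
               bandwidth (hypothetical default).
    t:         diffusion time, i.e. the power of the Markov matrix (hypothetical default).
    """
    counts = np.asarray(counts, dtype=float)
    emb = np.asarray(embedding, dtype=float)
    n = emb.shape[0]

    # Pairwise Euclidean distances between samples in the embedding space.
    diff = emb[:, None, :] - emb[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Adaptive bandwidth: distance to each sample's k-th nearest neighbour
    # (index 0 of each sorted row is the sample itself, at distance 0).
    kk = min(k, n - 1)
    sigma = np.maximum(np.sort(dist, axis=1)[:, kk], eps)

    # Gaussian affinities scaled per row, then symmetrized.
    aff = np.exp(-(dist / sigma[:, None]) ** 2)
    aff = (aff + aff.T) / 2

    # Row-normalize into a Markov transition matrix.
    markov = aff / aff.sum(axis=1, keepdims=True)

    # Diffuse for t steps; each imputed value is a weighted average of the
    # same taxon's abundances in graph-neighbouring samples.
    diffusion = np.linalg.matrix_power(markov, t)
    return diffusion @ counts
```

Because the diffusion operator is row-stochastic, each imputed entry stays within the observed range of its taxon, and a zero in one sample is filled in from neighbouring samples in the embedding. Setting `t=0` returns the input unchanged, while larger `t` smooths more aggressively.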