Microarray Missing Value Imputation: A Regularized Local Learning Method

Bibliographic Details
Published in	IEEE/ACM transactions on computational biology and bioinformatics Vol. 16; no. 3; pp. 980 - 993
Main Authors	Wang, Aiguo, Chen, Ye, An, Ning, Yang, Jing, Li, Lian, Jiang, Lili
Format	Journal Article
Language	English
Published	United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 1 May 2019
Subjects	Algorithms; Bayes methods; Biological analysis; Computational Biology - methods; Data models; DNA microarrays; Gene expression; Gene Expression Profiling - methods; Humans; Knowledge based systems; Learning; Learning systems; Least squares; Least-Squares Analysis; local learning; Machine Learning; Mathematical model; Microarray data; missing value imputation; Oligonucleotide Array Sequence Analysis - methods; Performance measurement; Preservation; Regularization; regularized model; similarity measurement; Teaching methods; Time series
ISSN	1545-5963, 1557-9964
DOI	10.1109/TCBB.2018.2810205

More Information
Summary:	Microarray experiments on gene expression inevitably generate missing values, which impede downstream biological analysis. Accurate estimation of these missing values is therefore essential. Most existing imputation methods tend to suffer from over-fitting. In this study, we propose two regularized local learning methods for microarray missing value imputation. Motivated by the grouping effect of L2 regularization, after selecting a target gene, we train an L2 Regularized Local Least Squares imputation model (RLLSimpute_L2) on the target gene and its neighbors to estimate the target gene's missing values. Furthermore, RLLSimpute_L2 imputes target genes in ascending order of their missing rates, so that previously estimated values can be fully utilized. Besides L2, we further explore L1 regularization and propose an L1 Regularized Local Least Squares imputation model (RLLSimpute_L1). To evaluate their effectiveness, we conducted extensive experiments on six benchmark datasets covering both time-series and non-time-series cases. RLLSimpute_L2 and RLLSimpute_L1 are compared with nine state-of-the-art imputation methods in terms of three performance metrics. The comparative results indicate that RLLSimpute_L2 outperforms its competitors, achieving smaller imputation errors and better structure preservation of differentially expressed genes.
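To make the procedure in the summary more concrete, here is a minimal NumPy sketch of regularized local least squares imputation. It is an illustration under stated assumptions, not the authors' implementation. Assumptions not taken from the summary: neighbors are chosen by Euclidean distance over the target gene's observed samples; the target's observed values are regressed (without intercept) on its neighbors' values in the same samples, and the fitted coefficients are applied to the neighbors' values in the missing samples, in the style of local least squares imputation; the defaults `k=10` and `lam=1.0` are arbitrary; the L1 branch uses plain coordinate descent, which may differ from the solver the authors used. The names `rlls_impute` and `_lasso_cd` are hypothetical.

```python
import numpy as np


def _lasso_cd(F, w, lam, n_iter=200):
    """Coordinate descent for min_x 0.5*||w - F x||^2 + lam*||x||_1."""
    x = np.zeros(F.shape[1])
    col_sq = (F ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(F.shape[1]):
            if col_sq[j] == 0.0:
                continue  # an all-zero feature keeps a zero coefficient
            # Partial residual with feature j's own contribution added back.
            r = w - F @ x + F[:, j] * x[j]
            rho = F[:, j] @ r
            # Soft-thresholding update.
            x[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return x


def rlls_impute(data, k=10, lam=1.0, penalty="l2"):
    """Regularized local least squares imputation (illustrative sketch).

    data: genes x samples array-like, np.nan marks missing entries.
    k, lam: hypothetical defaults for neighbor count and penalty weight.
    penalty: "l2" (ridge, closed form) or "l1" (lasso, coordinate descent).
    """
    if penalty not in ("l2", "l1"):
        raise ValueError("penalty must be 'l2' or 'l1'")
    X = np.array(data, dtype=float)  # copy, so the input is not modified
    missing = np.isnan(X)
    # Target genes in ascending order of missing count (stable for ties).
    order = np.argsort(missing.sum(axis=1), kind="stable")
    targets = [g for g in order if missing[g].any()]
    # Neighbor pool: genes with no missing values (grows as genes are imputed).
    pool = [g for g in range(X.shape[0]) if not missing[g].any()]
    if not pool:
        raise ValueError("need at least one gene without missing values")

    for g in targets:
        obs, miss = ~missing[g], missing[g]
        cand = np.array(pool)
        # Assumed similarity: Euclidean distance over the target's observed samples.
        dist = np.linalg.norm(X[cand][:, obs] - X[g, obs], axis=1)
        A = X[cand[np.argsort(dist)[:k]]]  # neighbor profiles, k' x samples
        F = A[:, obs].T  # observed samples as rows, neighbors as features
        w = X[g, obs]
        if penalty == "l2":
            # Ridge: x = (F^T F + lam * I)^{-1} F^T w
            x = np.linalg.solve(F.T @ F + lam * np.eye(F.shape[1]), F.T @ w)
        else:
            x = _lasso_cd(F, w, lam)
        X[g, miss] = A[:, miss].T @ x
        pool.append(g)  # the imputed gene can now serve as a neighbor
    return X
```

Because every gene has the same number of samples, ordering by missing count is equivalent to ordering by missing rate. Genes with fewer gaps are filled first and then join the neighbor pool, which is one way to realize the summary's point about reusing previously estimated values. The ridge branch shrinks coefficients of correlated neighbors toward similar values, while the L1 branch can drive some neighbor coefficients exactly to zero.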