An imputation–regularized optimization algorithm for high dimensional missing data problems and beyond

Bibliographic Details
Published in	Journal of the Royal Statistical Society. Series B, Statistical methodology Vol. 80; no. 5; pp. 899 - 926
Main Authors	Liang, Faming, Jia, Bochao, Xue, Jingnan, Li, Qizhai, Luo, Ye
Format	Journal Article
Language	English
Published	England Wiley 01.11.2018 Oxford University Press
Subjects	Algorithms; Divergence equations; Expectation–maximization algorithm; Gaussian graphical model; Gibbs sampler; Graphical models; Imputation consistency; Mathematical models; Missing data; Optimization; Parameter estimation; Random-coefficient model; Regression analysis; Regularization; Statistical methods; Statistics; Variable selection; Variants
ISSN	1369-7412, 1467-9868
DOI	10.1111/rssb.12279

Summary:	Missing data are frequently encountered in high dimensional problems, but they are usually difficult to handle with standard algorithms, such as the expectation–maximization algorithm and its variants. To tackle this difficulty, some problem-specific algorithms have been developed in the literature, but a general algorithm is still lacking. This work fills the gap: we propose a general algorithm for high dimensional missing data problems. The algorithm works by iterating between an imputation step and a regularized optimization step. At the imputation step, the missing data are imputed conditionally on the observed data and the current parameter estimates. At the regularized optimization step, a consistent estimate of the minimizer of a Kullback–Leibler divergence defined on the pseudocomplete data is found via regularization. For high dimensional problems, the consistent estimate can be found under sparsity constraints. The consistency of the averaged estimate for the true parameter can be established under quite general conditions. The algorithm is illustrated with high dimensional Gaussian graphical models, high dimensional variable selection and a random-coefficient model.
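The two alternating steps described in the summary can be illustrated with a small Python sketch. This is not the authors' implementation; it is a toy instance under assumptions that do not come from the record: a linear regression with covariates missing at random, covariates taken to be independent standard normal (which gives a closed-form conditional for imputation), a lasso penalty as the regularizer, and a fixed penalty level. All function names (`impute`, `lasso_cd`, `iro`) and parameter values are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, beta0, n_sweeps=50):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = beta0.copy()
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y - X @ beta
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]          # drop coordinate j from the fit
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]          # add the updated coordinate back
    return beta

def impute(X, y, miss, beta, sigma2, rng):
    """Imputation step: draw each missing x_ij from its conditional given
    y_i, the rest of row i, and the current (beta, sigma2).
    Assumption (not from the source): x_ij ~ N(0, 1), independent."""
    X = X.copy()
    for i, j in zip(*np.nonzero(miss)):
        r = y[i] - X[i] @ beta + X[i, j] * beta[j]   # residual excluding x_ij
        var = 1.0 / (1.0 + beta[j] ** 2 / sigma2)
        mean = var * beta[j] * r / sigma2
        X[i, j] = mean + np.sqrt(var) * rng.standard_normal()
    return X

def iro(X_obs, y, lam=0.1, n_iter=30, burn_in=10, seed=0):
    """Alternate imputation and regularized optimization, then average
    the estimates collected after a burn-in period."""
    rng = np.random.default_rng(seed)
    miss = np.isnan(X_obs)
    X = np.where(miss, 0.0, X_obs)               # start missing cells at 0
    beta = np.zeros(X.shape[1])
    sigma2 = np.var(y)
    kept = []
    for t in range(n_iter):
        X = impute(X, y, miss, beta, sigma2, rng)        # imputation step
        beta = lasso_cd(X, y, lam, beta)                 # regularized step
        sigma2 = max(np.mean((y - X @ beta) ** 2), 1e-8)
        if t >= burn_in:
            kept.append(beta)
    return np.mean(kept, axis=0)

# Toy data: 3 active covariates out of 20, about 10% of entries missing.
rng = np.random.default_rng(1)
n, p = 200, 20
X_full = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X_full @ beta_true + 0.5 * rng.standard_normal(n)
X_obs = X_full.copy()
X_obs[rng.random((n, p)) < 0.1] = np.nan
beta_hat = iro(X_obs, y)
```

Averaging the post-burn-in estimates mirrors the "averaged estimate" mentioned in the summary; the burn-in length, iteration count and penalty level here are arbitrary choices for the toy data, not recommendations.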