Penalized joint generalized estimating equations for longitudinal binary data

Bibliographic Details
Published in: Biometrical Journal, Vol. 64, no. 1, pp. 57-73
Main Authors: Huang, Youjun; Pan, Jianxin
Format: Journal Article
Language: English
Published: Germany, Wiley-VCH Verlag GmbH & Co. KGaA, 01.01.2022
ISSN: 0323-3847, 1521-4036
DOI: 10.1002/bimj.202000336

Summary: Variable selection and feature extraction are common problems in statistical research. Variable selection is well developed for linear models but has received relatively little attention for longitudinal data. Because a longitudinal study involves within-subject correlations, the likelihood function of discrete longitudinal responses generally cannot be expressed in closed form, and standard variable selection methods cannot be applied directly. The penalized generalized estimating equation (PGEE) is a useful alternative, but it is very likely to select the wrong variables if the working correlation matrix is misspecified. In many circumstances the within-subject correlations are themselves of interest and need to be modeled together with the mean. Longitudinal binary data are more challenging because the within-subject correlation coefficients are constrained by the so-called Fréchet–Hoeffding upper bound. In this paper, we propose smoothly clipped absolute deviation (SCAD)-based and least absolute shrinkage and selection operator (LASSO)-based penalized joint generalized estimating equation (PJGEE) methods that simultaneously model the mean and the correlations for longitudinal binary data while performing variable selection in the mean model. The estimated correlation coefficients satisfy the upper bound constraints. Simulation studies under different scenarios are conducted to assess the performance of the proposed method. Compared with existing PGEE methods that specify a working correlation matrix for longitudinal binary data, the proposed PJGEE method performs much better in terms of variable selection consistency and parameter estimation accuracy. A real data set on Clinical Global Impression is analyzed for illustration.
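Two ingredients named in the summary can be illustrated with a short sketch: the Fréchet–Hoeffding bounds on the correlation of two binary responses, and the LASSO and SCAD penalty functions. The code below is not taken from the paper. The function names are hypothetical, the bounds are the standard textbook formulas for two Bernoulli variables with marginal probabilities p1 and p2, and the SCAD form with the conventional default a = 3.7 is the usual definition, assumed here rather than confirmed against the authors' implementation.

```python
import math
from typing import Tuple


def frechet_corr_bounds(p1: float, p2: float) -> Tuple[float, float]:
    """Return (lower, upper) bounds on corr(Y1, Y2) for Bernoulli margins p1, p2.

    Hypothetical helper for illustration; not the paper's code.
    """
    if not (0.0 < p1 < 1.0 and 0.0 < p2 < 1.0):
        raise ValueError("marginal probabilities must lie strictly in (0, 1)")
    q1, q2 = 1.0 - p1, 1.0 - p2
    # Upper bound: attained when P(Y1=1, Y2=1) = min(p1, p2).
    upper = min(math.sqrt(p1 * q2 / (p2 * q1)), math.sqrt(p2 * q1 / (p1 * q2)))
    # Lower bound: attained when P(Y1=1, Y2=1) = max(0, p1 + p2 - 1).
    lower = max(-math.sqrt(p1 * p2 / (q1 * q2)), -math.sqrt(q1 * q2 / (p1 * p2)))
    return lower, upper


def lasso_penalty(theta: float, lam: float) -> float:
    """LASSO penalty: lam * |theta|."""
    return lam * abs(theta)


def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty in its usual piecewise form (assumes lam > 0 and a > 2)."""
    t = abs(theta)
    if t <= lam:
        # Same as LASSO near zero.
        return lam * t
    if t <= a * lam:
        # Quadratic transition that tapers the penalty growth.
        return -(t * t - 2.0 * a * lam * t + lam * lam) / (2.0 * (a - 1.0))
    # Constant for large coefficients, so they are not shrunk further.
    return (a + 1.0) * lam * lam / 2.0
```

For example, with p1 = 0.2 and p2 = 0.5 the bounds are (-0.5, 0.5), so a working correlation of 0.8 between those two responses would be infeasible. The range is the full upper limit of 1 only when the two marginal probabilities are equal, which is why a correlation model for binary data has to respect the margins.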