Functional linear models for zero-inflated count data with application to modeling hospitalizations in patients on dialysis


Bibliographic Details
Published in Statistics in Medicine, Vol. 33, no. 27, pp. 4825-4840
Main Authors Şentürk, Damla, Dalrymple, Lorien S., Nguyen, Danh V.
Format Journal Article
Language English
Published England Blackwell Publishing Ltd 30.11.2014
Wiley Subscription Services, Inc
ISSN 0277-6715
1097-0258
DOI 10.1002/sim.6241

Summary: We propose functional linear models for zero‐inflated count data with a focus on the functional hurdle and functional zero‐inflated Poisson (ZIP) models. Although the hurdle model assumes the counts come from a mixture of a degenerate distribution at zero and a zero‐truncated Poisson distribution, the ZIP model considers a mixture of a degenerate distribution at zero and a standard Poisson distribution. We extend the generalized functional linear model framework with a functional predictor and multiple cross‐sectional predictors to model counts generated by a mixture distribution. We propose an estimation procedure for functional hurdle and ZIP models, called penalized reconstruction, geared towards error‐prone and sparsely observed longitudinal functional predictors. The approach relies on dimension reduction and pooling of information across subjects involving basis expansions and penalized maximum likelihood techniques. The developed functional hurdle model is applied to modeling hospitalizations within the first 2 years from initiation of dialysis, with a high percentage of zeros, in the Comprehensive Dialysis Study participants. Hospitalization counts are modeled as a function of sparse longitudinal measurements of serum albumin concentrations, patient demographics, and comorbidities. Simulation studies are used to study finite sample properties of the proposed method and include comparisons with an adaptation of standard principal components regression. Copyright © 2014 John Wiley & Sons, Ltd.
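The distinction the summary draws between the two mixtures can be illustrated with their probability mass functions. The following is only a minimal sketch of the textbook hurdle and ZIP distributions with fixed scalar parameters; the names `zip_pmf`, `hurdle_pmf`, `pi`, `p0`, and `lam` are illustrative assumptions, and the paper's functional models, which link these parameters to a functional predictor and cross-sectional predictors, are not reproduced here.

```python
import math


def poisson_pmf(k, lam):
    """Standard Poisson probability P(Y = k) with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, pi, lam):
    """Zero-inflated Poisson: with probability pi the count is a structural
    zero; otherwise it is drawn from a standard Poisson(lam), which can
    itself produce zeros."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


def hurdle_pmf(k, p0, lam):
    """Hurdle: all zeros come from the point mass (probability p0);
    positive counts follow a zero-truncated Poisson(lam)."""
    if k == 0:
        return p0
    truncated = poisson_pmf(k, lam) / (1.0 - math.exp(-lam))
    return (1.0 - p0) * truncated


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    for k in range(4):
        print(k, zip_pmf(k, 0.3, 2.0), hurdle_pmf(k, 0.3, 2.0))
```

With the same mixing weight, the ZIP model places more mass at zero than the hurdle model, because the standard Poisson component contributes additional zeros on top of the structural ones.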
dsenturk@ucla.edu