Individual Homogeneity Learning in Density Data Response Additive Models

Bibliographic Details
Published in	Stats (Basel, Switzerland) Vol. 8; no. 3; p. 71
Main Authors	Han, Zixuan; Li, Tao; You, Jinhong; Balakrishnan, Narayanaswamy
Format	Journal Article
Language	English
Published	Basel: MDPI AG, 01.09.2025
Subjects	Algorithms; Clustering; COVID-19; Data analysis; Diabetes; Electronic data processing; Estimation theory; heterogeneity; hierarchical agglomerative clustering; Hilbert space; Information management; latent group structures; Mathematical functions; Methods; Mortality; post-grouping oracle; United States; United Kingdom > UK; United States > US; China
ISSN	2571-905X
DOI	10.3390/stats8030071

Summary:	In many complex applications, both data heterogeneity and homogeneity are present simultaneously. Overlooking either aspect can lead to misleading statistical inferences. Moreover, the increasing prevalence of complex, non-Euclidean data calls for more sophisticated modeling techniques. To address these challenges, we propose a density data response additive model, where the response variable is represented by a distributional density function. In this framework, individual effect curves are assumed to be homogeneous within groups but heterogeneous across groups, while covariates that explain variation share common additive bivariate functions. We begin by applying a transformation to map density functions into a linear space. To estimate the unknown subject-specific functions and the additive bivariate components, we adopt a B-spline series approximation method. Latent group structures are uncovered using a hierarchical agglomerative clustering algorithm, which allows our method to recover the true underlying groupings with high probability. To further improve estimation efficiency, we develop refined spline-backfitted local linear estimators for both the grouped structures and the additive bivariate functions in the post-grouping model. We also establish the asymptotic properties of the proposed estimators, including their convergence rates, asymptotic distributions, and post-grouping oracle efficiency. The effectiveness of our method is demonstrated through extensive simulation studies and real-world data analysis, both of which show promising and robust performance.
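
Illustration:	The first and third steps named in the summary (mapping densities into a linear space, then grouping by hierarchical agglomerative clustering) can be sketched in a few lines. This is only a minimal sketch under stated assumptions, not the authors' procedure: the summary does not name the transformation, so the centered log-ratio is assumed here; the simulated densities, the helper names make_density and clr_transform, and the choice of average linkage are all hypothetical; and the sketch clusters the transformed densities directly, whereas the paper clusters estimated individual effect curves obtained from a B-spline fit.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def make_density(center, width, grid):
    """Hypothetical helper: a Gaussian bump on the grid, normalized by a Riemann sum."""
    raw = np.exp(-0.5 * ((grid - center) / width) ** 2)
    dx = grid[1] - grid[0]
    return raw / (raw.sum() * dx)


def clr_transform(density):
    """Assumed transformation: centered log-ratio on a uniform grid.

    Subtracting the mean of log f removes the unit-integral constraint,
    so the transformed curves can be added and scaled like ordinary functions.
    """
    log_f = np.log(density)
    return log_f - log_f.mean()


rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 101)

# Two latent groups of five subjects each: densities centered near 0.3 and near 0.7.
centers = np.concatenate([
    0.3 + 0.01 * rng.standard_normal(5),
    0.7 + 0.01 * rng.standard_normal(5),
])
densities = np.stack([make_density(c, 0.15, grid) for c in centers])

# Step 1: map each density into a linear space.
transformed = np.stack([clr_transform(f) for f in densities])

# Step 2: agglomerative clustering on Euclidean distances between transformed curves,
# then cut the tree into two groups.
tree = linkage(transformed, method="average", metric="euclidean")
labels = fcluster(tree, t=2, criterion="maxclust")

print(labels)
```

In this toy setting the two groups are far apart relative to the within-group perturbation, so cutting the tree at two clusters separates the first five subjects from the last five. With real data the number of groups is unknown and would have to be chosen by some criterion, which this sketch does not attempt.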