M-LDQ feature embedding and regression modeling for distribution-valued data

Bibliographic Details
Published in: Information Sciences, Vol. 609, pp. 121-152
Main Authors: Zhao, Qing; Wang, Huiwen; Lu, Shan
Format: Journal Article
Language: English
Published: Elsevier Inc., 01.09.2022
ISSN: 0020-0255
DOI: 10.1016/j.ins.2022.07.064

Summary: With the growing capacity to collect massive amounts of data, distribution-valued data are increasingly used in many applications. Unlike single-valued data, such data are presented in clustered, summarized, or aggregated form and therefore convey more detailed information. Most existing models for distribution-valued data are limited by the inherent constraints that probability distributions impose (such as the requirement that a quantile function be nondecreasing), which makes the practical use of distribution-valued data highly challenging. This paper introduces a novel feature embedding method, M-LDQ, to characterize a probability distribution and, on this basis, proposes an effective linear regression model without additional constraints. Unlike previous models, which require nonnegative coefficients, the proposed model can accommodate negative coefficients. A detailed parameter estimation procedure based on partial least squares (PLS) is presented; it yields more stable results, especially when the sample size is relatively small or the variables are multicollinear. Overall, the proposed method substantially facilitates regression analysis of distribution-valued data. Extensive simulation experiments and an empirical study modeling PM2.5 concentrations verify the effectiveness of the regression method for distribution-valued data and demonstrate its advantages over existing approaches.
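The summary does not spell out the M-LDQ transform, so the sketch below is only an illustration of the general idea under stated assumptions: map each distribution into an unconstrained vector space, fit an ordinary linear model there by partial least squares, and map predictions back. As a stand-in for M-LDQ it uses a generic log-quantile-density embedding (the lowest grid quantile plus the log of the finite-difference quantile density); the names `embed`, `unembed`, `pls_fit`, the probability grid `GRID`, and the synthetic data are all hypothetical. `pls_fit` is a textbook NIPALS PLS2 routine written with NumPy only, not the authors' estimation procedure.

```python
import numpy as np

# Hypothetical grid of probability levels (tails excluded to avoid extreme quantiles).
GRID = np.linspace(0.05, 0.95, 19)


def embed(sample, grid=GRID):
    """Map one distribution (observed as a sample) to an unconstrained vector:
    [Q(grid[0]), log of the finite-difference quantile density on the grid].
    A generic log-quantile-density stand-in, NOT the paper's M-LDQ definition."""
    q = np.quantile(sample, grid)                  # empirical quantile function
    dq = np.diff(q) / np.diff(grid)                # quantile density Q'(t), finite differences
    return np.concatenate(([q[0]], np.log(np.maximum(dq, 1e-12))))


def unembed(z, grid=GRID):
    """Inverse map: any real vector yields a strictly increasing quantile function."""
    steps = np.exp(z[1:]) * np.diff(grid)          # exp(.) > 0 keeps quantiles monotone
    return np.concatenate(([z[0]], z[0] + np.cumsum(steps)))


def pls_fit(X, Y, n_comp, max_iter=500, tol=1e-10):
    """Minimal NIPALS PLS2 regression of Y on X; returns a prediction function.
    Coefficients are unconstrained, so negative effects are allowed."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean                  # centred predictor / response blocks
    W, P, C = [], [], []
    for _ in range(n_comp):
        if np.linalg.norm(F) < 1e-12:              # response already fully explained
            break
        u = F[:, [np.argmax(F.var(axis=0))]]       # initial response score
        for _ in range(max_iter):
            w = E.T @ u
            w /= np.linalg.norm(w)                 # predictor weights
            t = E @ w                              # predictor scores
            c = F.T @ t / (t.T @ t)                # response loadings
            u_new = F @ c / (c.T @ c)
            converged = np.linalg.norm(u_new - u) < tol
            u = u_new
            if converged:
                break
        p = E.T @ t / (t.T @ t)                    # predictor loadings
        E = E - t @ p.T                            # deflate both blocks
        F = F - t @ c.T
        W.append(w)
        P.append(p)
        C.append(c)
    W, P, C = np.hstack(W), np.hstack(P), np.hstack(C)
    B = W @ np.linalg.solve(P.T @ W, C.T)          # coefficient matrix (entries may be negative)
    return lambda X_new: (X_new - x_mean) @ B + y_mean


# Synthetic example (hypothetical data): each unit has a predictor distribution and
# a response distribution; the response location falls as the predictor location
# rises, i.e. a negative effect.
rng = np.random.default_rng(0)
n_units, n_obs = 60, 2000
loc = rng.normal(0.0, 1.0, n_units)
log_sd = rng.normal(0.0, 0.3, n_units)
X = np.array([embed(rng.normal(m, np.exp(s), n_obs)) for m, s in zip(loc, log_sd)])
Y = np.array([embed(rng.normal(1 - 0.5 * m, np.exp(0.5 * s), n_obs))
              for m, s in zip(loc, log_sd)])

predict = pls_fit(X, Y, n_comp=2)
Q_hat = np.array([unembed(z) for z in predict(X)])  # fitted quantile functions, one row per unit

# Sanity check on a noiseless linear problem: with as many components as
# predictors, PLS reproduces the exact least-squares fit.
Xs = rng.normal(size=(50, 3))
Ys = Xs @ np.array([[1.0, -2.0, 0.0], [0.5, 0.0, 1.0], [-1.0, 3.0, 2.0]]) + 2.0
Ys_hat = pls_fit(Xs, Ys, n_comp=3)(Xs)
```

Because `unembed` exponentiates before accumulating, every fitted row of `Q_hat` is a strictly increasing quantile function even though `pls_fit` places no sign restrictions on its coefficients. The many nearly collinear log-density columns are the kind of setting in which PLS is preferred over ordinary least squares; the number of components (`n_comp=2` here) is an arbitrary choice that would normally be tuned, for example by cross-validation.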