Unbalanced Longitudinal Data Clustering with a Copula Kernel Mixture Model
Unbalanced longitudinal data appears commonly in practice, for example in cases where measurements are collected at different time points for different subjects and can therefore be sparse and/or irregularly sampled. Treating such data as functional enables smooth curve estimation and better handlin...
| Published in | Statistics and computing Vol. 35; no. 5 |
|---|---|
| Main Authors | Zhang, Xi; Murphy, Orla A.; McNicholas, Paul D. |
| Format | Journal Article |
| Language | English |
| Published | Dordrecht: Springer Nature B.V, 01.10.2025 |
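| Subjects | Algorithms; Clustering; Data analysis; Eigenvalues; Eigenvectors; Multivariate analysis; Parameters; Warping |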
| ISSN | 0960-3174 (print); 1573-1375 (electronic) |
| DOI | 10.1007/s11222-025-10650-6 |
Cover
| Abstract | Unbalanced longitudinal data appears commonly in practice, for example in cases where measurements are collected at different time points for different subjects and can therefore be sparse and/or irregularly sampled. Treating such data as functional enables smooth curve estimation and better handling of missing or irregularly spaced observations. Therefore, a Gaussian copula kernel mixture model (CKMM), based on functional data analysis, is proposed for clustering unbalanced multivariate longitudinal data. In this model, subject-specific warping matrices are included to account for irregularly spaced observations. A regularized functional eigen-decomposition is employed to estimate the copula correlation parameters, ensuring the smoothing procedure is integrated into clustering. Additionally, a functional gradient descent algorithm is implemented as an alternative to kernel density estimation to reduce computational complexity. An expectation-maximization-like algorithm is proposed to estimate marginal distributions, copula parameters, eigenfunctions, and eigenvalues in the CKMM. The performance of the CKMM is demonstrated through a simulation study and a data application. The proposed model exhibits superior performance compared to k-means with dynamic time warping, the growth mixture model, and functional high-dimensional data clustering. |
|---|---|
| ArticleNumber | 126 |
| Author | Murphy, Orla A.; McNicholas, Paul D.; Zhang, Xi |
| Author_xml | 1. Zhang, Xi (ORCID 0009-0004-5802-4862); 2. Murphy, Orla A. (ORCID 0000-0003-1731-9811); 3. McNicholas, Paul D. (ORCID 0000-0002-2482-523X) |
| ContentType | Journal Article |
| Copyright | The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025. |
| DOI | 10.1007/s11222-025-10650-6 |
| DatabaseName | CrossRef; ProQuest Computer Science Collection |
| DatabaseTitle | CrossRef; ProQuest Computer Science Collection |
| DatabaseTitleList | ProQuest Computer Science Collection |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Statistics; Mathematics; Computer Science |
| EISSN | 1573-1375 |
| ExternalDocumentID | 10_1007_s11222_025_10650_6 |
| ISSN | 0960-3174 |
| IngestDate | Thu Oct 02 16:26:41 EDT 2025 Wed Oct 01 05:32:06 EDT 2025 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 5 |
| Language | English |
| LinkModel | OpenURL |
| ORCID | 0009-0004-5802-4862 0000-0003-1731-9811 0000-0002-2482-523X |
| PQID | 3218452337 |
| PQPubID | 2043829 |
| ParticipantIDs | proquest_journals_3218452337 crossref_primary_10_1007_s11222_025_10650_6 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-10-01 |
| PublicationDateYYYYMMDD | 2025-10-01 |
| PublicationDecade | 2020 |
| PublicationPlace | Dordrecht |
| PublicationTitle | Statistics and computing |
| PublicationYear | 2025 |
| Publisher | Springer Nature B.V |
| SSID | ssj0011634 |
| Score | 2.4103024 |
| SourceID | proquest crossref |
| SourceType | Aggregation Database Index Database |
| SubjectTerms | Algorithms; Clustering; Data analysis; Eigenvalues; Eigenvectors; Multivariate analysis; Parameters; Warping |
| Title | Unbalanced Longitudinal Data Clustering with a Copula Kernel Mixture Model |
| URI | https://www.proquest.com/docview/3218452337 |
| Volume | 35 |