Unbalanced Longitudinal Data Clustering with a Copula Kernel Mixture Model

Bibliographic Details
Published in Statistics and Computing, Vol. 35, no. 5
Main Authors Zhang, Xi; Murphy, Orla A.; McNicholas, Paul D.
Format Journal Article
Language English
Published Dordrecht, Springer Nature B.V., 01.10.2025
ISSN 0960-3174 (print)
1573-1375 (electronic)
DOI 10.1007/s11222-025-10650-6

Abstract Unbalanced longitudinal data appears commonly in practice, for example in cases where measurements are collected at different time points for different subjects and can therefore be sparse and/or irregularly sampled. Treating such data as functional enables smooth curve estimation and better handling of missing or irregularly spaced observations. Therefore, a Gaussian copula kernel mixture model (CKMM), based on functional data analysis, is proposed for clustering unbalanced multivariate longitudinal data. In this model, subject-specific warping matrices are included to account for irregularly spaced observations. A regularized functional eigen-decomposition is employed to estimate the copula correlation parameters, ensuring the smoothing procedure is integrated into clustering. Additionally, a functional gradient descent algorithm is implemented as an alternative to kernel density estimation to reduce computational complexity. An expectation-maximization-like algorithm is proposed to estimate marginal distributions, copula parameters, eigenfunctions, and eigenvalues in the CKMM. The performance of the CKMM is demonstrated through a simulation study and a data application. The proposed model exhibits superior performance compared to k-means with dynamic time warping, the growth mixture model, and functional high-dimensional data clustering.
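
The abstract names a Gaussian copula as the dependence structure of the CKMM. As a rough illustration of that single building block only, the following sketch evaluates the standard bivariate Gaussian copula density. It is not the authors' implementation: the function name `gaussian_copula_density` and its arguments are hypothetical, and the kernel-based marginals, warping matrices, functional eigen-decomposition, and EM-like estimation described in the abstract are not reproduced here.

```python
from math import exp, sqrt
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
_STD_NORMAL = NormalDist()


def gaussian_copula_density(u: float, v: float, rho: float) -> float:
    """Bivariate Gaussian copula density c(u, v; rho).

    Hypothetical helper for illustration; u and v are values on (0, 1),
    for example marginal CDF values, and rho is the copula correlation.
    """
    if not (0.0 < u < 1.0 and 0.0 < v < 1.0):
        raise ValueError("u and v must lie strictly between 0 and 1")
    if not (-1.0 < rho < 1.0):
        raise ValueError("rho must lie strictly between -1 and 1")

    # Map the uniform margins to standard normal scores.
    x = _STD_NORMAL.inv_cdf(u)
    y = _STD_NORMAL.inv_cdf(v)

    # Ratio of the bivariate normal density to the product of its margins.
    one_minus_r2 = 1.0 - rho * rho
    exponent = -(rho * rho * (x * x + y * y) - 2.0 * rho * x * y) / (2.0 * one_minus_r2)
    return exp(exponent) / sqrt(one_minus_r2)
```

With `rho = 0` the density equals 1 for every `(u, v)`, which corresponds to independence. In a mixture setting, a component density would multiply a copula term of this kind by the marginal densities, but how the paper combines these pieces should be taken from the article itself.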
ArticleNumber 126
Author Zhang, Xi (ORCID 0009-0004-5802-4862)
Murphy, Orla A. (ORCID 0000-0003-1731-9811)
McNicholas, Paul D. (ORCID 0000-0002-2482-523X)
ContentType Journal Article
Copyright The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2025.
Discipline Statistics
Mathematics
Computer Science
EISSN 1573-1375
IsPeerReviewed true
IsScholarly true
SubjectTerms Algorithms
Clustering
Data analysis
Eigenvalues
Eigenvectors
Multivariate analysis
Parameters
Warping