Identifying the number of components in Gaussian mixture models using numerical algebraic geometry

Bibliographic Details
Published in: Journal of Algebra and Its Applications, Vol. 19, no. 11, p. 2050204
Main authors: Shirinkam, Sara; Alaeddini, Adel; Gross, Elizabeth
Format: Journal Article
Language: English
Published: Singapore: World Scientific Publishing Company (World Scientific Publishing Co. Pte., Ltd), 1 November 2020
ISSN: 0219-4988 (print); 1793-6829 (electronic)
DOI: 10.1142/S0219498820502047

Abstract
Clustering with Gaussian mixture models is a statistically mature method in data science with numerous successful applications in science and engineering. The parameters of a Gaussian mixture model (GMM) are typically estimated from training data with the iterative expectation-maximization (EM) algorithm, which requires the number of Gaussian components to be specified a priori. In this study, we propose two algorithms rooted in numerical algebraic geometry (NAG), namely an area-based algorithm and a local maxima algorithm, to identify the optimal number of components. The area-based algorithm transforms several GMMs with varying numbers of components into sets of equivalent polynomial regression splines. It then uses homotopy continuation methods to evaluate the resulting splines and identify the number of components that is most compatible with the gradient data. The local maxima algorithm forms a set of polynomials by fitting a smoothing spline to a dataset. It then uses NAG to solve the system of first derivatives and find the local maxima of the resulting smoothing spline; the number of local maxima estimates the number of mixture components. The local maxima algorithm also identifies the locations of the centers of the Gaussian components. Using a real-world case study in automotive manufacturing and extensive simulations, we demonstrate that the performance of the proposed algorithms is comparable with that of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), which are popular methods in the literature. We also show that the proposed algorithms are more robust than AIC and BIC when the Gaussian assumption is violated.
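The local maxima idea can be illustrated for one-dimensional data. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It fits a quartic smoothing spline (SciPy's `UnivariateSpline`) to a kernel density estimate (SciPy's `gaussian_kde`) evaluated on a grid. In place of the NAG/homotopy continuation solve, it uses FITPACK's root finder for cubic splines. That substitution suffices here because the derivative of a univariate quartic spline is a cubic spline. The function name `local_maxima_components`, the parameters `grid_size`, `rel_tol` and `min_height`, and the synthetic three-component sample are all hypothetical.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline
from scipy.stats import gaussian_kde


def local_maxima_components(data, grid_size=400, rel_tol=1e-3, min_height=0.05):
    """Estimate the number and centers of mixture components from 1-D data.

    Hypothetical helper: the name, parameters and defaults are illustrative.
    Returns (number_of_local_maxima, locations_of_local_maxima).
    """
    data = np.asarray(data, dtype=float)
    grid = np.linspace(data.min(), data.max(), grid_size)
    density = gaussian_kde(data)(grid)  # KDE with SciPy's default (Scott) bandwidth
    peak = density.max()

    # Quartic smoothing spline, so its first derivative is a cubic spline.
    # s caps the sum of squared residuals: roughly an RMS error of rel_tol * peak.
    spline = UnivariateSpline(grid, density, k=4, s=grid_size * (rel_tol * peak) ** 2)

    # Critical points = real zeros of the first derivative. FITPACK's
    # cubic-spline root finder stands in for homotopy continuation here.
    critical_points = spline.derivative().roots()

    # Keep critical points with negative curvature (local maxima) whose height
    # is not negligible relative to the highest peak (an assumed heuristic).
    curvature = spline.derivative(2)
    maxima = np.array([x for x in critical_points
                       if curvature(x) < 0 and spline(x) >= min_height * peak])
    return len(maxima), maxima


# Synthetic example: three well-separated Gaussian clusters (illustrative data).
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(mu, 0.7, 300) for mu in (-5.0, 0.0, 5.0)])
count, centers = local_maxima_components(sample)
print(count, np.round(centers, 2))
```

The spline is quartic so that its derivative is cubic, which is the only degree FITPACK's `roots()` supports. With multivariate data, the first-derivative conditions on each spline piece form a genuine polynomial system, and that is where homotopy continuation becomes the natural tool.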

Abstract (alternate version)
Clustering with Gaussian mixture models is a statistically mature method in data science with numerous successful applications in science and engineering. The parameters of a Gaussian mixture model are typically estimated from training data with the iterative expectation-maximization algorithm, which requires the number of Gaussian components to be specified a priori. In this study, we propose two algorithms rooted in numerical algebraic geometry, namely an area-based algorithm and a local maxima algorithm, to identify the optimal number of components. The area-based algorithm transforms several Gaussian mixture models with varying numbers of components into sets of equivalent polynomial regression splines. It then uses homotopy continuation methods to evaluate the resulting splines and identify the number of components that results in the best fit. The local maxima algorithm forms a set of polynomials by fitting a smoothing spline to a kernel density estimate of the data. It then uses numerical algebraic geometry to solve the system of first derivatives and find the local maxima of the resulting smoothing spline, whose count estimates the number of mixture components. The local maxima algorithm also identifies the locations of the centers of the Gaussian components. Using a real-world case study in automotive manufacturing and multiple simulations, we compare the performance of the proposed algorithms with that of the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), which are popular methods in the literature. We show that the proposed algorithms are more robust than AIC and BIC when the Gaussian assumption is violated.
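The AIC and BIC baselines pick the number of components in three steps. They fit a GMM with EM for each candidate k, then compute AIC = 2p - 2 log L or BIC = p log n - 2 log L, and finally take the k that minimizes the criterion. Here L is the maximized likelihood, n is the sample size and p is the number of free parameters. For a one-dimensional mixture, p = 3k - 1 (k - 1 weights, k means and k variances). The NumPy sketch below illustrates that baseline only. The quantile initialization, single EM run per k, fixed iteration count and variance floor are assumptions made for brevity. The helper names `fit_gmm_1d` and `information_criteria` are hypothetical.

```python
import numpy as np


def _component_log_densities(x, weights, means, variances):
    """log(w_j * N(x_i | mu_j, var_j)) for every point i and component j."""
    return (np.log(weights)
            - 0.5 * np.log(2.0 * np.pi * variances)
            - 0.5 * (x[:, None] - means) ** 2 / variances)


def fit_gmm_1d(x, k, n_iter=300, var_floor=1e-6):
    """Fit a 1-D k-component Gaussian mixture by EM; return its log-likelihood."""
    x = np.asarray(x, dtype=float)
    # Assumed initialization: means at evenly spaced quantiles, shared variance.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities, normalized in log space for stability.
        log_p = _component_log_densities(x, weights, means, variances)
        log_p -= np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        resp = np.exp(log_p)
        # M-step: re-estimate weights, means and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / x.size
        means = resp.T @ x / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + var_floor
    # Log-likelihood under the final parameters.
    log_p = _component_log_densities(x, weights, means, variances)
    return np.logaddexp.reduce(log_p, axis=1).sum()


def information_criteria(x, k_max=5):
    """Return {k: (AIC, BIC)} for k = 1..k_max; p = 3k - 1 parameters in 1-D."""
    n = len(x)
    scores = {}
    for k in range(1, k_max + 1):
        log_lik = fit_gmm_1d(x, k)
        p = 3 * k - 1
        scores[k] = (2 * p - 2 * log_lik, p * np.log(n) - 2 * log_lik)
    return scores


# Synthetic example: three well-separated Gaussian clusters (illustrative data).
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(mu, 0.7, 300) for mu in (-5.0, 0.0, 5.0)])
scores = information_criteria(sample, k_max=5)
best_aic = min(scores, key=lambda k: scores[k][0])
best_bic = min(scores, key=lambda k: scores[k][1])
print("AIC picks", best_aic, "BIC picks", best_bic)
```

A single deterministic EM run per k keeps the sketch short. Practical implementations usually run EM from several starting points, because EM can stop at a local maximum of the likelihood.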
Copyright 2020, World Scientific Publishing Company
Discipline: Mathematics
Funding: NIGMS NIH HHS, grant SC2 GM118266
Open access: yes
Peer reviewed: yes
Keywords: numerical algebraic geometry; smoothing spline; model-based clustering; mixture models
ORCID: 0000-0002-8153-4754
PMID: 33867617
Publication place: Singapore; River Edge
Journal abbreviation: J Algebra Appl
Subject terms: Algebra; Algorithms; Clustering; Computer simulation; Continuation methods; Criteria; Data smoothing; Identification methods; Iterative methods; Optimization; Polynomials; Probabilistic models; Research Article; Robustness (mathematics); Spline functions
Links:
http://www.worldscientific.com/doi/abs/10.1142/S0219498820502047
https://www.ncbi.nlm.nih.gov/pubmed/33867617
https://www.proquest.com/docview/2438162348
https://pubmed.ncbi.nlm.nih.gov/PMC8048412
https://www.ncbi.nlm.nih.gov/pmc/articles/8048412