Identifying the number of components in Gaussian mixture models using numerical algebraic geometry
Using Gaussian mixture models for clustering is a statistically mature method for clustering in data science with numerous successful applications in science and engineering. The parameters for a Gaussian mixture model (GMM) are typically estimated from training data using the iterative expectation-...
| Published in | Journal of algebra and its applications Vol. 19; no. 11; p. 2050204 |
|---|---|
| Main Authors | Shirinkam, Sara; Alaeddini, Adel; Gross, Elizabeth |
| Format | Journal Article | 
| Language | English | 
| Published | Singapore; World Scientific Publishing Company; 01.11.2020; World Scientific Publishing Co. Pte., Ltd |
| Subjects | |
| ISSN | 0219-4988 1793-6829 |
| DOI | 10.1142/S0219498820502047 | 
| Abstract | Using Gaussian mixture models for clustering is a statistically mature method for clustering in data science with numerous successful applications in science and engineering. The parameters for a Gaussian mixture model (GMM) are typically estimated from training data using the iterative expectation-maximization algorithm, which requires the number of Gaussian components a priori. In this study, we propose two algorithms rooted in numerical algebraic geometry (NAG), namely, an area-based algorithm and a local maxima algorithm, to identify the optimal number of components. The area-based algorithm transforms several GMM with varying number of components into sets of equivalent polynomial regression splines. Next, it uses homotopy continuation methods for evaluating the resulting splines to identify the number of components that is most compatible with the gradient data. The local maxima algorithm forms a set of polynomials by fitting a smoothing spline to a dataset. Next, it uses NAG to solve the system of the first derivatives for finding the local maxima of the resulting smoothing spline, which represent the number of mixture components. The local maxima algorithm also identifies the location of the centers of Gaussian components. Using a real-world case study in automotive manufacturing and extensive simulations, we demonstrate that the performance of the proposed algorithms is comparable with that of Akaike information criterion (AIC) and Bayesian information criterion (BIC), which are popular methods in the literature. We also show the proposed algorithms are more robust than AIC and BIC when the Gaussian assumption is violated. | 
    
|---|---|
| AbstractList | Using Gaussian mixture models for clustering is a statistically mature method for clustering in data science with numerous successful applications in science and engineering. The parameters for a Gaussian mixture model are typically estimated from training data using the iterative expectation-maximization algorithm, which requires the number of Gaussian components a priori. In this study, we propose two algorithms rooted in numerical algebraic geometry, namely an area-based algorithm and a local maxima algorithm, to identify the optimal number of components. The area-based algorithm transforms several Gaussian mixture models with varying number of components into sets of equivalent polynomial regression splines. Next, it uses homotopy continuation methods for evaluating the resulting splines to identify the number of components that results in the best fit. The local maxima algorithm forms a set of polynomials by fitting a smoothing spline to a kernel density estimate of the data. Next, it uses numerical algebraic geometry to solve the system of the first derivatives for finding the local maxima of the resulting smoothing spline, which estimates the number of mixture components. The local maxima algorithm also identifies the location of the centers of Gaussian components. Using a real-world case study in automotive manufacturing and multiple simulations, we compare the performance of the proposed algorithms with that of Akaike information criterion (AIC) and Bayesian information criterion (BIC), which are popular methods in the literature. We show the proposed algorithms are more robust than AIC and BIC when the Gaussian assumption is violated. | 
    
| Author | Shirinkam, Sara; Gross, Elizabeth; Alaeddini, Adel | 
    
| Author_xml | – sequence: 1 givenname: Sara surname: Shirinkam fullname: Shirinkam, Sara – sequence: 2 givenname: Adel surname: Alaeddini fullname: Alaeddini, Adel – sequence: 3 givenname: Elizabeth surname: Gross fullname: Gross, Elizabeth  | 
    
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/33867617$$D View this record in MEDLINE/PubMed | 
    
    
| CitedBy_id | crossref_primary_10_1002_btpr_3490 crossref_primary_10_1016_j_scitotenv_2023_163709 crossref_primary_10_1137_23M1610082 crossref_primary_10_1007_s41468_021_00077_z crossref_primary_10_1109_ACCESS_2024_3362647  | 
    
| Cites_doi | 10.1111/j.2517-6161.1991.tb01857.x 10.1080/07474930008800475 10.1145/2331130.2331136 10.1080/08120099.2016.1143876 10.1111/j.1467-9892.1993.tb00144.x 10.1016/0005-1098(78)90005-5 10.1214/15-EJS1026 10.1016/S0167-7152(98)00003-0 10.18409/jas.v7i1.42 10.1142/5763 10.1198/016214502760047131 10.1111/rssb.12187 10.1109/34.888716 10.1016/j.jsc.2016.07.019 10.1214/009053605000000417 10.1093/comjnl/41.8.578 10.1109/JSYST.2011.2165597 10.1145/317275.317286 10.1093/biomet/asv027 10.1080/03610920903094899 10.1137/1.9781611972702 10.1016/j.difgeo.2017.07.009 10.1145/2608628.2608659 10.1098/rsif.2016.0256 10.1111/j.1475-6803.1990.tb00633.x 10.1002/widm.1135  | 
    
| ContentType | Journal Article | 
    
| Copyright | 2020, World Scientific Publishing Company 2020. World Scientific Publishing Company  | 
    
| Copyright_xml | – notice: 2020, World Scientific Publishing Company – notice: 2020. World Scientific Publishing Company  | 
    
| DBID | AAYXX CITATION NPM 5PM ADTOC UNPAY  | 
    
| DOI | 10.1142/S0219498820502047 | 
    
| DatabaseName | CrossRef PubMed PubMed Central (Full Participant titles) Unpaywall for CDI: Periodical Content Unpaywall  | 
    
| DatabaseTitle | CrossRef PubMed  | 
    
| DatabaseTitleList | PubMed CrossRef  | 
    
| Database_xml | – sequence: 1 dbid: NPM name: PubMed url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: UNPAY name: Unpaywall url: https://proxy.k.utb.cz/login?url=https://unpaywall.org/ sourceTypes: Open Access Repository  | 
    
| DeliveryMethod | fulltext_linktorsrc | 
    
| Discipline | Mathematics | 
    
| EISSN | 1793-6829 | 
    
| ExternalDocumentID | oai:pubmedcentral.nih.gov:8048412 PMC8048412 33867617 10_1142_S0219498820502047 S0219498820502047  | 
    
| Genre | Journal Article | 
    
| GrantInformation_xml | – fundername: NIGMS NIH HHS grantid: SC2 GM118266  | 
    
| GroupedDBID | 0R~ 4.4 5GY ADSJI AEILP AENEX ALMA_UNASSIGNED_HOLDINGS CAG COF CS3 DU5 EBS EJD ESX HZ~ J9A O9- P2P P71 RWJ AAYXX AMVHM CITATION ABDNZ ACYGS NPM 5PM ADTOC UNPAY  | 
    
| ID | FETCH-LOGICAL-c4687-32b2c920d3ca8c75b775d310d1ad2350f574d149887057c5dbf693a7163510253 | 
    
| IEDL.DBID | UNPAY | 
    
| ISSN | 0219-4988 1793-6829  | 
    
| IngestDate | Sun Oct 26 02:47:59 EDT 2025 Thu Aug 21 18:09:31 EDT 2025 Mon Jun 30 06:45:09 EDT 2025 Thu Jan 02 22:37:08 EST 2025 Wed Oct 01 03:34:06 EDT 2025 Thu Apr 24 23:11:52 EDT 2025 Mon Nov 25 02:43:55 EST 2024  | 
    
| IsDoiOpenAccess | false | 
    
| IsOpenAccess | true | 
    
| IsPeerReviewed | true | 
    
| IsScholarly | true | 
    
| Issue | 11 | 
    
| Keywords | numerical algebraic geometry; smoothing spline; model-based clustering; Mixture models | 
    
| Language | English | 
    
| LinkModel | DirectLink | 
    
| MergedId | FETCHMERGED-LOGICAL-c4687-32b2c920d3ca8c75b775d310d1ad2350f574d149887057c5dbf693a7163510253 | 
    
| Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14  | 
    
| ORCID | 0000-0002-8153-4754 | 
    
| OpenAccessLink | https://proxy.k.utb.cz/login?url=https://www.ncbi.nlm.nih.gov/pmc/articles/8048412 | 
    
| PMID | 33867617 | 
    
| PQID | 2438162348 | 
    
| PQPubID | 2049850 | 
    
| ParticipantIDs | pubmedcentral_primary_oai_pubmedcentral_nih_gov_8048412 proquest_journals_2438162348 worldscientific_primary_S0219498820502047 unpaywall_primary_10_1142_s0219498820502047 pubmed_primary_33867617 crossref_primary_10_1142_S0219498820502047 crossref_citationtrail_10_1142_S0219498820502047  | 
    
| ProviderPackageCode | CITATION AAYXX  | 
    
| PublicationCentury | 2000 | 
    
| PublicationDate | 20201100 | 
    
| PublicationDateYYYYMMDD | 2020-11-01 | 
    
| PublicationDate_xml | – month: 11 year: 2020 text: 20201100  | 
    
| PublicationDecade | 2020 | 
    
| PublicationPlace | Singapore | 
    
| PublicationPlace_xml | – name: Singapore – name: River Edge  | 
    
| PublicationTitle | Journal of algebra and its applications | 
    
| PublicationTitleAlternate | J Algebra Appl | 
    
| PublicationYear | 2020 | 
    
| Publisher | World Scientific Publishing Company World Scientific Publishing Co. Pte., Ltd  | 
    
| Publisher_xml | – name: World Scientific Publishing Company – name: World Scientific Publishing Co. Pte., Ltd  | 
    
    
| SSID | ssj0021153 ssib023645929  | 
    
| Score | 2.203695 | 
    
| Snippet | Using Gaussian mixture models for clustering is a statistically mature method for clustering in data science with numerous successful applications in science... | 
    
| SourceID | unpaywall pubmedcentral proquest pubmed crossref worldscientific  | 
    
| SourceType | Open Access Repository Aggregation Database Index Database Enrichment Source Publisher  | 
    
| StartPage | 2050204 | 
    
| SubjectTerms | Algebra; Algorithms; Clustering; Computer simulation; Continuation methods; Criteria; Data smoothing; Identification methods; Iterative methods; Optimization; Polynomials; Probabilistic models; Research Article; Robustness (mathematics); Spline functions | 
    
| Title | Identifying the number of components in Gaussian mixture models using numerical algebraic geometry | 
    
| URI | http://www.worldscientific.com/doi/abs/10.1142/S0219498820502047 https://www.ncbi.nlm.nih.gov/pubmed/33867617 https://www.proquest.com/docview/2438162348 https://pubmed.ncbi.nlm.nih.gov/PMC8048412 https://www.ncbi.nlm.nih.gov/pmc/articles/8048412  | 
    
| UnpaywallVersion | submittedVersion | 
    
| Volume | 19 | 
    
| hasFullText | 1 | 
    
| inHoldings | 1 | 
    
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVEBS databaseName: Mathematics Source customDbUrl: eissn: 1793-6829 dateEnd: 20241102 omitProxy: false ssIdentifier: ssj0021153 issn: 0219-4988 databaseCode: AMVHM dateStart: 20020301 isFulltext: true titleUrlDefault: https://www.ebsco.com/products/research-databases/mathematics-source providerName: EBSCOhost  | 
    
| link | http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV1Nj9MwEB0t7QE4LJ8LgWXlAxdAaRPHjpNjhVgqpFYroGg5RbHjLBFtWm1awfLr8Tgfoo2EtJcoiu0o1ozt5_jNPIDXmpnRHLDYFZnHXKY97cZcCVdSX0rpp1Fs1Rpm83C6YJ8u-eUR-G0sjCXtK1mMyuVqVBY_LLdys1Ljlic2jozPMdQVHobcwO8BDBfzi8l3-y_FR8U0qzWJfueGEY2bk0yf0XGFFbCcehxjQsX-WtQDmH2e5N1duUlvfqVLc39sU5rWYYvI6vlnWTp_AJ_bDtVslJ-j3VaO1J-DXI-36vFDOG5AKpnURY_gSJeP4f6sy_BaPQFZR_jaKClinpNaWoSsc4Is9XWJBA1SlORjuqswUJOsit94WkGs9k5FkHB_ha3sidGSoN6I2bkXilzp9Upvr2-ewuL8w9f3U7fRa3AVC3GuopKqmHpZoNJICS6F4JmBj5mfZjTgXs4Fy3w0gDAoUfFM5mEcpGbHFpiZgfLgBAal-b7nQPxQa8ayXOIOlEY6pUzleRAZtCYyxXIHvNZyiWqSmaOmxjKpA61p8uXQ2A687Zps6kwe_6t82rpD0gzqKqGYDs3ARRY58Kz2jO5NZqcfCoMGHRB7PtNVwDTe-yXG1jadd2NeB9513tX7wJ7rOvDmwP-6Nr3OvLjVm1_CPYr_FGy85SkMttc7_coAr608g-Fk9m06O4M78wtzrQfeX79aKf8 | 
    
| linkProvider | Unpaywall | 
    
| linkToUnpaywall | http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV1La9wwEB7C5tD2kPRdp2nRoZe2eNeWJcs-hpI0FBJK24X0ZKyHE9Nd7xLv0iS_vhr5QXYNhdyMJRmLGUmfpG_mA_hgmB3NEUt9oQPmMxMYP-VK-JKGUsowT1Kn1nB2Hp9O2bcLfrEDYRcL40j7SpbjajYfV-WV41Yu52rS8cQmifU5hrrCuzG38HsEu9Pz70e_3VlKiIppTmsS_c6PE5q2N5kho5MaK2A5DTjGhIrNtWgAMIc8yUfrapnf_s1n9nnPpTRtwhaR1XNvWTrZhx9dhxo2yp_xeiXH6m4r1-ODevwU9lqQSo6aomewY6rn8OSsz_BavwDZRPi6KCli35NGWoQsCoIs9UWFBA1SVuRrvq4xUJPMyxu8rSBOe6cmSLi_xFbuxmhGUG_E7txLRS7NYm5W17cvYXpy_OvLqd_qNfiKxThXUUlVSgMdqTxRgkshuLbwUYe5phEPCi6YDtEAwqJExbUs4jTK7Y4tsjMD5dErGFX2_94ACWNjGNOFxB0oTUxOmSqKKLFoTWjFCg-CznKZapOZo6bGLGsCrWn2c9vYHnzqmyybTB7_q3zYuUPWDuo6o5gOzcJFlnjwuvGM_kt2px8LiwY9EBs-01fANN6bJdbWLp13a14PPvfeNfjBget68HHL__o2g84cPOjLb-ExxTMFF295CKPV9dq8s8BrJd-3Q-0fiTUnVw | 
    
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Identifying+the+number+of+components+in+Gaussian+mixture+models+using+numerical+algebraic+geometry&rft.jtitle=Journal+of+algebra+and+its+applications&rft.au=Shirinkam%2C+Sara&rft.au=Alaeddini%2C+Adel&rft.au=Gross%2C+Elizabeth&rft.date=2020-11-01&rft.pub=World+Scientific+Publishing+Company&rft.issn=0219-4988&rft.eissn=1793-6829&rft.volume=19&rft.issue=11&rft_id=info:doi/10.1142%2FS0219498820502047&rft.externalDocID=S0219498820502047 | 
    
| thumbnail_l | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/lc.gif&issn=0219-4988&client=summon | 
    
| thumbnail_m | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/mc.gif&issn=0219-4988&client=summon | 
    
| thumbnail_s | http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=0219-4988&client=summon |