Clustering with Scikit-Learn in Python
This tutorial demonstrates how to apply clustering algorithms with Python to a dataset with two concrete use cases. The first example uses clustering to identify meaningful groups of Greco-Roman authors based on their publications and their reception. The second use case applies clustering algorithm...
| Published in | The Programming Historian, Vol. 10, no. 10 |
|---|---|
| Main Author | Jurczyk, Thomas |
| Format | Journal Article | 
| Language | English | 
| Published | ProgHist Ltd, Editorial Board of the Programming Historian, 29.09.2021 |
| ISSN | 2397-2068 |
| DOI | 10.46430/phen0094 | 
| Abstract | This tutorial demonstrates how to apply clustering algorithms with Python to a dataset with two concrete use cases. The first example uses clustering to identify meaningful groups of Greco-Roman authors based on their publications and their reception. The second use case applies clustering algorithms to textual data in order to discover thematic groups. After finishing this tutorial, you will be able to use clustering in Python with Scikit-learn applied to your own data, adding an invaluable method to your toolbox for exploratory data analysis. | 
    
|---|---|
| Author | Jurczyk, Thomas | 
    
| Contributor | Walsh, Melanie; Huang, Luling |
    
| Copyright | 2021. This work is published under https://creativecommons.org/licenses/by/4.0/deed.en (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. | 
    
| Discipline | Religion | 
    
| EISSN | 2397-2068 | 
    
| IsDoiOpenAccess | true | 
    
| IsOpenAccess | true | 
    
| IsPeerReviewed | true | 
    
| IsScholarly | true | 
    
| License | CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/deed.en) |
    
| ORCID | 0000-0002-5943-2305 | 
    
| SubjectTerms | Algorithms; Artificial intelligence; Bibliographic literature; Case studies; Clustering; Data analysis; Datasets; Libraries; Machine learning; Religion |
    