Clustering with Scikit-Learn in Python

Bibliographic Details
Published in The Programming Historian, Vol. 10, no. 10
Main Author Jurczyk, Thomas
Format Journal Article
Language English
Published ProgHist Ltd 29.09.2021
Editorial Board of the Programming Historian
ISSN 2397-2068
DOI 10.46430/phen0094

Abstract This tutorial demonstrates how to apply clustering algorithms with Python to a dataset with two concrete use cases. The first example uses clustering to identify meaningful groups of Greco-Roman authors based on their publications and their reception. The second use case applies clustering algorithms to textual data in order to discover thematic groups. After finishing this tutorial, you will be able to use clustering in Python with Scikit-learn applied to your own data, adding an invaluable method to your toolbox for exploratory data analysis.
Contributor Walsh, Melanie
Huang, Luling
Copyright 2021. This work is published under https://creativecommons.org/licenses/by/4.0/deed.en (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Discipline Religion
EISSN 2397-2068
Open Access Yes
Peer Reviewed Yes
Scholarly Yes
License CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/deed.en)
ORCID 0000-0002-5943-2305
SubjectTerms Algorithms
Artificial intelligence
Bibliographic literature
Case studies
Clustering
Data analysis
Datasets
Libraries
Machine learning
Religion
URI https://www.proquest.com/docview/2828775668
https://doi.org/10.46430/phen0094
https://doaj.org/article/c2a31c6110374417b1cf430cee8ade0c