A Python Clustering Analysis Protocol of Genes Expression Data Sets

Gene expression and SNPs data hold great potential for a new understanding of disease prognosis, drug sensitivity, and toxicity evaluations. Cluster analysis is used to analyze data that do not contain any specific subgroups. The goal is to use the data itself to recognize meaningful and informative...

Bibliographic Details
Published in Genes Vol. 13; no. 10; p. 1839
Main Authors Agapito, Giuseppe; Milano, Marianna; Cannataro, Mario
Format Journal Article
Language English
Published Switzerland MDPI AG 12.10.2022
MDPI
ISSN 2073-4425
DOI 10.3390/genes13101839


Abstract Gene expression and SNP data hold great potential for improving the understanding of disease prognosis, drug sensitivity, and toxicity evaluation. Cluster analysis is used to analyze data that do not contain any predefined subgroups; the goal is to use the data itself to recognize meaningful and informative subgroups. In addition, cluster analysis supports data reduction, exposes hidden patterns, and generates hypotheses regarding the relationship between genes and phenotypes. Cluster analysis can also be used to identify biomarkers and to build computational predictive models. The methods used to analyze microarray data can profoundly influence the interpretation of the results. Therefore, a basic understanding of these computational tools is necessary for optimal experimental design and meaningful data analysis. This manuscript provides a protocol for analyzing gene expression data sets with the K-means and DBSCAN algorithms. The general protocol enables the analysis of omics data to identify subsets of features with low redundancy and high robustness, speeding up the identification of new biomarkers through pathway enrichment analysis. To demonstrate the effectiveness of the clustering analysis protocol, we analyze a real data set from the GEO database. Finally, the manuscript provides best practices and tips for overcoming common issues in the analysis of omics data sets through unsupervised learning.
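The two algorithms named in the abstract, K-means and DBSCAN, can be illustrated with a minimal sketch. It is not the authors' protocol: it uses only the Python standard library, and the toy `profiles` matrix, the function names `kmeans` and `dbscan`, the deterministic farthest-point initialization, and the parameter values (`k=3`, `eps=0.5`, `min_samples=3`) are all assumptions chosen for illustration. The values are assumed to be already normalized expression levels.

```python
import math

# Hypothetical toy data: 9 genes x 3 samples (assumed already normalized).
# Two tight groups of co-expressed genes plus one outlying profile.
profiles = [
    [1.0, 1.1, 0.9], [1.2, 1.0, 1.1], [0.9, 1.0, 1.0], [1.1, 0.9, 1.2],
    [5.0, 5.1, 4.9], [5.2, 5.0, 5.1], [4.9, 5.0, 5.0], [5.1, 4.9, 5.2],
    [10.0, 0.0, 10.0],
]


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with a deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def dbscan(points, eps, min_samples):
    """Density-based clustering; the label -1 marks noise points."""
    noise = -1
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = noise  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels


km_labels = kmeans(profiles, k=3)
db_labels = dbscan(profiles, eps=0.5, min_samples=3)
print("K-means labels:", km_labels)
print("DBSCAN labels: ", db_labels)
```

On this toy input K-means must be told the number of clusters and assigns every profile to one of them, whereas DBSCAN infers the number of dense groups from `eps` and `min_samples` and can flag an isolated profile as noise. On real data these parameters would need tuning, and a maintained library implementation would normally be preferred over this sketch.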
Audience Academic
Author Milano, Marianna
Cannataro, Mario
Agapito, Giuseppe
AuthorAffiliation 1 Department of Law, Economics and Social Sciences, University Magna Græcia of Catanzaro, 88100 Catanzaro, Italy
2 Data Analytics Research Center, University Magna Græcia of Catanzaro, 88100 Catanzaro, Italy
3 Department of Medical and Clinical Surgery, University Magna Græcia of Catanzaro, 88100 Catanzaro, Italy
Author_xml – sequence: 1
  givenname: Giuseppe
  orcidid: 0000-0003-2868-7732
  surname: Agapito
  fullname: Agapito, Giuseppe
– sequence: 2
  givenname: Marianna
  orcidid: 0000-0003-1561-725X
  surname: Milano
  fullname: Milano, Marianna
– sequence: 3
  givenname: Mario
  orcidid: 0000-0003-1502-2387
  surname: Cannataro
  fullname: Cannataro, Mario
BackLink https://www.ncbi.nlm.nih.gov/pubmed/36292724 (MEDLINE/PubMed record)
ContentType Journal Article
Copyright COPYRIGHT 2022 MDPI AG
2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Discipline Biology
EISSN 2073-4425
ExternalDocumentID 10.3390/genes13101839
PMC9601308
A745723233
36292724
10_3390_genes13101839
Genre Journal Article
GeographicLocations Italy
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 10
Keywords SNPs
DEGs
clustering
data mining
microarrays
unsupervised learning
Language English
License Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
cc-by
ORCID 0000-0003-2868-7732
0000-0003-1561-725X
0000-0003-1502-2387
OpenAccessLink https://www.proquest.com/docview/2728477566?pq-origsite=%requestingapplication%&accountid=15518
PMID 36292724
PublicationCentury 2000
PublicationDate 20221012
PublicationDateYYYYMMDD 2022-10-12
PublicationDate_xml – month: 10
  year: 2022
  text: 20221012
  day: 12
PublicationDecade 2020
PublicationPlace Switzerland
PublicationPlace_xml – name: Switzerland
– name: Basel
PublicationTitle Genes
PublicationTitleAlternate Genes (Basel)
PublicationYear 2022
Publisher MDPI AG
MDPI
Publisher_xml – name: MDPI AG
– name: MDPI
StartPage 1839
SubjectTerms Algorithms
Analysis
Animals
Boidae
Cluster Analysis
Clustering
Computer applications
Data Analysis
Datasets
DNA microarrays
Gene expression
Genomes
Hybridization
Information management
Medical research
Methods
Microarray Analysis
Phenotypes
Prediction models
Sensitivity analysis
Single-nucleotide polymorphism
Toxicity
SummonAdditionalLinks – databaseName: ProQuest Central
  dbid: BENPR
  link: http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwfV3da9RAEB_qFbEvUj8bbWUF0RdDc8luPh6kXM8rRfA41ELfwiS7q8KRnL0cev-9M_nyUrDPMwmb3fnczPwG4E2WyCjTyrpG822Vh5mL0Vi5RqKXSaOTOORu5M_z8PJKfrpW13sw73phuKyys4m1odZlznfkp37EhjSi6ONs9cvlqVH8d7UboYHtaAX9oYYYuwf7PiNjjWD_fDZffOlvXTxOlzzVgG0GlO-ffmeTMg4YuIoHhu84p9smesdH3a6ffLApVrj9jcvljnO6OISHbVQpJo0YPII9UzyG-82cye0TmE7EYssgAWK63DAyAvkr0cGRiMVNWZUkD6K0glGo12L2py2PLcRHrFB8NdX6KVxdzL5NL912eoKby9ivXBt61khKFxLP15zmjD3EMcaYWD9RKpc2I-eceWhCI7UKJGaGLKM2yM26kQ6ewagoC3MEItN-6MdEVdbKXEZoTexpREXu3VJG6MD7btvSvIUW5wkXy5RSDN7ldLDLDrzt2VcNpsb_GN_xGaSsa_S-HNuWAVoVo1alk0iqiELCIHDgeMBJOpIPyd0ppq2OrtN_EuXA657MT3LdWWHKTc2TKGLzaS3Pm0Pvl0yuPyG6dCAaiEPPwMjdQ0rx80eN4E1pI8UOMX1fLzh378SLuz_gJRz43JZRV9ocw6i62ZgTCpaq7FWrAX8BejUU0A
  priority: 102
  providerName: ProQuest
– databaseName: Unpaywall
  dbid: UNPAY
  link: http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwpV3di9NAEF-0h-iL57fxTllB9MVc87Gbj6ej1DsOwaOghfMpzGZ3z8OSlGui1r_-ZpoPmoIi-DyTstv9zc5MMvMbxt6oVMRKS-saTW-rPFAuxL50jQBPCaPTJKJu5E_n0dlcfLyQF1td_FRWian41eaSDhB_rkBUjf0QzXtM3ny81Pb4R_suib55RSl5qdtsL5IYjY_Y3vx8NvlKM-W6pxtqzRCz-_ElXSB-SDRVNB58yxXtXshbHmm3WvJuXSxh_RMWiy1XdLrPoNtEU4Hy_aiu1FH-e4ff8X92-YDdb-NUPmmA9ZDdMsUjdqeZXLl-zKYTPlsT7QCfLmriWkAPyDuCEz67LqsSEcZLy4nXesVPfrUFtwX_ABXwz6ZaPWHz05Mv0zO3ncfg5iIJKtdGnjUCE5DUCzQlTr4H4EMCqQ1SKXNhFbp75YGJjNAyFKAM3rXaALX_xjp8ykZFWZjnjCsdREGCUmmtyEUM1iSeBpAYMFjMMR32vjuaLG_JymlmxiLDpIVOMhucpMPe9urLhqXjT4rv6Jwzsl78vRzaJgRcFfFgZZNYyBiDzDB02OFAE60uH4o7pGSt1a-yICZnH2OE7LDXvZiepEq2wpT1RieVqBbgWp41wOqXjMFEinLhsHgAuV6BuMCHkuLq24YTHKGP0UiC--vB-fd_4sU_ax6wewH1fGzKeA7ZqLquzUuMxCr1qjW2GxZ9LXI
  priority: 102
  providerName: Unpaywall
Title A Python Clustering Analysis Protocol of Genes Expression Data Sets
URI https://www.ncbi.nlm.nih.gov/pubmed/36292724
https://www.proquest.com/docview/2728477566
https://www.proquest.com/docview/2729528429
https://pubmed.ncbi.nlm.nih.gov/PMC9601308
https://www.mdpi.com/2073-4425/13/10/1839/pdf?version=1666692181
UnpaywallVersion publishedVersion
Volume 13
hasFullText 1
inHoldings 1
isFullTextHit
isPrint