A Python Clustering Analysis Protocol of Genes Expression Data Sets
| Published in | Genes Vol. 13; no. 10; p. 1839 |
|---|---|
| Main Authors | Agapito, Giuseppe; Milano, Marianna; Cannataro, Mario |
| Format | Journal Article |
| Language | English |
| Published | Switzerland: MDPI AG, 12.10.2022 |
| ISSN | 2073-4425 |
| DOI | 10.3390/genes13101839 |
| Abstract | Gene expression and SNP data hold great potential for a new understanding of disease prognosis, drug sensitivity, and toxicity evaluation. Cluster analysis is used to analyze data that do not contain any predefined subgroups. The goal is to use the data itself to recognize meaningful and informative subgroups. In addition, cluster analysis supports data reduction, exposes hidden patterns, and generates hypotheses regarding the relationship between genes and phenotypes. Cluster analysis can also be used to identify biomarkers and yield computational predictive models. The methods used to analyze microarray data can profoundly influence the interpretation of the results. Therefore, a basic understanding of these computational tools is necessary for optimal experimental design and meaningful data analysis. This manuscript provides a protocol for analyzing gene expression data sets with the K-means and DBSCAN algorithms. The general protocol enables the analysis of omics data to identify subsets of features with low redundancy and high robustness, speeding up the identification of new biomarkers through pathway enrichment analysis. In addition, to demonstrate the effectiveness of our clustering analysis protocol, we analyze a real data set from the GEO database. Finally, the manuscript provides best practices and tips for overcoming common issues in the analysis of omics data sets through unsupervised learning. |
| Audience | Academic |
| Author | Agapito, Giuseppe (ORCID 0000-0003-2868-7732); Milano, Marianna (ORCID 0000-0003-1561-725X); Cannataro, Mario (ORCID 0000-0003-1502-2387) |
| AuthorAffiliation | 1 Department of Law, Economics and Social Sciences, University Magna Græcia of Catanzaro, 88100 Catanzaro, Italy; 2 Data Analytics Research Center, University Magna Græcia of Catanzaro, 88100 Catanzaro, Italy; 3 Department of Medical and Clinical Surgery, University Magna Græcia of Catanzaro, 88100 Catanzaro, Italy |
| ContentType | Journal Article |
| Copyright | COPYRIGHT 2022 MDPI AG. 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
| Discipline | Biology |
| EISSN | 2073-4425 |
| ExternalDocumentID | 10.3390/genes13101839 PMC9601308 A745723233 36292724 10_3390_genes13101839 |
| Genre | Journal Article |
| GeographicLocations | Italy |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 10 |
| Keywords | SNPs; DEGs; clustering; data mining; microarrays; unsupervised learning |
| License | Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). cc-by |
| ORCID | 0000-0003-2868-7732 0000-0003-1561-725X 0000-0003-1502-2387 |
| PMID | 36292724 |
| PublicationDate | 2022-10-12 |
| PublicationPlace | Basel, Switzerland |
| PublicationTitle | Genes |
| PublicationTitleAlternate | Genes (Basel) |
| PublicationYear | 2022 |
| Publisher | MDPI AG |
| StartPage | 1839 |
| SubjectTerms | Algorithms; Analysis; Animals; Boidae; Cluster Analysis; Clustering; Computer applications; Data Analysis; Datasets; DNA microarrays; Gene expression; Genomes; Hybridization; Information management; Medical research; Methods; Microarray Analysis; Phenotypes; Prediction models; Sensitivity analysis; Single-nucleotide polymorphism; Toxicity |
| Title | A Python Clustering Analysis Protocol of Genes Expression Data Sets |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/36292724 https://www.proquest.com/docview/2728477566 https://www.proquest.com/docview/2729528429 https://pubmed.ncbi.nlm.nih.gov/PMC9601308 https://www.mdpi.com/2073-4425/13/10/1839/pdf?version=1666692181 |
| Volume | 13 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| journalDatabaseRights | – providerCode: PRVAFT databaseName: Open Access Digital Library customDbUrl: eissn: 2073-4425 dateEnd: 99991231 omitProxy: true ssIdentifier: ssj0000402005 issn: 2073-4425 databaseCode: KQ8 dateStart: 20100101 isFulltext: true titleUrlDefault: http://grweb.coalliance.org/oadl/oadl.html providerName: Colorado Alliance of Research Libraries – providerCode: PRVBFR databaseName: Free Medical Journals customDbUrl: eissn: 2073-4425 dateEnd: 99991231 omitProxy: true ssIdentifier: ssj0000402005 issn: 2073-4425 databaseCode: DIK dateStart: 20100101 isFulltext: true titleUrlDefault: http://www.freemedicaljournals.com providerName: Flying Publisher – providerCode: PRVHPJ databaseName: ROAD: Directory of Open Access Scholarly Resources customDbUrl: eissn: 2073-4425 dateEnd: 99991231 omitProxy: true ssIdentifier: ssj0000402005 issn: 2073-4425 databaseCode: M~E dateStart: 20100101 isFulltext: true titleUrlDefault: https://road.issn.org providerName: ISSN International Centre – providerCode: PRVAQN databaseName: PubMed Central (Free) customDbUrl: eissn: 2073-4425 dateEnd: 99991231 omitProxy: true ssIdentifier: ssj0000402005 issn: 2073-4425 databaseCode: RPM dateStart: 20100101 isFulltext: true titleUrlDefault: https://www.ncbi.nlm.nih.gov/pmc/ providerName: National Library of Medicine – providerCode: PRVPQU databaseName: ProQuest Central customDbUrl: http://www.proquest.com/pqcentral?accountid=15518 eissn: 2073-4425 dateEnd: 99991231 omitProxy: true ssIdentifier: ssj0000402005 issn: 2073-4425 databaseCode: BENPR dateStart: 20100301 isFulltext: true titleUrlDefault: https://www.proquest.com/central providerName: ProQuest – providerCode: PRVFZP databaseName: Scholars Portal Journals: Open Access customDbUrl: eissn: 2073-4425 dateEnd: 20250831 omitProxy: true ssIdentifier: ssj0000402005 issn: 2073-4425 databaseCode: M48 dateStart: 20100601 isFulltext: true titleUrlDefault: http://journals.scholarsportal.info providerName: Scholars Portal |
| citation | Agapito, Giuseppe; Milano, Marianna; Cannataro, Mario. "A Python Clustering Analysis Protocol of Genes Expression Data Sets." Genes, vol. 13, no. 10, MDPI AG, 2022-10-12. ISSN 2073-4425. DOI 10.3390/genes13101839 |