Genomics Data Analysis via Spectral Shape and Topology
    
| Published in | arXiv.org |
|---|---|
| Main Authors | Erik J Amézquita, Farzana Nasrin, Kathleen M Storey, Masato Yoshizawa |
| Format | Paper, Journal Article |
| Language | English |
| Published | Ithaca: Cornell University Library, arXiv.org, 2 November 2022 |
| ISSN | 2331-8422 |
| DOI | 10.48550/arxiv.2211.00938 |
| Abstract | Mapper, a topological algorithm, is frequently used as an exploratory tool to build a graphical representation of data. This representation can reveal the intrinsic shape of high-dimensional genomic data and retain information that may be lost by standard dimension-reduction algorithms. We propose a novel workflow that integrates Mapper and differential gene expression to process and analyze RNA-seq data from tumor and healthy subjects. Specifically, we show that a Gaussian mixture approximation method can be used to produce graphical structures that successfully separate tumor and healthy subjects and that split the tumor subjects into two subgroups. A further analysis using DESeq2, a popular tool for detecting differentially expressed genes, shows that these two tumor subgroups exhibit distinct gene regulation, suggesting two discrete paths to lung cancer formation that could not be highlighted by other popular clustering methods, including t-SNE. Although Mapper shows promise in analyzing high-dimensional data, tools for statistically analyzing Mapper graphical structures remain limited in the existing literature. In this paper, we develop a scoring method using heat kernel signatures that provides an empirical setting for statistical inference, such as hypothesis testing, sensitivity analysis, and correlation analysis. |
    
|---|---|
| Author | Erik J Amézquita; Farzana Nasrin; Kathleen M Storey; Masato Yoshizawa |
    
| Author_xml | – sequence: 1 givenname: Erik surname: Amézquita middlename: J fullname: Amézquita, Erik J – sequence: 2 fullname: Farzana Nasrin – sequence: 3 givenname: Kathleen surname: Storey middlename: M fullname: Storey, Kathleen M – sequence: 4 givenname: Masato surname: Yoshizawa fullname: Yoshizawa, Masato  | 
    
| BackLink | Published paper (access to full text may be restricted): https://doi.org/10.1371/journal.pone.0284820; paper in arXiv: https://doi.org/10.48550/arXiv.2211.00938 |
    
| ContentType | Paper Journal Article  | 
    
| Copyright | 2022. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. http://arxiv.org/licenses/nonexclusive-distrib/1.0  | 
    
| Discipline | Physics | 
    
| EISSN | 2331-8422 | 
    
| ExternalDocumentID | 2211_00938 | 
    
| Genre | Working Paper/Pre-Print | 
    
| IsDoiOpenAccess | true | 
    
| IsOpenAccess | true | 
    
| IsPeerReviewed | false | 
    
| IsScholarly | false | 
    
| PublicationDateYYYYMMDD | 2022-11-02 | 
    
| SubjectTerms | Algorithms; Clustering; Correlation analysis; Data analysis; Dimensional analysis; Empirical analysis; Gene expression; Graphical representations; Hypothesis testing; Mathematics - Algebraic Topology; Quantitative Biology - Genomics; Sensitivity analysis; Statistics - Other Statistics; Subgroups; Topology; Tumors; Workflow |
    
| URI | https://www.proquest.com/docview/2731610763 https://arxiv.org/abs/2211.00938  | 
    
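
The abstract describes a scoring method built on heat kernel signatures of Mapper graphs but does not spell out the computation in this record. As a rough illustration only, the sketch below computes the standard heat kernel signature of each node of a small graph from the eigendecomposition of its Laplacian. The function name `heat_kernel_signature`, the toy path graph, the sample times, and the choice of the combinatorial (unnormalized) Laplacian are all assumptions made for this example; the paper's actual scoring method may differ.

```python
import numpy as np


def heat_kernel_signature(adjacency, times):
    """Return an (n_nodes, n_times) array of heat kernel signatures.

    hks[v, k] = sum_i exp(-lambda_i * t_k) * phi_i[v] ** 2, where
    (lambda_i, phi_i) are eigenpairs of the graph Laplacian.
    Hypothetical helper for illustration, not the paper's code.
    """
    A = np.asarray(adjacency, dtype=float)
    # Combinatorial Laplacian L = D - A (an assumption; a normalized
    # Laplacian is another common choice).
    L = np.diag(A.sum(axis=1)) - A
    # eigh is appropriate because L is symmetric; eigenvectors are the
    # orthonormal columns of eigvecs.
    eigvals, eigvecs = np.linalg.eigh(L)
    t = np.asarray(times, dtype=float)
    # Squared eigenvector entries weighted by the decay exp(-lambda * t).
    return (eigvecs ** 2) @ np.exp(-np.outer(eigvals, t))


# Toy stand-in for a Mapper graph: the path 0 - 1 - 2 - 3.
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
hks = heat_kernel_signature(A, times=[0.0, 0.5, 2.0])
print(hks)
```

Each row is a node and each column a diffusion time. Nodes that occupy symmetric positions in the graph receive identical signatures, which is what makes the signature usable as a per-node descriptor when comparing graph structures.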