Predictive Analytics on Genomic Data with High-Performance Computing
| Published in | 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 2187-2194 |
|---|---|
| Main Authors | Leung, Carson K.; Sarumi, Oluwafemi A.; Zhang, Christine Y. |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 16.12.2020 |
| DOI | 10.1109/BIBM49941.2020.9312982 |

| Abstract | Recent technological advancements and scientific discoveries have revolutionized the current era of genomics. Next-generation sequencing (NGS) technologies have led to a tremendous reduction in sequencing time and given rise to the production and collection of high volumes of genomic datasets. Predicting protein-coding genes from these copious genomic datasets is significant for the synthesis of proteins and for understanding the regulatory function of the non-coding regions. Methods have been developed to find protein-coding genes in the genomes of organisms. Nevertheless, the recent data explosion in genomics accentuates the need for more efficient gene-prediction algorithms. In this paper, we explore predictive analytics on genomic data. In particular, we present a scalable naïve Bayes-based algorithm, deployed on a cluster running the Apache Spark framework, for efficient prediction of genes in the genomes of eukaryotic organisms. Evaluation results on the human genome reference assemblies GRCh37 and GRCh38 show the effectiveness of our algorithm for predictive analytics on genomic data with high-performance computing, achieving high sensitivity, specificity and accuracy. |
|---|---|
| Authors | Leung, Carson K.; Sarumi, Oluwafemi A.; Zhang, Christine Y. |
| Author affiliations | Leung, Carson K.: University of Manitoba, Department of Computer Science, Winnipeg, MB, Canada (kleung@cs.umanitoba.ca); Sarumi, Oluwafemi A.: Federal University of Technology Akure (FUTA), Department of Computer Science, Akure, Ondo State, Nigeria; Zhang, Christine Y.: University of Manitoba, Department of Immunology, Winnipeg, MB, Canada |
| EISBN | 9781728162157; 1728162157 |
| End page | 2194 |
| IEEE Xplore document ID | 9312982 |
| Genre | Original research |
| Funding | Association of Commonwealth Universities (funder ID 10.13039/501100000531); Natural Sciences and Engineering Research Council of Canada (funder ID 10.13039/501100000038) |
| Page count | 8 |
| Publication date | 2020-12-16 |
| Start page | 2187 |
| Subject terms | Apache Spark; big data; Bioinformatics; data mining; data science; gene prediction; Genomics; high performance computing; machine learning; Organisms; Prediction algorithms; Proteins; Sparks; Training |
| URI | https://ieeexplore.ieee.org/document/9312982 |
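
The abstract describes the approach only at a high level. As a rough illustration of why naïve Bayes lends itself to a cluster framework such as Apache Spark, here is a minimal plain-Python sketch (no Spark). It trains a multinomial naïve Bayes classifier whose sufficient statistics are plain counts. Each data partition is counted independently (a map step), the partial counts are merged by addition (a reduce step), and the model is then scored by sensitivity, specificity and accuracy. This is not the authors' algorithm. The k-mer features (`K = 3`), the `coding`/`noncoding` labels, the Laplace smoothing, all function names and the toy sequences are hypothetical assumptions made for illustration.

```python
import math
from collections import Counter
from functools import reduce

K = 3  # hypothetical k-mer length; the paper's actual features are not given here
LABELS = ("coding", "noncoding")  # hypothetical class labels


def kmers(seq, k=K):
    """Yield the overlapping k-mers of a DNA string."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def partition_counts(records):
    """Map step: count labels and (label, k-mer) pairs in one data partition."""
    label_counts, kmer_counts = Counter(), Counter()
    for seq, label in records:
        label_counts[label] += 1
        for km in kmers(seq):
            kmer_counts[(label, km)] += 1
    return label_counts, kmer_counts


def merge(a, b):
    """Reduce step: counts are additive, so partial results merge by summation."""
    return a[0] + b[0], a[1] + b[1]


def train(partitions):
    """Build a multinomial naive Bayes model from a list of partitions."""
    label_counts, kmer_counts = reduce(merge, map(partition_counts, partitions))
    vocab = {km for (_, km) in kmer_counts}
    totals = Counter()
    for (label, _), n in kmer_counts.items():
        totals[label] += n
    return label_counts, kmer_counts, totals, vocab


def predict(model, seq):
    """Return the label with the highest log-posterior (Laplace-smoothed)."""
    label_counts, kmer_counts, totals, vocab = model
    n_seqs = sum(label_counts.values())
    best, best_score = None, -math.inf
    for label in LABELS:
        # Smoothed log prior plus smoothed log likelihood of every k-mer.
        score = math.log((label_counts[label] + 1) / (n_seqs + len(LABELS)))
        denom = totals[label] + len(vocab)
        for km in kmers(seq):
            score += math.log((kmer_counts[(label, km)] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best


def evaluate(model, test, positive="coding"):
    """Sensitivity, specificity and accuracy, with `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for seq, label in test:
        predicted_pos = predict(model, seq) == positive
        if label == positive and predicted_pos:
            tp += 1
        elif label == positive:
            fn += 1
        elif predicted_pos:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / len(test)
    return sensitivity, specificity, accuracy


# Toy, made-up sequences split into two "partitions" (not real genomic data).
partitions = [
    [("GCGCGCGCGC", "coding"), ("ATATATATAT", "noncoding")],
    [("GGCGCCGCGG", "coding"), ("TTAATATTAA", "noncoding")],
]
model = train(partitions)
test = [("GCGCGGCGCC", "coding"), ("ATATTATAAT", "noncoding")]
print(evaluate(model, test))  # (1.0, 1.0, 1.0)
```

Because the model is fully described by additive counts, each partition can be processed without communicating with the others, and only the small count tables need to be combined. In PySpark the same structure could plausibly be expressed by running `partition_counts` through `RDD.mapPartitions` (wrapping its result in a list) and combining the results with `RDD.reduce(merge)`. That mapping is an untested assumption, and the toy labels carry no biological meaning.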