Predictive Analytics on Genomic Data with High-Performance Computing

Bibliographic Details
Published in: 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 2187-2194
Main Authors: Leung, Carson K.; Sarumi, Oluwafemi A.; Zhang, Christine Y.
Format: Conference Proceeding
Language: English
Published: IEEE, 16.12.2020
DOI: 10.1109/BIBM49941.2020.9312982

Abstract: Recent technological advances and scientific discoveries have revolutionized genomics. Next-generation sequencing (NGS) technologies have greatly reduced sequencing time and enabled the production and collection of large volumes of genomic data. Predicting protein-coding genes from these copious genomic datasets is important for understanding protein synthesis and the regulatory function of non-coding regions. Methods have been developed to find protein-coding genes in the genomes of organisms. Nevertheless, the recent data explosion in genomics accentuates the need for more efficient gene-prediction algorithms. In this paper, we explore predictive analytics on genomic data. In particular, we present a scalable naïve Bayes-based algorithm, deployed on an Apache Spark cluster, for efficient prediction of genes in the genomes of eukaryotic organisms. Evaluation results on human genome chromosome data from GRCh37 and GRCh38 show the effectiveness of our algorithm for predictive analytics on genomic data with high-performance computing, achieving high sensitivity, specificity and accuracy.
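This record does not describe the paper's features or its Spark implementation, so the following is only a minimal, single-machine Python sketch of the general technique the abstract names: a naïve Bayes classifier that labels DNA sequence windows as coding or non-coding, together with the sensitivity, specificity and accuracy metrics it reports. The overlapping k-mer features, Laplace smoothing, the class labels, the names `kmers`, `KmerNaiveBayes` and `evaluate`, and the toy sequences are all hypothetical choices for illustration, not the authors' method or data.

```python
import math
from collections import Counter, defaultdict


def kmers(seq, k=3):
    """Overlapping k-mers of a DNA sequence (hypothetical feature choice)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class KmerNaiveBayes:
    """Multinomial naive Bayes over k-mer counts with Laplace smoothing.

    Illustrative sketch only: not the paper's features or its Spark implementation.
    """

    def __init__(self, k=3, alpha=1.0):
        self.k = k
        self.alpha = alpha  # Laplace (add-alpha) smoothing constant

    def fit(self, sequences, labels):
        self.class_counts = Counter(labels)      # label -> number of training sequences
        self.kmer_counts = defaultdict(Counter)  # label -> k-mer frequency table
        self.vocab = set()
        for seq, label in zip(sequences, labels):
            grams = kmers(seq, self.k)
            self.kmer_counts[label].update(grams)
            self.vocab.update(grams)
        self.totals = {c: sum(t.values()) for c, t in self.kmer_counts.items()}
        return self

    def log_posterior(self, seq, label):
        # log P(label) + sum of log P(k-mer | label), up to a constant shared by all labels
        n = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n)
        denom = self.totals[label] + self.alpha * len(self.vocab)
        for g in kmers(seq, self.k):
            score += math.log((self.kmer_counts[label][g] + self.alpha) / denom)
        return score

    def predict(self, seq):
        # Pick the label with the highest (unnormalized) log posterior
        return max(self.class_counts, key=lambda c: self.log_posterior(seq, c))


def evaluate(y_true, y_pred, positive="coding"):
    """Return (sensitivity, specificity, accuracy) for a binary labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return sensitivity, specificity, accuracy


# Toy, made-up sequences (GC-rich "coding" vs AT-rich "noncoding"); not real genomic data.
train_x = ["ATGGCCGCGGCCGCC", "GCCGCGGCCATGGCG", "ATATTTAAATATTAA", "TTAATATAAATTTAT"]
train_y = ["coding", "coding", "noncoding", "noncoding"]
model = KmerNaiveBayes(k=3).fit(train_x, train_y)

test_x = ["GCGGCCGCCATG", "AATATTTATAAA"]
test_y = ["coding", "noncoding"]
preds = [model.predict(s) for s in test_x]
print(preds, evaluate(test_y, preds))
```

Training here reduces to counting k-mers per class, and counts computed on disjoint partitions of the data can simply be summed, which is one reason naïve Bayes maps naturally onto a distributed framework such as Apache Spark. How the authors actually partition and aggregate the work is not stated in this record.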
Author affiliations:
– Leung, Carson K. (kleung@cs.umanitoba.ca): Department of Computer Science, University of Manitoba, Winnipeg, MB, Canada
– Sarumi, Oluwafemi A.: Department of Computer Science, Federal University of Technology Akure (FUTA), Akure, Ondo State, Nigeria
– Zhang, Christine Y.: Department of Immunology, University of Manitoba, Winnipeg, MB, Canada
EISBN: 9781728162157 (ISBN-10 form: 1728162157)
Funding:
– Association of Commonwealth Universities (funder ID 10.13039/501100000531)
– Natural Sciences and Engineering Research Council of Canada (funder ID 10.13039/501100000038)
Page count: 8
Subjects: Apache Spark; big data; Bioinformatics; data mining; data science; gene prediction; Genomics; high performance computing; machine learning; Organisms; Prediction algorithms; Proteins; Sparks; Training
URL: https://ieeexplore.ieee.org/document/9312982