Predictive Analytics on Genomic Data with High-Performance Computing
| Published in | 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) pp. 2187 - 2194 |
|---|---|
| Main Authors | , , |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 16.12.2020 |
| Subjects | |
| DOI | 10.1109/BIBM49941.2020.9312982 |
| Summary: | Recent technological advancements and scientific discoveries have revolutionized the current era of genomics. Next-generation sequencing (NGS) technologies have greatly reduced sequencing time and given rise to the production and collection of high volumes of genomic datasets. Predicting protein-coding genes from these copious genomic datasets is significant for protein synthesis and for understanding the regulatory function of the non-coding region. Methods have been developed to find protein-coding genes in the genomes of organisms. Nevertheless, the recent data explosion in genomics accentuates the need for more efficient algorithms for gene prediction. In this paper, we explore predictive analytics on genomic data. In particular, we present a scalable naïve Bayes-based algorithm that is deployed on an Apache Spark cluster for efficient prediction of genes in the genomes of eukaryotic organisms. Evaluation results on human genome chromosome data from GRCh37 and GRCh38 show the effectiveness of our algorithm for predictive analytics on genomic data with high-performance computing, with high sensitivity, specificity and accuracy. |
|---|---|
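
To make the summary's terms concrete, the following is a minimal single-machine sketch of a multinomial naïve Bayes classifier over k-mer counts, together with the three reported metrics (sensitivity, specificity, accuracy). It is not the authors' algorithm: the record does not describe their features, smoothing, or Spark deployment, so the k-mer representation, the Laplace smoothing parameter `alpha`, the label names `coding` and `noncoding`, and all function names here are assumptions made for illustration, and the toy sequences are invented.

```python
import math
from collections import Counter


def kmers(seq, k=3):
    """Return the overlapping k-mers of a DNA sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train(samples, k=3, alpha=1.0):
    """Fit a multinomial naive Bayes model on (sequence, label) pairs."""
    counts = {}        # label -> Counter of k-mer occurrences
    docs = Counter()   # label -> number of training sequences
    vocab = set()
    for seq, label in samples:
        docs[label] += 1
        label_counts = counts.setdefault(label, Counter())
        for km in kmers(seq, k):
            label_counts[km] += 1
            vocab.add(km)
    total_docs = sum(docs.values())
    classes = {
        label: {
            "log_prior": math.log(docs[label] / total_docs),
            "counts": c,
            "total": sum(c.values()),
        }
        for label, c in counts.items()
    }
    return {"k": k, "alpha": alpha, "vocab": vocab, "classes": classes}


def predict(model, seq):
    """Return the label with the highest posterior log-score."""
    k, alpha, vocab = model["k"], model["alpha"], model["vocab"]
    best_label, best_score = None, -math.inf
    for label, params in model["classes"].items():
        score = params["log_prior"]
        for km in kmers(seq, k):
            if km not in vocab:
                continue  # ignore k-mers never seen in training
            # Laplace-smoothed likelihood of this k-mer under the class
            num = params["counts"][km] + alpha
            den = params["total"] + alpha * len(vocab)
            score += math.log(num / den)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def evaluate(y_true, y_pred, positive="coding"):
    """Compute sensitivity, specificity and accuracy for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        return num / den if den else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "accuracy": ratio(tp + tn, len(pairs)),
    }


# Invented toy data, for illustration only.
TOY_TRAINING = [
    ("ATGGCCATGGCC", "coding"),
    ("ATGGCGATGGCC", "coding"),
    ("TTTTAAAATTTT", "noncoding"),
    ("AAAATTTTAAAA", "noncoding"),
]

if __name__ == "__main__":
    model = train(TOY_TRAINING)
    print(predict(model, "ATGGCCGCC"))
    print(evaluate(["coding", "noncoding"], ["coding", "noncoding"]))
```

The training step reduces to per-class k-mer counts, which is the kind of aggregation that a distributed framework can compute in parallel over partitions of the input; how the paper actually distributes the work is not stated in this record.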