EPMA: Efficient pattern matching algorithm for DNA sequences

Bibliographic Details
Published in Expert systems with applications Vol. 80; pp. 162-170
Main Authors Tahir, Muhammad, Sardaraz, Muhammad, Ikram, Ataul Aziz
Format Journal Article
Language English
Published New York Elsevier Ltd 01.09.2017
Elsevier BV
ISSN 0957-4174
1873-6793
DOI 10.1016/j.eswa.2017.03.026

Summary:
• We present a brief introduction to the applications of pattern matching.
• We present a novel pattern matching algorithm for DNA sequences.
• We present multithreading in pattern matching.
• We use a Turing machine for pattern matching.
• We present comparative results with significant improvements.

Bioinformatics is the use of computer technology to solve, manage, and analyze biological problems. As computing has advanced, the volume of biological data has increased significantly, and with it the need to analyze these data in reasonable space and time. DNA sequences contain the basic information of species, and pattern matching between different species is an important and challenging problem. The literature offers generalized string matching algorithms and some specialized DNA pattern matching algorithms. There is still a need for fast, space-efficient pattern matching algorithms that take new hardware developments into account. In this paper, we present a novel pattern matching algorithm for DNA sequences called EPMA. The proposed algorithm uses fixed-length 2-bit binary encoding, segmentation, and multithreading. The idea is to search for the pattern with multiple searcher agents concurrently. The proposed algorithm is validated with comparative experimental results, which show that it is a good candidate for DNA sequence pattern matching applications. The algorithm makes effective use of modern hardware and will help researchers in sequence alignment, short-read error correction, phylogenetic inference, and related tasks. Furthermore, the proposed method can be extended to generalized string matching and its applications.
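
The three ingredients named in the summary (fixed-length 2-bit encoding, segmentation, and concurrent searchers) can be illustrated with a minimal Python sketch. This is not the authors' EPMA implementation, whose details are not given in this record. The bit assignment in `CODE`, the helper names `encode`, `search_segment`, and `find_all`, and the way the text is split among workers are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed 2-bit code per nucleotide; the record does not state the mapping.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def encode(seq):
    """Pack a DNA string into one integer, two bits per base.

    Raises KeyError for any character other than A, C, G, T.
    """
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def search_segment(text, pattern_code, m, start, end):
    """Return every match position p with start <= p < end.

    A rolling 2*m-bit window holds the last m bases read, so each
    comparison with the encoded pattern is one integer equality test.
    The scan reads m - 1 bases past `end` so that matches straddling
    a segment boundary are found by exactly one worker.
    """
    mask = (1 << (2 * m)) - 1
    window = 0
    hits = []
    for i in range(start, end + m - 1):
        window = ((window << 2) | CODE[text[i]]) & mask
        if i - start + 1 >= m and window == pattern_code:
            hits.append(i - m + 1)
    return hits


def find_all(text, pattern, workers=4):
    """Find all (possibly overlapping) occurrences using several searchers."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    pattern_code = encode(pattern)
    positions = n - m + 1                 # number of candidate start positions
    step = -(-positions // workers)       # ceiling division
    ranges = [(s, min(s + step, positions)) for s in range(0, positions, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in segment order, so positions stay sorted.
        parts = pool.map(
            lambda r: search_segment(text, pattern_code, m, r[0], r[1]), ranges
        )
        return [p for part in parts for p in part]


if __name__ == "__main__":
    print(find_all("ACGTACGTAC", "ACG"))
```

In standard CPython the global interpreter lock prevents threads from speeding up CPU-bound work like this, so the sketch shows only how the search is divided among workers; a compiled language or a process pool would be needed to observe a real gain from concurrency.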