Parallel and distributed association rule mining in life science: A novel parallel algorithm to mine genomics data

Bibliographic Details
Published in Information sciences Vol. 575; pp. 747-761
Main Authors Agapito, Giuseppe; Guzzi, Pietro Hiram; Cannataro, Mario
Format Journal Article
Language English
Published Elsevier Inc 01.10.2021
ISSN 0020-0255
1872-6291
DOI 10.1016/j.ins.2018.07.055

Summary: • A parallel algorithm for association rule mining. • Association rule mining of genomics data. • A dynamic workload balancing algorithm for FP-Growth. Association rule mining (ARM) is widely employed in several scientific areas and application domains, and many different algorithms for learning association rules from databases have been introduced. Despite the many existing algorithms, there is still room for novel approaches tailored to new kinds of datasets, because the efficiency of such algorithms often depends on the type of dataset analyzed. For instance, classical ARM algorithms present some drawbacks for biological datasets produced by microarray technologies, in particular those containing Single Nucleotide Polymorphisms (SNPs): classical algorithms require long execution times even with small datasets. Improving the performance of such algorithms by leveraging parallel computing is therefore a growing research area. The main contributions of this paper are a comparison among different sequential, parallel and distributed ARM techniques, and the presentation of a novel ARM algorithm, named Balanced Parallel Association Rule Extractor from SNPs (BPARES), that employs parallel computing and a novel balancing strategy to improve response time. BPARES improves performance without losing accuracy, uses the available computational power more efficiently, and reduces memory consumption.
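
To make the terms in the summary concrete, the following is a minimal sketch of association rule support and confidence on SNP-like data, with candidate itemsets handed to whichever worker is free. It is not BPARES and not FP-Growth: the toy transactions, the SNP labels (`rs1=AA` and so on), and the helper names `support`, `confidence` and `frequent_pairs` are all assumptions made for illustration, and the thread pool only illustrates dynamic task distribution, not a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

# Hypothetical toy data: one transaction per sample, each holding the
# SNP genotype labels observed in that sample (labels are invented).
TRANSACTIONS = [
    frozenset({"rs1=AA", "rs2=AG", "rs3=GG"}),
    frozenset({"rs1=AA", "rs2=AG"}),
    frozenset({"rs1=AA", "rs3=GG"}),
    frozenset({"rs2=AG", "rs3=GG"}),
    frozenset({"rs1=AA", "rs2=AG"}),
]


def support(itemset, transactions=TRANSACTIONS):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions=TRANSACTIONS):
    """support(antecedent | consequent) divided by support(antecedent)."""
    joint = support(antecedent | consequent, transactions)
    return joint / support(antecedent, transactions)


def frequent_pairs(min_support, workers=2):
    """Return {pair: support} for all item pairs meeting `min_support`.

    Candidates are submitted as independent tasks, so an idle worker
    picks up the next pending candidate. This is a simple stand-in for
    dynamic workload balancing, not the paper's balancing strategy.
    """
    items = sorted(set().union(*TRANSACTIONS))
    candidates = [frozenset(pair) for pair in combinations(items, 2)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        supports = list(pool.map(support, candidates))
    return {c: s for c, s in zip(candidates, supports) if s >= min_support}
```

Calling `frequent_pairs(0.5)` keeps only the pairs that occur in at least half of the toy samples, and `confidence(frozenset({"rs1=AA"}), frozenset({"rs2=AG"}))` estimates how often the second genotype appears among samples carrying the first.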