Association rule mining algorithm based on Spark for pesticide transaction data analyses

Bibliographic Details
Published in	International Journal of Agricultural and Biological Engineering, Vol. 12, no. 5, pp. 162-166
Main Authors	Bai, Xiaoning; Jia, Jingdun; Wei, Qiwen; Huang, Shuaiqi; Du, Weicheng; Gao, Wanlin
Format	Journal Article
Language	English
Published	Beijing: International Journal of Agricultural and Biological Engineering (IJABE), 01.09.2019
Subjects	Agricultural production; Agriculture; Agrochemicals; Algorithms; Artificial intelligence; Big Data; Clustering; Computer centers; Computer engineering; Data centers; Data mining; Digital agriculture; Distributed memory; Fault tolerance; Machine learning; Pesticides; Supervision; China
ISSN	1934-6344, 1934-6352
DOI	10.25165/j.ijabe.20191205.4881

Summary:	With the development of smart agriculture, the data accumulated in the field of pesticide regulation have reached a considerable scale. The pesticide transaction data collected by the Pesticide National Data Center alone amount to more than 10 million records daily. However, because of outdated technical means, the existing pesticide supervision data have not been deeply mined or put to use. The Apriori algorithm is one of the classic algorithms in association rule mining, but it must traverse the transaction database multiple times, which imposes an extra I/O burden. Spark is an emerging big data parallel computing framework with advantages such as in-memory computing and resilient distributed datasets (RDDs). Compared with the Hadoop MapReduce computing framework, it greatly improves I/O performance. Therefore, this paper proposed ICAMA, an improved Apriori algorithm based on the Spark framework. The MapReduce process was used to count the support of the candidate sets and then to generate new candidate sets. In experimental comparisons, when the data volume exceeded 250 Mb, the Spark-based Apriori algorithm performed 20% better than the traditional Hadoop-based Apriori algorithm, and the improvement became more pronounced as the data volume increased.
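
Illustration: the summary notes that Apriori traverses the transaction database once per itemset size and that support counting follows a map/reduce pattern. The following is a minimal single-machine sketch of that general idea in plain Python. It is not the paper's ICAMA algorithm and does not use Spark; the function names (count_support, apriori), the item labels, and the support threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def count_support(transactions, candidates):
    """Count how many transactions contain each candidate itemset.

    This is one full pass over the data. The inner test plays the role of
    a "map" step (emit a 1 per match) and the Counter sums them ("reduce").
    """
    counts = Counter()
    for basket in transactions:
        for cand in candidates:
            if cand <= basket:  # candidate is a subset of the transaction
                counts[cand] += 1
    return counts


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for basket in transactions for item in basket}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        # One scan of the whole transaction list per itemset size k.
        counts = count_support(transactions, candidates)
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical toy data: each set is one transaction of product labels.
demo_transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]
demo_result = apriori(demo_transactions, min_support=3)
```

With this toy data, every single item and every pair meets the threshold of 3, while the triple {A, B, C} occurs only twice and is dropped. Each pass of the while loop rescans all transactions, which is the repeated traversal the summary identifies as the I/O cost that in-memory processing is meant to reduce.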