A fast big data collection system using MapReduce framework

Bibliographic Details
Published in: IEEE ... International Conference on Cloud Computing and Intelligence Systems, pp. 530-535
Main Authors: Li, Bing; Chan, Keith C.C.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.11.2014
ISBN: 1479947202, 9781479947201
ISSN: 2376-5933
DOI: 10.1109/CCIS.2014.7175793

Summary: Social networks, as corpora of valuable data, have attracted much attention from researchers in various fields in recent years, especially in big data analytics. However, efficient and accurate data collection, the foundation of such analytics, has received little attention in previously published work. As the amount of data on the web grows rapidly, this article identifies two major challenges that traditional distributed web crawler systems cannot meet: handling the big data in social networks quickly, and serving multiple web sources with a uniform collection model. To address these two challenges, and thereby build a foundation for big data analytics, this article proposes an ontology-based adapted web crawler system called the OACM system. The system uses the MapReduce model to balance processing resources effectively and so speed up the collection procedure, and it designs a uniform ontology model that estimates the semantic content of both social networks and collection tasks in order to adapt to different web sources. In a set of experiments, the proposed OACM system scheduled system resources efficiently and accomplished the task of collecting a large amount of data from multiple web sources.