A Distributed Twitter Data Processing Scheme and Its Application (一种分布式Twitter数据处理方案及应用)

Bibliographic Details
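Published in: 计算机应用研究 (Application Research of Computers), Vol. 32, No. 7, pp. 2073-2077
Main Authors: Zhang Zhenhua (张振华), Wu Kaichao (吴开超)
Format: Journal Article
Language: Chinese
Published: 2015
Affiliations: Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049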
ISSN: 1001-3695
DOI: 10.3969/j.issn.1001-3695.2015.07.038

Summary: To address the characteristics of social media data and the challenges of analyzing it, this paper proposes a distributed social media data processing scheme built on the real-time computing framework Storm, the batch processing framework Hadoop, and MongoDB, an efficient, horizontally scalable NoSQL database. The scheme guides the implementation of a visual analytics system for flu epidemic detection based on Twitter streaming data. Experiments show that the distributed scheme supports efficient processing and storage of Twitter streaming data and meets the system's performance requirements.
Bibliography: 51-1196/TP
Authors: Zhang Zhenhua, Wu Kaichao (1. University of Chinese Academy of Sciences, Beijing 100049, China; 2. Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China)
Keywords: social media; distributed processing framework; Twitter streaming data; flu epidemic detection; distributed computing