Trust-Based Scheduling Framework for Big Data Processing with MapReduce

Bibliographic Details
Published in	IEEE Transactions on Services Computing, Vol. 15, no. 1, pp. 279-293
Main Authors	Dang, Thanh Dat; Hoang, Doan; Nguyen, Diep N.
Format	Journal Article
Language	English
Published	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2022
Subjects	Algorithms; Big Data; Big Data applications; big data security; Cloud computing; Data analysis; Data privacy; Data processing; data sensitive; Graph theory; Heuristic algorithms; Heuristic methods; Leakage; MapReduce; Measurement; Privacy; Processor scheduling; Scheduling; Security; Sensitivity; Task analysis; Trust-aware framework; trust-based scheduling; Trusted computing; Trustworthiness
ISSN	1939-1374 2372-0204
DOI	10.1109/TSC.2019.2938959

More Information
Summary:	Security and privacy have become a major concern on cloud computing platforms, where users risk the leakage of their private data. Leakage can happen while the data is at rest (in storage), in processing, or in transit within a cloud or between different cloud infrastructures, e.g., from private to public clouds. This paper focuses on protecting data "in processing". For big data applications, the MapReduce framework has proven to be an efficient solution and has been widely deployed, e.g., in healthcare and business data analysis. In this article, we propose a trust-based framework for MapReduce in big data processing tasks. Specifically, we first quantify and assign sensitivity values to data and trust values to map and reduce slots. We then compute the trust value of each resource employed in the big data processing tasks. Depending on the sensitivity level of a task's data, the task requires a given level of trust (i.e., more sensitive data requires servers/slots with a higher trust level). The MapReduce scheduling problem is then formulated as a maximum weighted matching problem on a bipartite graph that aims to maximize the total trust value over all possible assignments, subject to the trust requirements of the different tasks. The problem is known to be NP-hard. To tackle it, we observe that within a computing node (VM), slots share the same trust value granted from the secured transformation phase. This helps reduce the number of slot nodes in the weighted bipartite graph. Leveraging this fact, we propose an efficient heuristic algorithm that achieves 94.7 percent of the optimal solution obtained via exhaustive search. Extensive simulations show that the trust-based scheduling scheme provides much higher protection for sensitive data while ensuring good performance for big data applications.
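The scheduling idea in the summary can be illustrated with a minimal sketch. This is not the authors' heuristic, which this record does not reproduce; it is a plain greedy rule, assuming only what the summary states: each task needs a minimum trust level, every slot on a node (VM) shares that node's trust value, and the goal is a high total trust over the assignment. The names `Task`, `Node`, `greedy_trust_schedule`, and `total_trust`, and all numeric values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    required_trust: float  # assumed: minimum trust implied by data sensitivity


@dataclass
class Node:
    name: str
    trust: float     # shared by every slot on this node (per the summary)
    free_slots: int  # slots are grouped per node instead of modeled one by one


def greedy_trust_schedule(tasks, nodes):
    """Greedy sketch: give each task the most trusted node that still has a
    free slot and meets the task's trust requirement. Tasks with no feasible
    node are left unassigned. Not guaranteed to be optimal."""
    remaining = {n.name: n.free_slots for n in nodes}
    by_trust = sorted(nodes, key=lambda n: n.trust, reverse=True)
    assignment = {}
    # Serve the most demanding tasks first so they are not crowded out.
    for task in sorted(tasks, key=lambda t: t.required_trust, reverse=True):
        for node in by_trust:
            if remaining[node.name] > 0 and node.trust >= task.required_trust:
                assignment[task.name] = node.name
                remaining[node.name] -= 1
                break
    return assignment


def total_trust(assignment, nodes):
    """Sum of node trust values over all assigned tasks (the objective)."""
    trust = {n.name: n.trust for n in nodes}
    return sum(trust[node_name] for node_name in assignment.values())


if __name__ == "__main__":
    # Hypothetical workload: three tasks, two nodes.
    tasks = [Task("t1", 0.9), Task("t2", 0.5), Task("t3", 0.2)]
    nodes = [Node("A", 0.95, 1), Node("B", 0.6, 2)]
    plan = greedy_trust_schedule(tasks, nodes)
    print(plan, total_trust(plan, nodes))
```

Grouping slots by node mirrors the observation in the summary that slots within a VM share one trust value, so the search only has to consider nodes and their free-slot counts rather than individual slots.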