Trust-Based Scheduling Framework for Big Data Processing with MapReduce

Bibliographic Details
Published in IEEE Transactions on Services Computing, Vol. 15, no. 1, pp. 279-293
Main Authors Dang, Thanh Dat; Hoang, Doan; Nguyen, Diep N.
Format Journal Article
Language English
Published Piscataway IEEE 01.01.2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN 1939-1374
EISSN 2372-0204
DOI 10.1109/TSC.2019.2938959

Abstract Security and privacy have become major concerns in cloud computing platforms, where users risk leakage of their private data. Leakage can happen while the data is at rest (in storage), in processing, or in transit within a cloud or between different cloud infrastructures, e.g., from private to public clouds. This paper focuses on protecting data "in processing". For big data applications, the MapReduce framework has proven to be an efficient solution and has been widely deployed, e.g., in healthcare and business data analysis. In this article, we propose a trust-based framework for MapReduce in big data processing tasks. Specifically, we first quantify and assign sensitivity values to data and trust values to map and reduce slots. We then compute the trust value of each resource employed in the big data processing tasks. Depending on the sensitivity level of its data, a task requires a given level of trust (i.e., more sensitive data requires servers/slots with a higher trust level). The MapReduce scheduling problem is then formulated as the maximum weighted matching problem of a bipartite graph, which aims to maximize the total trust value over all possible assignments subject to the trust requirements of the different tasks. The problem is known to be NP-hard. To tackle it, we observe that within a computing node (VM), slots share the same trust value granted from the secured transformation phase. This helps reduce the number of slot nodes of the weighted bipartite graph. Leveraging this fact, we propose an efficient heuristic algorithm that achieves 94.7 percent of the optimal solution obtained via exhaustive search. Extensive simulations show that the trust-based scheduling scheme provides much higher protection for sensitive data while ensuring good performance for big data applications.
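The matching formulation in the abstract can be illustrated with a small Python sketch. It is not the authors' algorithm: the function names, the greedy rule, and the sample trust and sensitivity values are all assumptions made for illustration. The sketch models each VM as a (trust, free slots) pair, so that slots on the same VM share one trust value as the abstract describes, and compares a simple greedy assignment with an exhaustive search over slot assignments.

```python
from itertools import permutations


def total_trust(assignment, vms):
    """Sum the trust values of the VMs that host the assigned tasks."""
    return sum(vms[vm][0] for vm in assignment.values())


def greedy_schedule(tasks, vms):
    """Hypothetical greedy rule (an assumption, not the paper's heuristic).

    tasks: {task_id: required_trust}
    vms:   {vm_id: (trust, free_slots)}
    Tasks are served in order of decreasing requirement; each one takes a
    slot on the most trusted VM that still has a free slot, provided that
    VM meets the task's requirement. Otherwise the task stays unassigned.
    """
    free = {vm: slots for vm, (_, slots) in vms.items()}
    by_trust = sorted(vms, key=lambda vm: vms[vm][0], reverse=True)
    assignment = {}
    for task in sorted(tasks, key=tasks.get, reverse=True):
        vm = next((v for v in by_trust if free[v] > 0), None)
        if vm is not None and vms[vm][0] >= tasks[task]:
            assignment[task] = vm
            free[vm] -= 1
    return assignment


def exhaustive_schedule(tasks, vms):
    """Brute-force search over slot-level assignments of all tasks.

    Returns the feasible assignment with the highest total trust,
    or None if no assignment satisfies every trust requirement.
    """
    slots = [vm for vm, (_, n) in vms.items() for _ in range(n)]
    task_ids = list(tasks)
    best, best_total = None, float("-inf")
    for perm in permutations(range(len(slots)), len(task_ids)):
        chosen = [slots[i] for i in perm]
        if all(vms[vm][0] >= tasks[t] for t, vm in zip(task_ids, chosen)):
            candidate = dict(zip(task_ids, chosen))
            score = total_trust(candidate, vms)
            if score > best_total:
                best, best_total = candidate, score
    return best


# Illustrative values only; they do not come from the paper.
example_vms = {"vm1": (0.9, 2), "vm2": (0.6, 1), "vm3": (0.3, 2)}
example_tasks = {"t1": 0.8, "t2": 0.5, "t3": 0.2}
print(greedy_schedule(example_tasks, example_vms))
print(exhaustive_schedule(example_tasks, example_vms))
```

Working at the VM level keeps the greedy search over three VM nodes rather than five slot nodes, which mirrors the graph reduction the abstract mentions. The exhaustive version enumerates slot permutations and is practical only for very small instances.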
Author details
Dang, Thanh Dat; ORCID 0000-0002-1827-3731; datth22@gmail.com; School of Electrical and Data Engineering, University of Technology Sydney, Ultimo, NSW, Australia
Hoang, Doan; ORCID 0000-0003-1798-4926; doan.hoang@uts.edu.au; School of Electrical and Data Engineering, University of Technology Sydney, Ultimo, NSW, Australia
Nguyen, Diep N.; ORCID 0000-0003-2659-8648; diep.nguyen@uts.edu.au; School of Electrical and Data Engineering, University of Technology Sydney, Ultimo, NSW, Australia
CODEN ITSCAD
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022
Discipline Engineering
EISSN 2372-0204
Genre orig-research
IsPeerReviewed true
IsScholarly true
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
PublicationTitleAbbrev TSC
SubjectTerms Algorithms
Big Data
Big Data applications
big data security
Cloud computing
Data analysis
Data privacy
Data processing
data sensitive
Graph theory
Heuristic algorithms
Heuristic methods
Leakage
MapReduce
Measurement
Privacy
Processor scheduling
Scheduling
Security
Sensitivity
Task analysis
Trust-aware framework
trust-based scheduling
Trusted computing
Trustworthiness
URI https://ieeexplore.ieee.org/document/8823036
https://www.proquest.com/docview/2625367422