Decoupling computation and data scheduling in distributed data-intensive applications

Bibliographic Details
Published in 11th International Symposium on High-Performance Distributed Computing (HPDC-11 2002) pp. 352-358
Main Authors Ranganathan, K., Foster, I.
Format Conference Proceeding
Language English
Published IEEE 2002
ISBN 0769516866
9780769516868
ISSN 1082-8907
DOI 10.1109/HPDC.2002.1029935

Abstract In high-energy physics, bioinformatics, and other disciplines, we encounter applications involving numerous, loosely coupled jobs that both access and generate large data sets. So-called Data Grids seek to harness geographically distributed resources for such large-scale data-intensive problems. Yet effective scheduling in such environments is challenging, due to a need to address a variety of metrics and constraints while dealing with multiple, potentially independent sources of jobs and a large number of storage, compute, and network resources. We describe a scheduling framework that addresses these problems. Within this framework, data movement operations may be either tightly bound to job scheduling decisions or, alternatively, performed by a decoupled, asynchronous process on the basis of observed data access patterns and load. We develop a family of algorithms and use simulation studies to evaluate various combinations. Our results suggest that while it is necessary to consider the impact of replication, it is not always necessary to couple data movement and computation scheduling. Instead, these two activities can be addressed separately, thus significantly simplifying the design and implementation.
In high-energy physics, bioinformatics, and other disciplines, we encounter applications involving numerous, loosely coupled jobs that both access and generate large data sets. So-called Data Grids seek to harness geographically distributed resources for such large-scale data-intensive problems. Yet effective scheduling in such environments is challenging, due to a need to address a variety of metrics and constraints (e.g., resource utilization, response time, global and local allocation policies) while dealing with multiple, potentially independent sources of jobs and a large number of storage, compute, and network resources. We describe a scheduling framework that addresses these problems. Within this framework, data movement operations may be either tightly bound to job scheduling decisions or, alternatively, performed by a decoupled, asynchronous process on the basis of observed data access patterns and load. We develop a family of job scheduling and data movement (replication) algorithms and use simulation studies to evaluate various combinations. Our results suggest that while it is necessary to consider the impact of replication on the scheduling strategy, it is not always necessary to couple data movement and computation scheduling. Instead, these two activities can be addressed separately, thus significantly simplifying the design and implementation of the overall Data Grid system.
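The decoupling described in the abstract can be illustrated with a small toy model. The sketch below is only an illustration under stated assumptions: the `Grid` class, the `schedule_job` and `replicate_popular` functions, the site and file names, and the access-count threshold are all hypothetical and are not the algorithms evaluated in the paper. It shows a job scheduler that never moves data and a separate replication step that acts only on observed access counts and site load.

```python
from collections import Counter


class Grid:
    """Toy grid state (hypothetical; not the paper's simulator)."""

    def __init__(self, sites, replicas):
        self.sites = list(sites)
        # file name -> set of sites that hold a copy
        self.replicas = {f: set(held) for f, held in replicas.items()}
        self.load = Counter({s: 0 for s in sites})  # jobs assigned per site
        self.accesses = Counter()                   # observed accesses per file


def schedule_job(grid, needed_file):
    """Job scheduling only: pick the least-loaded site that already
    holds the file. This step never moves data."""
    holders = sorted(grid.replicas[needed_file])
    site = min(holders, key=lambda s: grid.load[s])
    grid.load[site] += 1
    grid.accesses[needed_file] += 1
    return site


def replicate_popular(grid, threshold):
    """Decoupled data scheduling: copy each file accessed at least
    `threshold` times to the least-loaded site that lacks it, then
    reset that file's access counter. Intended to run separately
    from schedule_job (here it is simply called by hand)."""
    moved = []
    for name, count in sorted(grid.accesses.items()):
        if count < threshold:
            continue
        candidates = [s for s in grid.sites if s not in grid.replicas[name]]
        if candidates:
            target = min(candidates, key=lambda s: grid.load[s])
            grid.replicas[name].add(target)
            moved.append((name, target))
        grid.accesses[name] = 0
    return moved
```

In this sketch the two functions share only the grid state (replica locations, load, and access counts), so either policy could be swapped out without touching the other, which is the simplification the abstract points to.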
Author Ranganathan, K.
Foster, I.
Author_xml – sequence: 1
  givenname: K.
  surname: Ranganathan
  fullname: Ranganathan, K.
  organization: Dept. of Comput. Sci., Chicago Univ., IL, USA
– sequence: 2
  givenname: I.
  surname: Foster
  fullname: Foster, I.
  organization: Dept. of Comput. Sci., Chicago Univ., IL, USA
ContentType Conference Proceeding
Discipline Computer Science
EndPage 358
Genre Conference Paper
IsPeerReviewed false
IsScholarly true
PublicationCentury 2000
PublicationDate 20020000
20020724
PublicationDateYYYYMMDD 2002-01-01
2002-07-24
PublicationDate_xml – year: 2002
  text: 20020000
PublicationDecade 2000
PublicationTitle 11th International Symposium on High-Performance Distributed Computing (HPDC-11 2002)
PublicationTitleAbbrev HPDC
PublicationYear 2002
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 352
SubjectTerms Application software
Bioinformatics
Computer science
Distributed computing
Laboratories
Large-scale systems
Physics computing
Processor scheduling
Resource management
Scheduling algorithm
Title Decoupling computation and data scheduling in distributed data-intensive applications
URI https://ieeexplore.ieee.org/document/1029935
https://www.proquest.com/docview/31119852