Decoupling computation and data scheduling in distributed data-intensive applications
In high-energy physics, bioinformatics, and other disciplines, we encounter applications involving numerous, loosely coupled jobs that both access and generate large data sets. So-called Data Grids seek to harness geographically distributed resources for such large-scale data-intensive problems. Yet...
| Published in | 11th International Symposium on High-Performance Distributed Computing (HPDC-11 2002) pp. 352 - 358 |
|---|---|
| Main Authors | Ranganathan, K.; Foster, I. |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 2002 |
| Subjects | |
| ISBN | 0769516866 9780769516868 |
| ISSN | 1082-8907 |
| DOI | 10.1109/HPDC.2002.1029935 |
| Abstract | In high-energy physics, bioinformatics, and other disciplines, we encounter applications involving numerous, loosely coupled jobs that both access and generate large data sets. So-called Data Grids seek to harness geographically distributed resources for such large-scale data-intensive problems. Yet effective scheduling in such environments is challenging, due to a need to address a variety of metrics and constraints while dealing with multiple, potentially independent sources of jobs and a large number of storage, compute, and network resources. We describe a scheduling framework that addresses these problems. Within this framework, data movement operations may be either tightly bound to job scheduling decisions or, alternatively, performed by a decoupled, asynchronous process on the basis of observed data access patterns and load. We develop a family of algorithms and use simulation studies to evaluate various combinations. Our results suggest that while it is necessary to consider the impact of replication, it is not always necessary to couple data movement and computation scheduling. Instead, these two activities can be addressed separately, thus significantly simplifying the design and implementation. |
|---|---|
| AbstractList | In high-energy physics, bioinformatics, and other disciplines, we encounter applications involving numerous, loosely coupled jobs that both access and generate large data sets. So-called Data Grids seek to harness geographically distributed resources for such large-scale data-intensive problems. Yet effective scheduling in such environments is challenging, due to a need to address a variety of metrics and constraints (e.g., resource utilization, response time, global and local allocation policies) while dealing with multiple, potentially independent sources of jobs and a large number of storage, compute, and network resources. We describe a scheduling framework that addresses these problems. Within this framework, data movement operations may be either tightly bound to job scheduling decisions or, alternatively, performed by a decoupled, asynchronous process on the basis of observed data access patterns and load. We develop a family of job scheduling and data movement (replication) algorithms and use simulation studies to evaluate various combinations. Our results suggest that while it is necessary to consider the impact of replication on the scheduling strategy, it is not always necessary to couple data movement and computation scheduling. Instead, these two activities can be addressed separately, thus significantly simplifying the design and implementation of the overall Data Grid system. |
| Author | Ranganathan, K.; Foster, I. |
| Author_xml | – sequence: 1 givenname: K. surname: Ranganathan fullname: Ranganathan, K. organization: Dept. of Comput. Sci., Chicago Univ., IL, USA – sequence: 2 givenname: I. surname: Foster fullname: Foster, I. organization: Dept. of Comput. Sci., Chicago Univ., IL, USA |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL 7SC 8FD JQ2 L7M L~C L~D |
| DOI | 10.1109/HPDC.2002.1029935 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present; Computer and Information Systems Abstracts; Technology Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts – Academic; Computer and Information Systems Abstracts Professional |
| DatabaseTitle | Computer and Information Systems Abstracts; Technology Research Database; Computer and Information Systems Abstracts – Academic; Advanced Technologies Database with Aerospace; ProQuest Computer Science Collection; Computer and Information Systems Abstracts Professional |
| DatabaseTitleList | Computer and Information Systems Abstracts |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EndPage | 358 |
| ExternalDocumentID | 1029935 |
| Genre | Conference Paper |
| GroupedDBID | 29P 6IE 6IF 6IK 6IL 6IN AAJGR AAWTH ACGFS ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IPLJI M43 OCL RIE RIL RNS 7SC 8FD AAVQY JQ2 L7M L~C L~D RIB RIC |
| ID | FETCH-LOGICAL-i206t-f8bdcb314c6e4fd9343b0440b3169862908b245eddccf00a17c4660b8eaf56b3 |
| IEDL.DBID | RIE |
| ISBN | 0769516866 9780769516868 |
| ISSN | 1082-8907 |
| IngestDate | Fri Jul 11 06:07:41 EDT 2025 Tue Aug 26 17:58:06 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-i206t-f8bdcb314c6e4fd9343b0440b3169862908b245eddccf00a17c4660b8eaf56b3 |
| Notes | SourceType-Conference Papers & Proceedings-1 ObjectType-Conference Paper-1 content type line 25 |
| PQID | 31119852 |
| PQPubID | 23500 |
| PageCount | 7 |
| ParticipantIDs | proquest_miscellaneous_31119852 ieee_primary_1029935 |
| PublicationCentury | 2000 |
| PublicationDate | 20020000 20020724 |
| PublicationDateYYYYMMDD | 2002-01-01 2002-07-24 |
| PublicationDate_xml | – year: 2002 text: 20020000 |
| PublicationDecade | 2000 |
| PublicationTitle | 11th International Symposium on High-Performance Distributed Computing (HPDC-11 2002) |
| PublicationTitleAbbrev | HPDC |
| PublicationYear | 2002 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SSID | ssj0000454900 ssj0020127 |
| Score | 2.0357275 |
| Snippet | In high-energy physics, bioinformatics, and other disciplines, we encounter applications involving numerous, loosely coupled jobs that both access and generate... |
| SourceID | proquest ieee |
| SourceType | Aggregation Database Publisher |
| StartPage | 352 |
| SubjectTerms | Application software; Bioinformatics; Computer science; Distributed computing; Laboratories; Large-scale systems; Physics computing; Processor scheduling; Resource management; Scheduling algorithm |
| Title | Decoupling computation and data scheduling in distributed data-intensive applications |
| URI | https://ieeexplore.ieee.org/document/1029935 https://www.proquest.com/docview/31119852 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | IEEE |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=Proceedings+11th+IEEE+International+Symposium+on+High+Performance+Distributed+Computing&rft.atitle=Decoupling+computation+and+data+scheduling+in+distributed+data-intensive+applications&rft.au=Ranganathan%2C+K.&rft.au=Foster%2C+I.&rft.date=2002-01-01&rft.pub=IEEE&rft.isbn=9780769516868&rft.issn=1082-8907&rft.spage=352&rft.epage=358&rft_id=info:doi/10.1109%2FHPDC.2002.1029935&rft.externalDocID=1029935 |
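
The abstract contrasts data movement that is bound to each job scheduling decision with data movement performed by a separate, asynchronous process driven by observed access patterns and load. The Python sketch below illustrates the decoupled variant only. It is not taken from the paper: the `Site` class, the `schedule_job` and `replicate_popular` functions, the queue-length load proxy, and the popularity threshold are all hypothetical choices made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Site:
    """A hypothetical grid site: the datasets it stores and a crude load proxy."""
    name: str
    datasets: set = field(default_factory=set)
    queued_jobs: int = 0


def schedule_job(sites, dataset):
    """Job scheduler (assumed policy): prefer sites already holding the
    dataset, then pick the least loaded among them. It never moves data."""
    holders = [s for s in sites if dataset in s.datasets]
    candidates = holders or sites  # fall back to any site if no replica exists
    chosen = min(candidates, key=lambda s: s.queued_jobs)
    chosen.queued_jobs += 1
    return chosen


def replicate_popular(sites, access_counts, threshold):
    """Data scheduler (assumed policy): runs independently of job placement and
    copies each sufficiently popular dataset to the least-loaded site lacking it."""
    moves = []
    for dataset, count in access_counts.items():
        if count < threshold:
            continue
        targets = [s for s in sites if dataset not in s.datasets]
        if targets:
            target = min(targets, key=lambda s: s.queued_jobs)
            target.datasets.add(dataset)
            moves.append((dataset, target.name))
    return moves


# Three sites; dataset "d1" is requested often and initially lives only at site A.
sites = [Site("A", {"d1"}), Site("B", {"d2"}), Site("C")]
access_counts = Counter()

for dataset in ["d1", "d1", "d1", "d2"]:
    schedule_job(sites, dataset)
    access_counts[dataset] += 1

# The data scheduler runs later, using only the observed access counts and load.
moves = replicate_popular(sites, access_counts, threshold=3)

# The next job for "d1" can now be placed on the new, idle replica.
next_site = schedule_job(sites, "d1")
```

In this sketch the two schedulers share only the observed state (replica locations, access counts, queue lengths), so either policy can be swapped out without changing the other, which is the kind of simplification the abstract attributes to decoupling.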