Task allocation and reallocation for fault tolerance in multicomputer systems

Bibliographic Details
Published in	IEEE transactions on aerospace and electronic systems Vol. 30; no. 4; pp. 1094 - 1104
Main Authors	Chen, C.-I.H., Cherkassky, V.
Format	Journal Article
Language	English
Published	IEEE 01.10.1994
Subjects	Communication networks; Costs; Fault tolerance; Fault tolerant systems; Hardware; Load management; Processor scheduling; Redundancy; Reliability; Resource management
ISSN	0018-9251
DOI	10.1109/7.328753

Abstract	The goal of task allocation in a set of interconnected processors (computers) is to maximize the efficient use of resources and thus reduce the job turnaround time. Proposed is a simple yet effective method to allocate the tasks in multicomputer systems for minimizing the interprocessor communication cost subject to resource limitations defined by the system and designer. These limitations can be viewed as results of load balancing, since the execution time of each task, the number of available processors, processor speed, and memory capacity are known to the system or designer. As the number of processors increases, the probability of a failure existing somewhere in the system at any time also increases. Very few established task allocation models have considered the reliability property. In multicomputer systems, we define system reliability as the probability that the system can run the tasks successfully. After the (nonredundant) task scheduling strategy is defined, tasks are then reallocated to processors statically and redundantly. This is a form of time redundancy: if some processors fail during execution, all tasks can still be completed on the remaining processors (but over a longer time). Due to the static preallocation of tasks, this method is simpler and thus more practical than well-known dynamic reconfiguration and rollback recovery techniques in multicomputer systems. We demonstrate the effectiveness of task allocation and reallocation for hardware fault tolerance by applying the methods to several examples and to a practical communications network multiprocessor system.
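The optimization problem named in the abstract (minimize interprocessor communication cost subject to a load limit) can be illustrated with a toy sketch. This is not the authors' method, which this record does not describe; it is a brute-force search over all placements, and every name and number in it (exec_time, comm, max_load, the four tasks) is an assumption made up for illustration.

```python
from itertools import product


def communication_cost(assignment, comm):
    """Sum the traffic between task pairs placed on different processors."""
    return sum(cost for (a, b), cost in comm.items()
               if assignment[a] != assignment[b])


def processor_loads(assignment, exec_time, n_procs):
    """Total execution time assigned to each processor."""
    loads = [0] * n_procs
    for task, proc in assignment.items():
        loads[proc] += exec_time[task]
    return loads


def best_allocation(exec_time, comm, n_procs, max_load):
    """Try every placement; return (cost, assignment), or None if infeasible."""
    tasks = sorted(exec_time)
    best = None
    for placement in product(range(n_procs), repeat=len(tasks)):
        assignment = dict(zip(tasks, placement))
        # Load-balancing constraint: no processor may exceed max_load.
        if max(processor_loads(assignment, exec_time, n_procs)) > max_load:
            continue
        cost = communication_cost(assignment, comm)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


# Hypothetical data: execution times and pairwise communication costs.
exec_time = {"A": 2, "B": 3, "C": 2, "D": 3}
comm = {("A", "B"): 5, ("B", "C"): 1, ("C", "D"): 4, ("A", "D"): 1}
result = best_allocation(exec_time, comm, n_procs=2, max_load=5)
print(result)
```

With these assumed numbers, the heavily communicating pairs (A, B) and (C, D) end up on the same processor. Exhaustive search is exponential in the number of tasks, so it only serves to make the objective and the constraint concrete.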
Author affiliations	Chen, C.-I.H. (Dept. of Electr. Eng., Wright State Univ., Dayton, OH, USA); Cherkassky, V.
CODEN	IEARAX
Discipline	Engineering
IsPeerReviewed	true
IsScholarly	true
License	https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
PageCount	11
PublicationTitleAbbrev	T-AES
URI	https://ieeexplore.ieee.org/document/328753 https://www.proquest.com/docview/27292716 https://www.proquest.com/docview/28684562 https://www.proquest.com/docview/29086083