Parallel Performance Evaluation and Optimization

Bibliographic Details
Published in	Programming multi‐core and many‐core computing systems, pp. 343-362
Main Author	Shafi, Hazim
Format	Book Chapter
Language	English
Published	Hoboken, NJ, USA: John Wiley & Sons, Inc., 24 January 2017
Subjects	cache coherence; nonuniform memory access; overlapping latency; parallel performance optimization; parallel performance tuning; shared‐memory parallel programming
ISBN	0470936908, 9780470936900
DOI	10.1002/9781119332015.ch17

Abstract	This chapter covers the aspects of shared‐memory parallel programming that most affect performance and gives guidance for diagnosing such issues during performance tuning. It gives an overview of the performance impact of cache coherence and presents guidelines for minimizing these overheads: minimize write sharing and avoid false sharing. Nonuniform memory access (NUMA) systems present a challenge to application performance because the performance of an application may vary depending on where a thread is running and which memory address it is accessing. This places on developers the additional burden of ensuring that their applications do not suffer from NUMA latency effects, and the chapter describes how this may be accomplished. I/O latency can be a major source of serialization in a parallel application; the best way to deal with I/O is to overlap it with other work when possible.
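To illustrate the false-sharing guideline mentioned in the abstract, here is a minimal C sketch (not taken from the chapter) that gives each thread's counter its own cache line. It assumes POSIX threads and a 64-byte cache line, which is common on current x86-64 and many ARM cores but should be checked for the target machine. The names CACHE_LINE_SIZE, padded_counter and run_padded_counters are hypothetical, and error handling is omitted for brevity.

```c
#include <pthread.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines; verify for your target CPU. */
#define CACHE_LINE_SIZE 64
#define NUM_THREADS 4
#define ITERATIONS 1000000L

/* _Alignas on the member aligns the whole struct to a cache line and
 * rounds its size up to 64 bytes, so adjacent array elements never share
 * a line. volatile forces every increment to be a real memory write,
 * which is the traffic that false sharing would otherwise amplify. */
typedef struct {
    _Alignas(CACHE_LINE_SIZE) volatile long value;
} padded_counter;

_Static_assert(sizeof(padded_counter) == CACHE_LINE_SIZE,
               "each counter should occupy exactly one cache line");

/* Static storage, so the 64-byte alignment is honored. */
static padded_counter counters[NUM_THREADS];

/* Each thread writes only its own counter: no write sharing. */
static void *worker(void *arg) {
    padded_counter *c = arg;
    for (long i = 0; i < ITERATIONS; i++)
        c->value++;
    return NULL;
}

/* Starts NUM_THREADS workers, waits for them, and returns the sum. */
long run_padded_counters(void) {
    pthread_t threads[NUM_THREADS];
    for (int t = 0; t < NUM_THREADS; t++) {
        counters[t].value = 0;
        pthread_create(&threads[t], NULL, worker, &counters[t]);
    }
    long total = 0;
    for (int t = 0; t < NUM_THREADS; t++) {
        pthread_join(threads[t], NULL);  /* makes the worker's writes visible */
        total += counters[t].value;
    }
    return total;
}
```

Without the _Alignas padding, several counters would typically fall in one cache line, and each increment by one thread would invalidate that line in the other cores' caches even though no data is logically shared. Accumulating into a local variable and writing the result once, in line with the "minimize write sharing" guideline, avoids most of that coherence traffic altogether.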
Copyright	Copyright © 2017 by John Wiley & Sons, Inc. All rights reserved
Discipline	Computer Science
EISBN	9781119332015, 111933201X
Editor	Pllana, Sabri; Xhafa, Fatos
PageCount	20
URI	https://onlinelibrary.wiley.com/doi/abs/10.1002/9781119332015.ch17