Parallel Performance Evaluation and Optimization

This chapter covers the most important aspects of shared‐memory parallel programming that impact performance. It gives guidance for diagnosing such issues in order to assist in performance tuning. The chapter overviews the performance impact of cache coherence, and presents the guidelines for minimi...

Bibliographic Details
Published in Programming multi‐core and many‐core computing systems, pp. 343-362
Main Author Shafi, Hazim
Format Book Chapter
Language English
Published Hoboken, NJ, USA: John Wiley & Sons, Inc, 24.01.2017
ISBN 0470936908
9780470936900
DOI 10.1002/9781119332015.ch17

Abstract This chapter covers the most important aspects of shared‐memory parallel programming that impact performance. It gives guidance for diagnosing such issues in order to assist in performance tuning. The chapter overviews the performance impact of cache coherence, and presents the guidelines for minimizing these overheads: minimize write sharing and avoid false sharing. Nonuniform memory access (NUMA) systems present a challenge to application performance because, depending on where a thread is running and which memory address it's accessing, the performance of the application may vary. This presents developers with the additional burden of ensuring that their applications do not suffer from NUMA latency effects. The chapter describes how this may be accomplished. I/O latency can be a major source of serialization in a parallel application. The best way to deal with I/O is to overlap it with other work when possible.
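The abstract's last recommendation, overlapping I/O with other work, can be illustrated with a minimal sketch. This is not code from the chapter: the function names (`read_chunk`, `checksum`, `process_overlapped`) and the chunk size are hypothetical, and the checksum merely stands in for real computation. The idea is to keep one read in flight on a background thread while the main thread processes the chunk that has already arrived.

```python
import io
from concurrent.futures import ThreadPoolExecutor


def read_chunk(stream, size):
    """Blocking read; stands in for a slow disk or network read (assumption)."""
    return stream.read(size)


def checksum(chunk):
    """Placeholder for CPU work on one chunk (hypothetical workload)."""
    return sum(chunk) % 65521


def process_overlapped(stream, size=4096):
    """Process a stream chunk by chunk, prefetching the next chunk on a
    background thread while the current chunk is being processed."""
    results = []
    # A single worker means at most one read touches the stream at a time.
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(read_chunk, stream, size)
        while True:
            chunk = pending.result()  # wait for the read already in flight
            if not chunk:
                break  # end of stream
            # Start the next read before doing any computation.
            pending = pool.submit(read_chunk, stream, size)
            # This work runs while the background read is outstanding.
            results.append(checksum(chunk))
    return results


if __name__ == "__main__":
    data = bytes(range(256)) * 40
    print(process_overlapped(io.BytesIO(data)))
```

With a genuinely blocking source (a file or socket), the read wait and the computation proceed concurrently, so the I/O latency is hidden whenever each chunk takes at least as long to process as to read. The in-memory stream used here only demonstrates that the results match a sequential pass.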
Author Shafi, Hazim
Author_xml – sequence: 1
  givenname: Hazim
  surname: Shafi
  fullname: Shafi, Hazim
ContentType Book Chapter
Copyright Copyright © 2017 by John Wiley & Sons, Inc. All rights reserved
Copyright_xml – notice: Copyright © 2017 by John Wiley & Sons, Inc. All rights reserved
DOI 10.1002/9781119332015.ch17
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9781119332015
111933201X
Editor Xhafa, Fatos
Pllana, Sabri
Editor_xml – sequence: 1
  givenname: Sabri
  surname: Pllana
  fullname: Pllana, Sabri
– sequence: 2
  givenname: Fatos
  surname: Xhafa
  fullname: Xhafa, Fatos
EndPage 362
ExternalDocumentID 10.1002/9781119332015.ch17
Genre chapter
ISBN 0470936908
9780470936900
IngestDate Thu Jun 02 19:28:32 EDT 2022
Wed Nov 27 04:53:37 EST 2019
IsPeerReviewed false
IsScholarly false
Language English
LinkModel OpenURL
PageCount 20
ParticipantIDs wiley_ebooks_10_1002_9781119332015_ch17_ch17
PublicationCentury 2000
PublicationDate 2017-01-24
PublicationDateYYYYMMDD 2017-01-24
PublicationDate_xml – month: 01
  year: 2017
  text: 2017-01-24
  day: 24
PublicationDecade 2010
PublicationPlace Hoboken, NJ, USA
PublicationPlace_xml – name: Hoboken, NJ, USA
PublicationTitle Programming multi‐core and many‐core computing systems
PublicationYear 2017
Publisher John Wiley & Sons, Inc
Publisher_xml – name: John Wiley & Sons, Inc
SourceID wiley
SourceType Enrichment Source
Publisher
StartPage 343
SubjectTerms cache coherence
nonuniform memory access
overlapping latency
parallel performance optimization
parallel performance tuning
shared‐memory parallel programming
Title Parallel Performance Evaluation and Optimization
URI https://onlinelibrary.wiley.com/doi/abs/10.1002/9781119332015.ch17
hasFullText 1
inHoldings 1
linkProvider ProQuest Ebooks