Parallel Performance Evaluation and Optimization
This chapter covers the most important aspects of shared‐memory parallel programming that impact performance. It gives guidance for diagnosing such issues in order to assist in performance tuning. The chapter overviews the performance impact of cache coherence, and presents the guidelines for minimi...
| Published in | Programming multi‐core and many‐core computing systems pp. 343 - 362 |
|---|---|
| Main Author | Shafi, Hazim |
| Format | Book Chapter |
| Language | English |
| Published | Hoboken, NJ, USA: John Wiley & Sons, Inc, 24.01.2017 |
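| Subjects | cache coherence; nonuniform memory access; overlapping latency; parallel performance optimization; parallel performance tuning; shared‐memory parallel programming |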
| ISBN | 0470936908 9780470936900 |
| DOI | 10.1002/9781119332015.ch17 |
| Abstract | This chapter covers the most important aspects of shared‐memory parallel programming that impact performance. It gives guidance for diagnosing such issues in order to assist in performance tuning. The chapter overviews the performance impact of cache coherence, and presents the guidelines for minimizing these overheads: minimize write sharing and avoid false sharing. Nonuniform memory access (NUMA) systems present a challenge to application performance because, depending on where a thread is running and which memory address it's accessing, the performance of the application may vary. This presents developers with the additional burden of ensuring that their applications do not suffer from NUMA latency effects. The chapter describes how this may be accomplished. I/O latency can be a major source of serialization in a parallel application. The best way to deal with I/O is to overlap it with other work when possible. |
|---|---|
| Author | Shafi, Hazim |
| Author_xml | – sequence: 1 givenname: Hazim surname: Shafi fullname: Shafi, Hazim |
| ContentType | Book Chapter |
| Copyright | Copyright © 2017 by John Wiley & Sons, Inc. All rights reserved |
| Copyright_xml | – notice: Copyright © 2017 by John Wiley & Sons, Inc. All rights reserved |
| DOI | 10.1002/9781119332015.ch17 |
| DatabaseTitleList | |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 9781119332015 111933201X |
| Editor | Xhafa, Fatos Pllana, Sabri |
| Editor_xml | – sequence: 1 givenname: Sabri surname: Pllana fullname: Pllana, Sabri – sequence: 2 givenname: Fatos surname: Xhafa fullname: Xhafa, Fatos |
| EndPage | 362 |
| ExternalDocumentID | 10.1002/9781119332015.ch17 |
| Genre | chapter |
| ID | FETCH-LOGICAL-s153f-d00457125fa70b65ef0bb74bc1b568928976db831c03e1ec4d21ab09d0b1d71f3 |
| ISBN | 0470936908 9780470936900 |
| IngestDate | Thu Jun 02 19:28:32 EDT 2022 Wed Nov 27 04:53:37 EST 2019 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | OpenURL |
| MergedId | FETCHMERGED-LOGICAL-s153f-d00457125fa70b65ef0bb74bc1b568928976db831c03e1ec4d21ab09d0b1d71f3 |
| PageCount | 20 |
| ParticipantIDs | wiley_ebooks_10_1002_9781119332015_ch17_ch17 |
| PublicationCentury | 2000 |
| PublicationDate | 2017-01-24 |
| PublicationDateYYYYMMDD | 2017-01-24 |
| PublicationDate_xml | – month: 01 year: 2017 text: 2017-01-24 day: 24 |
| PublicationDecade | 2010 |
| PublicationPlace | Hoboken, NJ, USA |
| PublicationPlace_xml | – name: Hoboken, NJ, USA |
| PublicationTitle | Programming multi‐core and many‐core computing systems |
| PublicationYear | 2017 |
| Publisher | John Wiley & Sons, Inc |
| Publisher_xml | – name: John Wiley & Sons, Inc |
| SSID | ssib027811140 ssj0001756349 ssib043667605 |
| Score | 1.5021787 |
| SourceID | wiley |
| SourceType | Enrichment Source Publisher |
| StartPage | 343 |
| SubjectTerms | cache coherence; nonuniform memory access; overlapping latency; parallel performance optimization; parallel performance tuning; shared‐memory parallel programming |
| Title | Parallel Performance Evaluation and Optimization |
| URI | https://onlinelibrary.wiley.com/doi/abs/10.1002/9781119332015.ch17 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | ProQuest Ebooks |