Comparative Performance Analysis of Intel (R) Xeon Phi (TM), GPU, and CPU: A Case Study from Microscopy Image Analysis
| Published in | 2014 IEEE 28th International Parallel and Distributed Processing Symposium Vol. 2014; pp. 1063 - 1072 |
|---|---|
| Main Authors | Teodoro, George; Kurc, Tahsin; Jun Kong; Cooper, Lee; Saltz, Joel |
| Format | Conference Proceeding; Journal Article |
| Language | English |
| Published | United States: IEEE, 1 May 2014 |
| ISBN | 1479937991 9781479937998 |
| ISSN | 1530-2075 1045-9219 1558-2183 |
| DOI | 10.1109/IPDPS.2014.111 |
| Abstract | We study and characterize the performance of operations in an important class of applications on GPUs and Many Integrated Core (MIC) architectures. Our work is motivated by applications that analyze low-dimensional spatial datasets captured by high-resolution sensors, such as image datasets obtained from whole-slide tissue specimens using microscopy scanners. Common operations in these applications include the detection and extraction of objects (object segmentation), the computation of features for each extracted object (feature computation), and the characterization of objects based on these features (object classification). In this work, we identify the data access and computation patterns of operations in the object segmentation and feature computation categories. We systematically implement and evaluate the performance of these operations on modern CPUs, GPUs, and MIC systems for a microscopy image analysis application. Our results show that operations with regular data access perform comparably on a MIC and on a GPU, and sometimes better on the MIC. In contrast, GPUs are significantly more efficient than MICs for operations that access data irregularly, because MICs perform poorly on random data access. We also examine the coordinated use of MICs and CPUs. Our experiments show that a performance-aware task scheduling strategy improves performance by about 1.29× over a first-come-first-served strategy. This allows applications to achieve high efficiency on CPU-MIC systems: the example application attained an efficiency of 84% on 192 nodes (3072 CPU cores and 192 MICs). |
|---|---|
| Author | Saltz, Joel; Jun Kong; Cooper, Lee; Kurc, Tahsin; Teodoro, George |
| AuthorAffiliation | 1 Department of Computer Science, University of Brasília, Brasília, DF, Brazil; 2 Department of Biomedical Informatics, Stony Brook University, Stony Brook, NY, USA; 3 Scientific Data Group, Oak Ridge National Laboratory, Oak Ridge, TN, USA; 4 Department of Biomedical Informatics, Emory University, Atlanta, GA, USA |
| Author_xml | 1: Teodoro, George (teodoro@unb.br), Dept. of Comput. Sci., Univ. of Brasilia, Brasilia, Brazil; 2: Kurc, Tahsin (tkurc@emory.edu), Dept. of Biomed. Inf., Stony Brook Univ., Stony Brook, NY, USA; 3: Jun Kong (jun.kong@emory.edu), Dept. of Biomed. Inf., Emory Univ., Atlanta, GA, USA; 4: Cooper, Lee (lee.cooper@emory.edu), Dept. of Biomed. Inf., Emory Univ., Atlanta, GA, USA; 5: Saltz, Joel (jhsaltz@emory.edu), Dept. of Biomed. Inf., Stony Brook Univ., Stony Brook, NY, USA |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/25419088 (View this record in MEDLINE/PubMed) |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding; Journal Article |
| DBID | 6IE 6IL CBEJK RIE RIL NPM 7X8 5PM ADTOC UNPAY |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present; PubMed; MEDLINE - Academic; PubMed Central (Full Participant titles); Unpaywall for CDI: Periodical Content; Unpaywall |
| DatabaseTitle | PubMed; MEDLINE - Academic |
| DatabaseTitleList | MEDLINE - Academic; PubMed |
| Database_xml | 1: PubMed (NPM), http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed, Index Database; 2: IEEE Electronic Library (IEL) (RIE), https://ieeexplore.ieee.org/, Publisher; 3: Unpaywall (UNPAY), https://unpaywall.org/, Open Access Repository |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science; Engineering |
| EISBN | 9781479938001 1479938009 |
| EISSN | 1558-2183 |
| EndPage | 1072 |
| ExternalDocumentID | oai:pubmedcentral.nih.gov:4240026 PMC4240026 25419088 6877335 |
| Genre | orig-research Journal Article |
| GrantInformation_xml | NIBIB NIH HHS: P20 EB000591; NLM NIH HHS: R01 LM011119; NLM NIH HHS: R01 LM009239; NCI NIH HHS: U54 CA113001; NHLBI NIH HHS: R24 HL085343; NIMHD NIH HHS: RC4 MD005964 |
| IEDL.DBID | UNPAY |
| ISSN | 1530-2075 1045-9219 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| LinkModel | DirectLink |
| OpenAccessLink | http://doi.org/10.1109/IPDPS.2014.111 |
| PMID | 25419088 |
| PQID | 1826605272 |
| PQPubID | 23479 |
| PageCount | 10 |
| ParticipantIDs | unpaywall_primary_10_1109_ipdps_2014_111 proquest_miscellaneous_1826605272 ieee_primary_6877335 pubmedcentral_primary_oai_pubmedcentral_nih_gov_4240026 pubmed_primary_25419088 |
| PublicationCentury | 2000 |
| PublicationDate | 20140501 |
| PublicationDateYYYYMMDD | 2014-05-01 |
| PublicationDate_xml | – month: 5 year: 2014 text: 20140501 day: 1 |
| PublicationDecade | 2010 |
| PublicationPlace | United States |
| PublicationTitle | 2014 IEEE 28th International Parallel and Distributed Processing Symposium |
| PublicationTitleAbbrev | IPDPS |
| PublicationTitleAlternate | IEEE Trans Parallel Distrib Syst |
| PublicationYear | 2014 |
| Publisher | IEEE |
| SSID | ssj0020349 ssib026764574 ssj0014504 |
| SourceID | unpaywall pubmedcentral proquest pubmed ieee |
| SourceType | Open Access Repository Aggregation Database Index Database Publisher |
| StartPage | 1063 |
| SubjectTerms | Graphics processing units; Image analysis; Image segmentation; Instruction sets; Microscopy; Microwave integrated circuits; Vegetation |
| Title | Comparative Performance Analysis of Intel (R) Xeon Phi (TM), GPU, and CPU: A Case Study from Microscopy Image Analysis |
| URI | https://ieeexplore.ieee.org/document/6877335; https://www.ncbi.nlm.nih.gov/pubmed/25419088; https://www.proquest.com/docview/1826605272; https://pubmed.ncbi.nlm.nih.gov/PMC4240026; http://doi.org/10.1109/IPDPS.2014.111 |
| UnpaywallVersion | submittedVersion |
| Volume | 2014 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | Unpaywall |
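
The abstract contrasts a performance-aware task scheduling strategy with first-come-first-served (FCFS) scheduling on CPU-MIC nodes. The following is a minimal Python sketch, not the authors' implementation, assuming that the performance-aware strategy hands each idle MIC the pending task with the highest estimated MIC speedup and each idle CPU core the task with the lowest. The `Task` fields, the `fcfs_pick`, `perf_aware_pick`, and `simulate` helpers, and all runtimes are hypothetical and chosen only for illustration; they are not measurements from the paper.

```python
"""Toy comparison of FCFS and a speedup-ordered (performance-aware) scheduler.

Hypothetical sketch: task names and runtimes are invented for illustration.
"""
import heapq
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    name: str
    cpu_time: float  # hypothetical runtime on one CPU core
    mic_time: float  # hypothetical runtime on the MIC

    @property
    def speedup(self) -> float:
        # Estimated benefit of running this task on the MIC instead of a CPU core.
        return self.cpu_time / self.mic_time


def fcfs_pick(queue: List[Task], device: str) -> Task:
    # First-come-first-served: any idle device takes the oldest pending task.
    return queue.pop(0)


def perf_aware_pick(queue: List[Task], device: str) -> Task:
    # Assumed performance-aware rule: the MIC takes the pending task with the
    # highest estimated speedup; a CPU core takes the one with the lowest.
    chooser = max if device == "mic" else min
    chosen = chooser(queue, key=lambda t: t.speedup)
    queue.remove(chosen)
    return chosen


def simulate(tasks: List[Task], cpu_cores: int,
             pick: Callable[[List[Task], str], Task]) -> float:
    """Greedy list scheduling on one MIC plus `cpu_cores` CPU cores; returns the makespan."""
    queue = list(tasks)  # copy so the caller's list is not consumed
    # Heap entries: (time the device becomes idle, device id, device kind).
    idle = [(0.0, 0, "mic")] + [(0.0, i + 1, "cpu") for i in range(cpu_cores)]
    heapq.heapify(idle)
    makespan = 0.0
    while queue:
        now, dev_id, kind = heapq.heappop(idle)  # earliest idle device
        task = pick(queue, kind)
        finish = now + (task.mic_time if kind == "mic" else task.cpu_time)
        makespan = max(makespan, finish)
        heapq.heappush(idle, (finish, dev_id, kind))
    return makespan


# Invented workload: regular-access operations gain a lot on the MIC,
# irregular-access operations gain nothing.
TASKS = [
    Task("irregular_a", cpu_time=4.0, mic_time=4.0),
    Task("regular_a", cpu_time=10.0, mic_time=1.0),
    Task("irregular_b", cpu_time=4.0, mic_time=4.0),
    Task("regular_b", cpu_time=10.0, mic_time=1.0),
]

if __name__ == "__main__":
    print("FCFS makespan:", simulate(TASKS, 1, fcfs_pick))
    print("performance-aware makespan:", simulate(TASKS, 1, perf_aware_pick))
```

With these made-up runtimes and one CPU core, FCFS finishes at t = 10.0 and the speedup-ordered policy at t = 6.0: the regular-access tasks, which gain the most from the MIC, are routed to it, while the CPU core absorbs an irregular-access task. The size of this gap is an artifact of the toy inputs and says nothing about the 1.29× improvement reported in the abstract.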