A Time Optimal Parallel Algorithm for the Dynamic Programming on the Hierarchical Memory Machine

Bibliographic Details
Published in 2014 Second International Symposium on Computing and Networking, pp. 86-95
Main Author Nakano, Koji
Format Conference Proceeding
Language English, Japanese
Published IEEE 01.12.2014
ISSN 2379-1888
DOI 10.1109/CANDAR.2014.14

Abstract The Hierarchical Memory Machine (HMM) is a theoretical parallel computing model that captures the essence of the architecture of CUDA-enabled GPUs. The main contribution of this paper is an efficient implementation, on the HMM, of the $O(n^3)$-time dynamic programming algorithm for solving the optimal triangulation problem for a convex $n$-gon. Although the HMM can run many threads in parallel, it is very hard to accelerate computation involving complicated memory access, such as the dynamic programming for the optimal triangulation problem. For problems involving complicated stride memory access, the acceleration rate is often limited to the bandwidth $w$ of the global memory. Quite surprisingly, our implementation of the dynamic programming algorithm for solving the optimal triangulation problem runs in $O(n^3/w^2)$ time units using $\max(wL, w^2 l)$ threads on the HMM with bandwidth $w$, global memory latency $L$, and shared memory latency $l$. Hence, this parallel algorithm achieves an acceleration rate of more than $w$ even though the dynamic programming algorithm involves complicated stride memory access. Also, we prove that this parallel algorithm is time optimal when $L = O(wl)$.
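For orientation, the sequential $O(n^3)$-time dynamic program that the abstract refers to can be sketched as follows. This is a minimal sketch of the common textbook formulation, not the paper's HMM implementation: the function names, the per-triangle cost function, and the choice of triangle perimeter as an example cost are all assumptions made for illustration, and the paper may define the triangulation weight differently.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]


def perimeter(a: Point, b: Point, c: Point) -> float:
    """Example triangle cost (an assumption): the perimeter of triangle abc."""
    return math.dist(a, b) + math.dist(b, c) + math.dist(c, a)


def optimal_triangulation_cost(
    vertices: Sequence[Point],
    triangle_cost: Callable[[Point, Point, Point], float] = perimeter,
) -> float:
    """Minimum total cost of triangulating a convex polygon.

    `vertices` lists the polygon corners in order around the boundary.
    dp[i][j] holds the best cost for the sub-polygon on vertices i..j.
    """
    n = len(vertices)
    if n < 3:
        return 0.0
    dp = [[0.0] * n for _ in range(n)]
    # Process sub-polygons by increasing span (gap = j - i).
    for gap in range(2, n):
        for i in range(n - gap):
            j = i + gap
            # Vertex k forms the triangle (i, k, j) that sits on edge (i, j).
            dp[i][j] = min(
                dp[i][k]
                + dp[k][j]
                + triangle_cost(vertices[i], vertices[k], vertices[j])
                for k in range(i + 1, j)
            )
    return dp[0][n - 1]


if __name__ == "__main__":
    square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    print(optimal_triangulation_cost(square))
```

The three nested loops (over `gap`, `i`, and `k`) give the $O(n^3)$ sequential running time. Each table entry reads one row segment and one column segment of `dp`, which is the kind of strided access pattern the abstract describes as hard to accelerate.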
Author Affiliation Dept. of Inf. Eng., Hiroshima Univ., Higashi-Hiroshima, Japan
CODEN IEEPAD
Discipline Computer Science
EISBN 1479941522, 9781479941520
PageCount 10
Subjects Bandwidth; CUDA; Dynamic programming; GPU; Graphics processing units; Heuristic algorithms; Hidden Markov models; Instruction sets; memory machine models; Optimized production technology; parallel algorithms
URI https://ieeexplore.ieee.org/document/7052167