Data Flow Algorithms for Processors with Vector Extensions: Handling Actors With Internal State

Bibliographic Details
Published in Journal of Signal Processing Systems, Vol. 87, no. 1, pp. 21-31
Main Authors Barford, Lee; Bhattacharyya, Shuvra S.; Liu, Yanzhou
Format Journal Article
Language English
Published New York: Springer US, 2017-04-01
ISSN 1939-8018
EISSN 1939-8115
DOI 10.1007/s11265-015-1045-x

Abstract Making full use of the parallel computation capabilities of present and expected CPUs and GPUs requires the use of vector extensions. Yet many actors in data flow systems for digital signal processing have internal state (or, equivalently, an edge that loops from the actor back to itself), which imposes serial dependencies between actor invocations and makes vectorizing across those invocations impossible. Ideally, the inter-thread coordination required by serial data dependencies should be handled by code written by parallel programming experts, kept separate from the code that specifies signal processing operations. This paper presents one approach for doing so in the case of actors that maintain state. We propose a methodology that uses the parallel scan (also known as prefix sum) pattern to create algorithms for multiple simultaneous invocations of such an actor, resulting in vectorizable code. Two examples of applying this methodology are given: (1) infinite impulse response (IIR) filters and (2) finite state machines (FSMs). The correctness and performance of the resulting IIR filters and of one class of FSMs are studied.
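The scan pattern named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the first-order filter form y[n] = a*y[n-1] + b*x[n], and the two-state parity machine used in the usage note are all assumptions chosen for illustration. The idea shown is that each actor invocation can be represented as a function on the state, that composing such functions is associative, and that an inclusive scan over the compositions therefore yields the state after every invocation. The scan here is a simple Hillis-Steele formulation in which every pass updates all elements independently.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def inclusive_scan(items: Sequence[T], combine: Callable[[T, T], T]) -> List[T]:
    """Hillis-Steele inclusive scan; `combine(earlier, later)` must be associative.

    Within one pass every element is computed independently of the others,
    which is the part a vector unit or GPU could execute in parallel.
    """
    out = list(items)
    step = 1
    while step < len(out):
        # Read only from the previous pass; build a fresh list for this pass.
        out = [out[i] if i < step else combine(out[i - step], out[i])
               for i in range(len(out))]
        step *= 2
    return out


Affine = Tuple[float, float]  # (A, C) represents the map y -> A*y + C


def compose_affine(earlier: Affine, later: Affine) -> Affine:
    """Apply `earlier` first, then `later`."""
    return (later[0] * earlier[0], later[0] * earlier[1] + later[1])


def iir_serial(a: float, b: float, x: Sequence[float], y0: float = 0.0) -> List[float]:
    """Reference loop for the assumed filter y[n] = a*y[n-1] + b*x[n]."""
    y, out = y0, []
    for xn in x:
        y = a * y + b * xn
        out.append(y)
    return out


def iir_scan(a: float, b: float, x: Sequence[float], y0: float = 0.0) -> List[float]:
    """Same filter, computed by scanning the per-sample affine maps."""
    prefixes = inclusive_scan([(a, b * xn) for xn in x], compose_affine)
    return [A * y0 + C for A, C in prefixes]


Transition = Tuple[int, ...]  # t[s] is the next state when the current state is s


def compose_fsm(earlier: Transition, later: Transition) -> Transition:
    """Apply `earlier` first, then `later`."""
    return tuple(later[s] for s in earlier)


def fsm_scan(table: Dict[int, Transition], inputs: Sequence[int], start: int) -> List[int]:
    """State after each input symbol, computed by scanning transition maps.

    `table[symbol]` is the transition map for that input symbol.
    """
    prefixes = inclusive_scan([table[sym] for sym in inputs], compose_fsm)
    return [m[start] for m in prefixes]
```

As a usage example, `iir_scan(0.5, 1.0, [1.0, 2.0, 3.0, 4.0])` matches `iir_serial` with the same arguments, and `fsm_scan({0: (0, 1), 1: (1, 0)}, [1, 0, 1, 1], 0)` tracks the parity of the ones seen so far. Because the scan reorders floating-point operations, the filter outputs can in general differ from the serial loop by rounding error.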
Author Affiliations
  1. Barford, Lee (lee.barford@keysight.com): Keysight Laboratories, Keysight Technologies, Inc
  2. Bhattacharyya, Shuvra S.: University of Maryland, Tampere University of Technology
  3. Liu, Yanzhou: University of Maryland
Copyright The Author(s) 2015
Discipline Engineering
Open Access true
Peer Reviewed true
Keywords Data flow computing; Vector processors; Parallel algorithms; Graphics processing units; Digital signal processing
Journal Subtitle for Signal, Image, and Video Technology (formerly the Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology)
Journal Abbreviation J Sign Process Syst
Subject Terms Circuits and Systems; Computer Imaging; Electrical Engineering; Engineering; Image Processing and Computer Vision; Pattern Recognition; Pattern Recognition and Graphics; Signal, Image and Speech Processing; Vision
URI https://link.springer.com/article/10.1007/s11265-015-1045-x