Using FPGA devices to accelerate the evaluation phase of tree-based genetic programming: an extended analysis
| Published in | Genetic programming and evolvable machines Vol. 26; no. 1 |
|---|---|
| Main Authors | , , , , , , |
| Format | Journal Article |
| Language | English |
| Published | Dordrecht: Springer Nature B.V, 01.06.2025 |
| ISSN | 1389-2576 1573-7632 |
| DOI | 10.1007/s10710-024-09505-2 |
| Summary | This paper establishes the potential of accelerating the evaluation phase of tree-based genetic programming through contemporary field-programmable gate array (FPGA) technology. This exploration stems from the fact that FPGAs can sometimes leverage increased levels of both data and function parallelism, as well as superior power/energy efficiency, when compared to general-purpose CPU/GPU systems. In this investigation, we introduce a fixed-depth, tree-based architecture that can fully parallelize tree evaluation for type-consistent primitives that are unrolled and pipelined. We show that our accelerator on a 14nm FPGA achieves an average speedup of 43× when compared to a recent open-source GPU solution, TensorGP, implemented on 8nm process-node technology, and an average speedup of 4,902× when compared to a popular baseline GP software tool, DEAP, running parallelized across all cores of a 2-socket, 28-core (56-thread), 14nm CPU server. Despite our single-FPGA accelerator being 2.4× slower on average when compared to the recent state-of-the-art Operon tool executing on the same 2-socket, 28-core CPU system, we show that this single-FPGA system is 1.4× better than Operon in terms of performance-per-watt. Importantly, we also describe six future extensions that could provide at least a 64–192× speedup over our current design. Therefore, our initial results provide considerable motivation for the continued exploration of FPGA-based GP systems. Overall, any success in significantly improving runtime and energy efficiency could potentially enable novel research efforts through faster and/or less costly GP runs, similar to how GPUs unlocked the power of deep learning during the past fifteen years. |
|---|---|
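
The summary's central architectural idea is a fixed-depth expression tree whose levels are unrolled into hardware stages and pipelined across fitness cases. The sketch below is not taken from the paper; the function and primitive names are hypothetical, and it only illustrates, in plain Python, the level-by-level evaluation pattern of a full, fixed-depth tree that such a hardware design can unroll (one stage per level) and pipeline.

```python
# Minimal software sketch (not the authors' RTL) of evaluating a full,
# fixed-depth binary expression tree level by level. Each level depends
# only on the level below it, which is the property an FPGA design can
# exploit by unrolling one hardware stage per level and streaming
# fitness cases through the resulting pipeline.
import operator

# Hypothetical type-consistent primitive set: every node maps two floats to a float.
PRIMITIVES = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "max": max,
}

def eval_full_tree(ops, leaves):
    """Evaluate a full binary tree of depth d.

    ops    -- list of d levels of primitive names; level i holds 2**i nodes
              (level 0 is the root).
    leaves -- list of 2**d numeric terminal values for one fitness case.
    """
    values = list(leaves)
    # Walk the levels from the deepest upward; each pass halves the value count.
    for level in reversed(ops):
        values = [
            PRIMITIVES[op](values[2 * i], values[2 * i + 1])
            for i, op in enumerate(level)
        ]
    return values[0]

# Example: a depth-2 tree computing max(x*y, x-y) for one fitness case.
x, y = 3.0, 5.0
print(eval_full_tree([["max"], ["mul", "sub"]], [x, y, x, y]))  # -> 15.0
```

In software this loop is sequential, but because each level is a fixed-width array of type-consistent operations, a hardware implementation can evaluate an entire level in one clock cycle and keep every stage busy with a different fitness case, which is the source of the parallelism the abstract describes.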