An Extended Nonstrict Partially Ordered Set-Based Configurable Linear Sorter on FPGAs

Bibliographic Details
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 39, no. 5, pp. 1031-1044
Main Authors: Li, Dalin; Huang, Lan; Gao, Teng; Feng, Yang; Tavares, Adriano; Wang, Kangping
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.05.2020
ISSN: 0278-0070, 1937-4151
DOI: 10.1109/TCAD.2020.2977074

Summary: Sorting is essential to many scientific and data processing problems, so improving its efficiency matters. By exploiting specialized hardware, parallel sorters such as sorting networks and linear sorters achieve lower time complexity. However, most of them are designed by parallelizing existing algorithms and give little consideration to the structure of the specialized hardware. In this article, we propose an extended nonstrict partially ordered set-based configurable linear sorter on field-programmable gate arrays (FPGAs). First, we extend the nonstrict partial order to binary-tuple and n-tuple nonstrict partial orders. Then, we define a linear sorting algorithm based on them, taking hardware performance into account. Its time complexity is 4N/n (N elements, tuple size n), ranging from 4 to 2N as the tuple size varies. With binary tuples, the number of comparisons drops to N/2, half that of the state-of-the-art insertion-based linear sorter. Finally, we implement the linear sorter on FPGAs. It consists of multiple customizable micro-cores, called sorting units (SUs); each SU packages the storage and comparison of one tuple. All SUs are connected into a chain with simple communication, which makes the sorter fully configurable in length, bandwidth, and throughput. Because all SUs perform the same operation in every clock cycle, the sorter also reaches a higher clock frequency. In our experiments, the sorter achieves up to 660 MHz, a throughput of 5.6 Gb/s, and an 87x speedup over quicksort on general-purpose processors.
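To make the "chain of sorting units" idea concrete, here is a minimal software sketch, assuming a simple insertion-style rule: each hypothetical SU stores up to n values in ascending order, and in every cycle it absorbs the value on its input link and, if over capacity, forwards its largest value to the next SU. This models the general insertion-based linear sorter that the summary uses as a baseline, generalized so each SU holds n values. It is not the authors' extended nonstrict partial order algorithm, and the cycle count it reports does not reproduce the 4N/n bound. The function name `chain_sort` and the per-cycle rule are illustrative assumptions.

```python
import bisect
from collections import deque
from typing import List, Optional, Sequence, Tuple


def chain_sort(values: Sequence[int], n: int) -> Tuple[List[int], int]:
    """Cycle-level model of a chain of hypothetical sorting units (SUs).

    Assumption (illustrative only): each SU holds at most `n` values in
    ascending order; per cycle it inserts the value on its input link and,
    if it then exceeds capacity, passes its largest value to the next SU.
    """
    if n < 1:
        raise ValueError("tuple size n must be at least 1")
    num_units = max(1, -(-len(values) // n))  # ceil(N / n) SUs suffice
    units: List[List[int]] = [[] for _ in range(num_units)]
    links: List[Optional[int]] = [None] * num_units  # value entering each SU
    pending = deque(values)
    cycles = 0
    while pending or any(v is not None for v in links):
        next_links: List[Optional[int]] = [None] * num_units
        # Every SU performs the same step in the same cycle.
        for i, v in enumerate(links):
            if v is None:
                continue
            bisect.insort(units[i], v)
            if len(units[i]) > n:
                # Over capacity: forward the largest value down the chain.
                # The last SU never overflows because num_units = ceil(N / n).
                next_links[i + 1] = units[i].pop()
        # Feed one new input value per cycle into the head of the chain.
        if pending:
            next_links[0] = pending.popleft()
        links = next_links
        cycles += 1
    # Each SU keeps the n smallest values it has seen, so reading the
    # chain head to tail yields the data in ascending order.
    return [x for unit in units for x in unit], cycles


if __name__ == "__main__":
    result, cycles = chain_sort([5, 3, 8, 1, 9, 2, 7], n=2)
    print(result)  # [1, 2, 3, 5, 7, 8, 9]
```

The chain works because each SU only ever replaces its contents with smaller values, so everything it forwards is at least as large as what it finally keeps. Larger n means fewer SUs for the same N, which loosely mirrors how the tuple size trades chain length against per-unit work in the summary.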