An Extended Nonstrict Partially Ordered Set-Based Configurable Linear Sorter on FPGAs
| Published in | IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 39, no. 5, pp. 1031-1044 |
|---|---|
| Main Authors | , , , , , |
| Format | Journal Article |
| Language | English |
| Published | New York: IEEE, 01.05.2020; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Subjects | |
| ISSN | 0278-0070 1937-4151 |
| DOI | 10.1109/TCAD.2020.2977074 |
| Summary: | Sorting is essential to many scientific and data processing problems, so improving its efficiency matters. By taking advantage of specialized hardware, parallel sorters such as sorting networks and linear sorters achieve lower time complexity. However, most of them are designed by parallelizing algorithms and give little consideration to the structure of the specialized hardware. In this article, we propose an extended nonstrict partially ordered set-based configurable linear sorter on field-programmable gate arrays (FPGAs). First, we extend the nonstrict partial order to binary-tuple and n-tuple nonstrict partial orders. Then, we define the linear sorting algorithm on them, taking hardware performance into account. Its time complexity is 4N/n, ranging from 4 to 2N as the tuple size varies. In binary tuple-based sorting, the number of comparisons drops to N/2, half that of state-of-the-art insertion linear sorting. Finally, we implement the linear sorter on FPGAs. It consists of multiple customizable micro-cores, named sorting units (SUs). Each SU packages the storage and comparison of a tuple. All the SUs are connected into a chain with simple communication, which makes the sorter fully configurable in length, bandwidth, and throughput. All SUs also perform the same operation in each clock cycle, which raises the achievable clock frequency of the sorter. In our experiments, the sorter achieves a frequency of up to 660 MHz, a throughput of up to 5.6 Gb/s, and a speed-up of up to 87 times over the quicksort algorithm on general-purpose processors. |
|---|---|
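
For orientation only: the summary compares the proposed design against insertion linear sorting and describes a chain of identical sorting units. The sketch below is a minimal software model of a plain insertion linear sorter, assuming one stored value and one comparison per unit. It is not the article's tuple-based algorithm, and the names `SortingUnit` and `linear_sort` are hypothetical. The inner loop visits the units one after another, whereas in hardware all units would act in the same clock cycle.

```python
from typing import List, Optional


class SortingUnit:
    """One cell of the chain: stores a single value and performs one comparison.

    Hypothetical model of a baseline insertion linear sorter cell,
    not the tuple-based SU described in the article.
    """

    def __init__(self) -> None:
        self.value: Optional[int] = None

    def step(self, incoming: Optional[int]) -> Optional[int]:
        """Keep the smaller of (stored, incoming) and forward the larger one."""
        if incoming is None:
            return None
        if self.value is None:
            # Empty cell: absorb the incoming value, nothing to forward.
            self.value = incoming
            return None
        if incoming < self.value:
            # Incoming value belongs here; push the stored one down the chain.
            self.value, incoming = incoming, self.value
        return incoming


def linear_sort(data: List[int]) -> List[int]:
    """Feed values into the chain one at a time and read it out in ascending order."""
    units = [SortingUnit() for _ in data]
    for x in data:
        carry: Optional[int] = x
        for unit in units:
            carry = unit.step(carry)
            if carry is None:
                break  # value has settled; later cells are unaffected
    return [u.value for u in units]


if __name__ == "__main__":
    print(linear_sort([5, 2, 9, 1, 5, 6]))
```

Because every unit runs the same `step` logic and talks only to its neighbor, the chain length can be changed without altering any unit, which is the kind of configurability the summary attributes to a chain of SUs.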