A Fast, Single-Instruction-Multiple-Data, Scalable Priority Queue

Bibliographic Details
Published in	IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 26, no. 10, pp. 1939-1952
Main Authors	Benacer, Imad; Boyer, Francois-Raymond; Savaria, Yvon
Format	Journal Article
Language	English
Published	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.10.2018
Subjects	Bandwidth; C++ (programming language); Data structures; Egress; Field programmable gate arrays; Field-programmable gate array (FPGA); flow-based networking; Hardware; High level synthesis; Network latency; Packets (communication); priority queue (PQ); Queueing; Queues; Quotas; Real-time systems; Scheduling; SIMD (computers); Sorting; Throughput; Traffic congestion; Traffic management; traffic manager (TM); Traffic speed
ISSN	1063-8210, 1557-9999
DOI	10.1109/TVLSI.2018.2838044

Summary:	In this paper, we address a key challenge in designing flow-based traffic managers (TMs) for next-generation networks. One key functionality of a TM is to schedule the departure of packets on egress ports. This scheduling ensures that packets are sent in a way that meets the allowed bandwidth quotas for each flow. A TM handles policing, shaping, scheduling, and queuing. The latter is a core function in traffic management and is a bottleneck in the context of high-speed network devices. Aiming at high throughput and low latency, we propose a single-instruction-multiple-data (SIMD) hardware priority queue (PQ) to sort out packets in real time, supporting independently the three basic operations of enqueuing, dequeuing, and replacing in a single clock cycle. A proof of validity of the proposed hardware PQ data structure is presented. The implemented PQ architecture is coded in C++. Vivado high-level synthesis is used to generate synthesizable register transfer logic from the C++ model. This implementation on a ZC706 field-programmable gate array (FPGA) shows the scalability of the proposed solution for various queue depths with almost constant performance. It offers a 10× throughput improvement when compared to prior works, and it supports links operating at 100 Gb/s.
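
The three operations named in the summary (enqueue, dequeue, replace) can be illustrated with a small software model. The sketch below is not the authors' SIMD hardware architecture and makes no claim about their C++ source: the class name SortedPriorityQueue, the 32-bit key type, the convention that the smallest key has the highest priority, and the modeling of replace as a dequeue followed by an enqueue are all assumptions made for illustration. It needs C++17 for std::optional.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical software model, NOT the paper's SIMD hardware design.
// Keys are kept in ascending order, so the smallest key (assumed here
// to mean the highest priority) always sits at index 0.
class SortedPriorityQueue {
public:
    explicit SortedPriorityQueue(std::size_t capacity) : capacity_(capacity) {
        items_.reserve(capacity);
    }

    // Enqueue: insert a key at its sorted position. Returns false when full.
    bool enqueue(std::uint32_t key) {
        if (items_.size() >= capacity_) return false;
        auto pos = std::upper_bound(items_.begin(), items_.end(), key);
        items_.insert(pos, key);
        return true;
    }

    // Dequeue: remove and return the smallest key, or nullopt when empty.
    std::optional<std::uint32_t> dequeue() {
        if (items_.empty()) return std::nullopt;
        std::uint32_t top = items_.front();
        items_.erase(items_.begin());
        return top;
    }

    // Replace (assumed semantics): hand back the current smallest key and
    // insert the new key in the same call. On an empty queue the new key
    // is simply inserted and nullopt is returned.
    std::optional<std::uint32_t> replace(std::uint32_t key) {
        std::optional<std::uint32_t> top = dequeue();
        enqueue(key);
        return top;
    }

    std::size_t size() const { return items_.size(); }

private:
    std::size_t capacity_;
    std::vector<std::uint32_t> items_;
};
```

In this model each operation costs time linear in the queue depth, because the vector shifts elements one at a time. The summary's claim of single-cycle operations with almost constant performance across queue depths refers to the hardware implementation, which this sequential sketch does not reproduce.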