Deep Dual-Resolution Networks for Real-Time and Accurate Semantic Segmentation of Traffic Scenes
| Published in | IEEE Transactions on Intelligent Transportation Systems Vol. 24; no. 3; pp. 1 - 13 |
|---|---|
| Main Authors | , , , |
| Format | Journal Article |
| Language | English |
| Published | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.03.2023 |
| ISSN | 1524-9050 1558-0016 |
| DOI | 10.1109/TITS.2022.3228042 |
| Summary: | Using light-weight architectures or reasoning on low-resolution images, recent methods realize very fast scene parsing, even running at more than 100 FPS on a single GPU. However, there is still a significant gap in performance between these real-time methods and the models based on dilation backbones. To this end, we propose a family of deep dual-resolution networks (DDRNets) for real-time and accurate semantic segmentation, which consist of deep dual-resolution backbones and enhanced low-resolution contextual information extractors. The two deep branches and multiple bilateral fusions of the backbones generate higher-quality details than existing two-pathway methods. The enhanced contextual information extractor, named the Deep Aggregation Pyramid Pooling Module (DAPPM), enlarges effective receptive fields and fuses multi-scale context from low-resolution feature maps at little time cost. Our method achieves a new state-of-the-art trade-off between accuracy and speed on both the Cityscapes and CamVid datasets. For full-resolution input on a single 2080Ti GPU without hardware acceleration, DDRNet-23-slim yields 77.4% mIoU at 102 FPS on the Cityscapes test set and 74.7% mIoU at 230 FPS on the CamVid test set. With widely used test augmentation, our method is superior to most state-of-the-art models and requires much less computation. Code and trained models are available at https://github.com/ydhongHIT/DDRNet. |
|---|---|