Deep Dual-Resolution Networks for Real-Time and Accurate Semantic Segmentation of Traffic Scenes


Bibliographic Details
Published in: IEEE Transactions on Intelligent Transportation Systems, Vol. 24, No. 3, pp. 1-13
Main Authors: Pan, Huihui; Hong, Yuanduo; Sun, Weichao; Jia, Yisong
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.03.2023
ISSN: 1524-9050, 1558-0016
DOI: 10.1109/TITS.2022.3228042

Summary: Using light-weight architectures or reasoning on low-resolution images, recent methods realize very fast scene parsing, even running at more than 100 FPS on a single GPU. However, there is still a significant gap in performance between these real-time methods and the models based on dilation backbones. To this end, we propose a family of deep dual-resolution networks (DDRNets) for real-time and accurate semantic segmentation, which consist of deep dual-resolution backbones and enhanced low-resolution contextual information extractors. The two deep branches and multiple bilateral fusions of the backbones generate higher-quality details than existing two-pathway methods. The enhanced contextual information extractor, named Deep Aggregation Pyramid Pooling Module (DAPPM), enlarges effective receptive fields and fuses multi-scale context from low-resolution feature maps at little time cost. Our method achieves a new state-of-the-art trade-off between accuracy and speed on both the Cityscapes and CamVid datasets. For full-resolution input, on a single 2080Ti GPU without hardware acceleration, DDRNet-23-slim yields 77.4% mIoU at 102 FPS on the Cityscapes test set and 74.7% mIoU at 230 FPS on the CamVid test set. With widely used test augmentation, our method is superior to most state-of-the-art models and requires much less computation. Code and trained models are available at https://github.com/ydhongHIT/DDRNet.
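The abstract's description of DAPPM, pyramid pooling on a low-resolution feature map with hierarchical aggregation of the scales, can be illustrated with a short sketch. The following is a minimal PyTorch illustration under assumed channel sizes and pooling configurations; batch normalization, activations, and the residual shortcut of the official module are omitted for brevity, and the class name DAPPMSketch and all hyperparameters are hypothetical. Consult the linked repository for the authors' actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DAPPMSketch(nn.Module):
    """Simplified DAPPM-style context extractor (illustrative only)."""

    def __init__(self, in_ch, mid_ch, out_ch):
        super().__init__()
        # Unpooled base branch.
        self.scale0 = nn.Conv2d(in_ch, mid_ch, 1, bias=False)
        # Pooling branches with progressively larger receptive fields,
        # ending in global average pooling for image-level context.
        self.pools = nn.ModuleList([
            nn.AvgPool2d(kernel_size=5, stride=2, padding=2),
            nn.AvgPool2d(kernel_size=9, stride=4, padding=4),
            nn.AvgPool2d(kernel_size=17, stride=8, padding=8),
            nn.AdaptiveAvgPool2d(1),
        ])
        self.reduce = nn.ModuleList([
            nn.Conv2d(in_ch, mid_ch, 1, bias=False) for _ in self.pools
        ])
        # 3x3 convs fusing each upsampled scale with the previous, finer
        # one: the hierarchical ("deep") aggregation in the module name.
        self.fuse = nn.ModuleList([
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1, bias=False)
            for _ in self.pools
        ])
        self.compress = nn.Conv2d(
            mid_ch * (len(self.pools) + 1), out_ch, 1, bias=False
        )

    def forward(self, x):
        size = x.shape[-2:]
        feats = [self.scale0(x)]
        for pool, reduce, fuse in zip(self.pools, self.reduce, self.fuse):
            y = reduce(pool(x))
            y = F.interpolate(y, size=size, mode="bilinear",
                              align_corners=False)
            # Add the previous scale before fusing, so coarse context is
            # aggregated step by step rather than merged all at once.
            feats.append(fuse(y + feats[-1]))
        return self.compress(torch.cat(feats, dim=1))

# Example: context extraction on a low-resolution feature map.
x = torch.randn(1, 128, 16, 32)
print(DAPPMSketch(128, 64, 128)(x).shape)  # torch.Size([1, 128, 16, 32])
```

Because the module operates only on the heavily downsampled branch, the extra pooling and convolutions add little inference time, which is the trade-off the abstract highlights.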