Deep Dual-Resolution Networks for Real-Time and Accurate Semantic Segmentation of Traffic Scenes
| Published in | IEEE Transactions on Intelligent Transportation Systems Vol. 24; no. 3; pp. 1 - 13 |
|---|---|
| Main Authors | , , , |
| Format | Journal Article |
| Language | English |
| Published | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.03.2023 |
| ISSN | 1524-9050 1558-0016 |
| DOI | 10.1109/TITS.2022.3228042 |
| Summary: | Using light-weight architectures or reasoning on low-resolution images, recent methods realize very fast scene parsing, even running at more than 100 FPS on a single GPU. However, there is still a significant gap in performance between these real-time methods and the models based on dilation backbones. To this end, we propose a family of deep dual-resolution networks (DDRNets) for real-time and accurate semantic segmentation, which consist of deep dual-resolution backbones and enhanced low-resolution contextual information extractors. The two deep branches and multiple bilateral fusions of the backbones generate higher-quality details than existing two-pathway methods. The enhanced contextual information extractor, named the Deep Aggregation Pyramid Pooling Module (DAPPM), enlarges effective receptive fields and fuses multi-scale context from low-resolution feature maps at little time cost. Our method achieves a new state-of-the-art trade-off between accuracy and speed on both the Cityscapes and CamVid datasets. For full-resolution input on a single 2080Ti GPU without hardware acceleration, DDRNet-23-slim yields 77.4% mIoU at 102 FPS on the Cityscapes test set and 74.7% mIoU at 230 FPS on the CamVid test set. With widely used test augmentation, our method is superior to most state-of-the-art models and requires much less computation. Code and trained models are available at https://github.com/ydhongHIT/DDRNet. |
|---|---|