A 3D medical image segmentation network based on gated attention blocks and dual-scale cross-attention mechanism

Bibliographic Details
Published in Scientific Reports Vol. 15; no. 1; pp. 6159 - 21
Main Authors Jiang, Chunhui; Wang, Yi; Yuan, Qingni; Qu, Pengju; Li, Heng
Format Journal Article
Language English
Published London Nature Publishing Group UK 20.02.2025
Nature Publishing Group
Nature Portfolio
ISSN 2045-2322
DOI 10.1038/s41598-025-90339-y

Summary: In multi-organ 3D medical image segmentation, Convolutional Neural Networks (CNNs) are limited to extracting local feature information, while Transformer-based architectures suffer from high computational complexity and extract spatial and channel information inadequately. Moreover, the large number and varying sizes of the organs to be segmented reduce model robustness and segmentation quality. To address these challenges, this paper introduces DS-UNETR++, a network architecture designed for 3D medical image segmentation. The proposed network features a dual-branch feature encoding mechanism that categorizes images into coarse-grained and fine-grained types before processing them through the encoding blocks. Each encoding block comprises a downsampling layer and a Gated Shared Weighted Pairwise Attention (G-SWPA) submodule, which dynamically adjusts the influence of spatial and channel attention on feature extraction. Additionally, a Gated Dual-Scale Cross-Attention Module (G-DSCAM) is incorporated at the bottleneck stage. This module uses dimensionality reduction to apply cross-attention between coarse-grained and fine-grained features, and a gating mechanism dynamically balances the ratio of these two types of feature information, achieving effective multi-scale feature fusion. Comprehensive evaluations were conducted on four public medical datasets. Experimental results demonstrate that DS-UNETR++ achieves good segmentation performance, highlighting the effectiveness of the proposed method and offering new insights for various organ segmentation tasks.
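The gating idea described in the summary (balancing spatial against channel attention in G-SWPA, or coarse-grained against fine-grained features in G-DSCAM) can be illustrated with a minimal sketch. The summary does not give the gate's actual form, so everything here is an assumption for illustration: the function name `gated_fusion`, the per-element sigmoid gate, and the parameters `weights_a`, `weights_b`, and `bias` are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    """Logistic function; maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(feat_a, feat_b, weights_a, weights_b, bias):
    """Blend two equal-length feature vectors with a per-element gate.

    Assumed (hypothetical) form, not the paper's definition:
        gate  = sigmoid(w_a * a + w_b * b + bias)
        fused = gate * a + (1 - gate) * b
    A gate near 1 favours feat_a; a gate near 0 favours feat_b.
    """
    fused, gates = [], []
    for a, b, wa, wb, c in zip(feat_a, feat_b, weights_a, weights_b, bias):
        g = sigmoid(wa * a + wb * b + c)
        gates.append(g)
        fused.append(g * a + (1.0 - g) * b)
    return fused, gates


# Toy values standing in for two attention outputs (illustrative only).
spatial = [0.2, -1.0, 3.0]
channel = [1.0, 0.5, -2.0]

# With all gate parameters at zero the gate is exactly 0.5 everywhere,
# so the fused output is the plain average of the two inputs.
fused, gates = gated_fusion(spatial, channel, [0.0] * 3, [0.0] * 3, [0.0] * 3)
print(gates)
print(fused)
```

In a trained network the gate parameters would be learned, so the blend ratio would vary per position and per input rather than staying fixed at one half as in this toy call.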