MCBTNet: Multi-Feature Fusion CNN and Bi-Level Routing Attention Transformer-Based Medical Image Segmentation Network

Bibliographic Details
Published in IEEE journal of biomedical and health informatics Vol. 29; no. 7; pp. 5069-5082
Main Authors Zhang, Boheng, Zheng, Zelin, Zhao, Yanqi, Shen, Yi, Sun, Mingjian
Format Journal Article
Language English
Published United States IEEE 01.07.2025
ISSN 2168-2194, 2168-2208
DOI 10.1109/JBHI.2025.3545398

Summary: Accurate medical image segmentation is crucial for precise diagnosis and treatment in clinical pathology analysis and surgical navigation. While Convolutional Neural Network (CNN)-based approaches excel in capturing and analyzing local features, they often lose key global context. Transformers, utilizing self-attention mechanisms, address this issue but often overlook localized and multi-scale features while also requiring significant computational resources. To integrate the advantages of CNNs and Transformers to achieve efficient and precise medical image segmentation, we propose a segmentation framework based on multi-feature fusion CNN and Bi-Level Routing Attention Transformer (MCBTNet). MCBTNet integrates CNNs and Transformers within a U-shaped encoder-decoder architecture. This configuration not only extracts multi-scale features via the U-shaped structure but also efficiently captures global contextual information through the dynamic sparsity of the Bi-Level Routing Attention Transformer. Our novel Frequency-Channel-Spatial multi-dimensional attention mechanism is implemented on skip connections, enhancing segmentation accuracy and speed by maximizing multi-scale feature utilization. Finally, MCBTNet obtains the segmentation result by fusing the predictions of different scales. Experimental results on five public datasets demonstrate that MCBTNet outperforms state-of-the-art methods in Dice and HD metrics, with lower computational and memory requirements.
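
Note on the metrics named in the summary: the sketch below illustrates how a Dice score and a Hausdorff distance could be computed for a pair of binary masks. It is not taken from the article. "HD" is assumed here to mean Hausdorff distance, and the summary does not say whether the maximum or a percentile variant (such as the 95th percentile) is reported; the maximum form is shown. The function names `dice`, `foreground`, and `hausdorff` are hypothetical, and masks are assumed to be small nested lists of 0/1 values.

```python
from math import dist


def dice(pred, target):
    """Dice coefficient 2|P∩T| / (|P| + |T|) for two binary masks (nested lists of 0/1)."""
    p = [v for row in pred for v in row]
    t = [v for row in target for v in row]
    intersection = sum(a * b for a, b in zip(p, t))
    total = sum(p) + sum(t)
    # Assumption: two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def foreground(mask):
    """Return the (row, col) coordinates of all foreground pixels."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def hausdorff(pred, target):
    """Symmetric (maximum) Hausdorff distance between the foreground pixel sets."""
    a, b = foreground(pred), foreground(target)
    if not a or not b:
        raise ValueError("both masks need at least one foreground pixel")

    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(dist(p, q) for q in dst) for p in src)

    return max(directed(a, b), directed(b, a))


pred = [[1, 1, 0],
        [0, 0, 0]]
target = [[1, 0, 0],
          [0, 0, 0]]
print(dice(pred, target))       # overlap of 1 pixel out of 2 + 1 foreground pixels
print(hausdorff(pred, target))  # farthest mismatch, measured in pixels
```

A higher Dice score indicates better overlap, while a lower Hausdorff distance indicates better boundary agreement. This brute-force distance is quadratic in the number of foreground pixels, so it is suitable only for small illustrative masks.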