MCBTNet: Multi-Feature Fusion CNN and Bi-Level Routing Attention Transformer-Based Medical Image Segmentation Network

Bibliographic Details
Published in	IEEE Journal of Biomedical and Health Informatics, Vol. 29, no. 7, pp. 5069-5082
Main Authors	Zhang, Boheng, Zheng, Zelin, Zhao, Yanqi, Shen, Yi, Sun, Mingjian
Format	Journal Article
Language	English
Published	United States: IEEE, 01.07.2025
Subjects	Accuracy; Algorithms; Attention mechanisms; Brain - diagnostic imaging; CNN; Computational efficiency; Computational modeling; Convolutional neural networks; Decoding; Feature extraction; Humans; Image Interpretation, Computer-Assisted - methods; Image Processing, Computer-Assisted - methods; Image segmentation; Medical image segmentation; multi-dimensional attention; multi-scale feature fusion; Neural Networks, Computer; Routing; Transformer; Transformers
ISSN	2168-2194, 2168-2208
DOI	10.1109/JBHI.2025.3545398

Summary:	Accurate medical image segmentation is crucial for precise diagnosis and treatment in clinical pathology analysis and surgical navigation. While Convolutional Neural Network (CNN)-based approaches excel in capturing and analyzing local features, they often lose key global context. Transformers, utilizing self-attention mechanisms, address this issue but often overlook localized and multi-scale features while also requiring significant computational resources. To integrate the advantages of CNNs and Transformers to achieve efficient and precise medical image segmentation, we propose a segmentation framework based on multi-feature fusion CNN and Bi-level Routing Attention Transformer (MCBTNet). MCBTNet integrates CNNs and Transformers within a U-shaped encoder-decoder architecture. This configuration not only extracts multi-scale features via the U-shaped structure but also efficiently captures global contextual information through the dynamic sparsity of the Bi-Level Routing Attention Transformer. Our novel Frequency-Channel-Spatial multi-dimensional attention mechanism is implemented on skip connections, enhancing segmentation accuracy and speed by maximizing multi-scale feature utilization. Finally, MCBTNet obtains the segmentation result by fusing the predictions of different scales. Experimental results on five public datasets demonstrate that MCBTNet outperforms state-of-the-art methods in Dice and HD metrics, with lower computational and memory requirements.