EPSegNet: Lightweight Semantic Recalibration and Assembly for Efficient Polyp Segmentation

Bibliographic Details
Published in IEEE Transactions on Neural Networks and Learning Systems Vol. 36; no. 8; pp. 13805-13817
Main Authors Wu, Huisi; Zhao, Zebin
Format Journal Article
Language English
Published United States IEEE 01.08.2025
ISSN 2162-237X, 2162-2388
DOI 10.1109/TNNLS.2025.3527557

Summary: Colorectal cancer (CRC) is among the most common malignancies, and detecting and removing polyps at an early stage is of great importance in preventing it. However, current state-of-the-art high-accuracy methods for polyp segmentation have a large number of parameters and a high computational cost, while lightweight and fast models significantly sacrifice accuracy. Currently, medical semantic segmentation algorithms are mostly based on an encoder-decoder architecture. Pixelwise spatial information has been proven to be very important to the quality of features extracted by encoders. However, almost all existing approaches to capturing it suffer from high computational complexity. Furthermore, the capacity of the traditional decoder is constrained by its limited receptive fields. To comprehensively address the above problems, we propose a novel efficient polyp segmentation network (EPSegNet) to simultaneously fulfill the requirements of accuracy, size, and speed. First, we propose a lightweight feature extraction and recalibration module (LFERM), which can efficiently extract dense multiscale features. Specifically, in LFERM, we propose a spatial information recalibration (SIR) block for efficiently refining spatial information. Based on LFERMs, we develop an encoder. Moreover, we propose a novel lightweight semantic assembly decoder (LSAD) that assembles both channelwise and pixelwise semantics from a global context view. Finally, we combine the encoder and LSAD to form the proposed EPSegNet. Experiments on the Kvasir-SEG, CVC-ClinicDB, and CVC-ColonDB datasets demonstrate that the proposed EPSegNet achieves the best balance between accuracy and size among state-of-the-art models and obtains a fast speed for polyp segmentation. Without any pretraining or postprocessing, our method achieves 79.37% intersection over union (IoU) and 86.74% Dice on the Kvasir-SEG dataset with only 0.34 million parameters and a speed of 128 frames/s (FPS) at an input size of 3 × 384 × 384 on a single NVIDIA GeForce RTX 2080 Ti card. Code will be released upon publication.
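
The two accuracy figures quoted in the summary, IoU and Dice, are standard overlap measures between a predicted mask and a ground-truth mask. The following is a minimal sketch of their textbook definitions for binary masks, not the authors' evaluation code. The function name `iou_and_dice`, the flat 0/1 list representation, and the choice to score two empty masks as a perfect match are all assumptions made for illustration. The record does not say how the paper averages scores across images.

```python
def iou_and_dice(pred, target):
    """Return (IoU, Dice) for two binary masks given as flat sequences of 0/1.

    Hypothetical helper for illustration only; it is not taken from EPSegNet.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")

    # Count true positives, false positives, and false negatives pixel by pixel.
    tp = sum(1 for p, t in zip(pred, target) if p and t)
    fp = sum(1 for p, t in zip(pred, target) if p and not t)
    fn = sum(1 for p, t in zip(pred, target) if not p and t)

    union = tp + fp + fn
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0, 1.0

    iou = tp / union                      # |A ∩ B| / |A ∪ B|
    dice = 2 * tp / (2 * tp + fp + fn)    # 2|A ∩ B| / (|A| + |B|)
    return iou, dice


# Toy example: 6 pixels, 2 overlapping, 1 false positive, 1 false negative.
pred = [1, 1, 1, 0, 0, 0]
target = [0, 1, 1, 1, 0, 0]
iou, dice = iou_and_dice(pred, target)
print(f"IoU={iou:.4f} Dice={dice:.4f}")
```

For a single pair of masks the two scores are linked by Dice = 2·IoU / (1 + IoU). Dataset-level means of each metric need not satisfy that identity, which is one reason both are usually reported.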