U-Net vs Transformer: Is U-Net Outdated in Medical Image Registration?
Due to their extreme long-range modeling capability, vision transformer-based networks have become increasingly popular in deformable image registration. We believe, however, that the receptive field of a 5-layer convolutional U-Net is sufficient to capture accurate deformations without needing long...
Saved in:
Published in | Machine Learning in Medical Imaging Vol. 13583; pp. 151 - 160 |
---|---|
Main Authors | , , , , , |
Format | Book Chapter |
Language | English |
Published |
Switzerland
Springer
2022
Springer Nature Switzerland |
Series | Lecture Notes in Computer Science |
Online Access | Get full text |
ISBN | 9783031210136 3031210131 |
ISSN | 0302-9743 1611-3349 |
DOI | 10.1007/978-3-031-21014-3_16 |
Cover
Summary: | Due to their extreme long-range modeling capability, vision transformer-based networks have become increasingly popular in deformable image registration. We believe, however, that the receptive field of a 5-layer convolutional U-Net is sufficient to capture accurate deformations without needing long-range dependencies. The purpose of this study is therefore to investigate whether U-Net-based methods are outdated compared to modern transformer-based approaches when applied to medical image registration. For this, we propose a large kernel U-Net (LKU-Net) by embedding a parallel convolutional block to a vanilla U-Net in order to enhance the effective receptive field. On the public 3D IXI brain dataset for atlas-based registration, we show that the performance of the vanilla U-Net is already comparable with that of state-of-the-art transformer-based networks (such as TransMorph), and that the proposed LKU-Net outperforms TransMorph by using only 1.12% of its parameters and 10.8% of its mult-adds operations. We further evaluate LKU-Net on a MICCAI Learn2Reg 2021 challenge dataset for inter-subject registration, our LKU-Net also outperforms TransMorph on this dataset and ranks first on the public leaderboard as of the submission of this work. With only modest modifications to the vanilla U-Net, we show that U-Net can outperform transformer-based architectures on inter-subject and atlas-based 3D medical image registration. Code is available at https://github.com/xi-jia/LKU-Net. |
---|---|
ISBN: | 9783031210136 3031210131 |
ISSN: | 0302-9743 1611-3349 |
DOI: | 10.1007/978-3-031-21014-3_16 |