UltrasOM: A mamba-based network for 3D freehand ultrasound reconstruction using optical flow



Bibliographic Details
Published in: Computer Methods and Programs in Biomedicine, Vol. 268, p. 108843
Main Authors: Sun, Rui; Liu, Chuanba; Wang, Wenshuo; Song, Yimin; Sun, Tao
Format: Journal Article
Language: English
Published: Ireland, Elsevier B.V., 01.08.2025
ISSN: 0169-2607
eISSN: 1872-7565
DOI: 10.1016/j.cmpb.2025.108843


Summary:

•The UltrasOM model introduces an innovative video embedding module that integrates optical flow dynamics with raw static information, enhancing feature representation and providing a robust foundation for downstream tasks.
•Based on the Mamba architecture, the spatiotemporal attention module uses multi-layer Space-Time Blocks to capture global spatiotemporal correlations, enabling efficient information extraction in complex or extended scenarios.
•UltrasOM incorporates correlation and motion velocity losses to improve generalization, stability, and accuracy under varying scanning speeds and postures.
•Experiments show UltrasOM outperforms existing models in drift rate, distance error, and translational and rotational accuracy, demonstrating its technical superiority in 3D ultrasound reconstruction.

Three-dimensional (3D) ultrasound (US) reconstruction is of significant value in clinical diagnosis, characterized by its safety, portability, low cost, and high real-time capabilities. 3D freehand ultrasound reconstruction aims to eliminate the need for tracking devices, relying solely on image data to infer the spatial relationships between frames. However, inherent jitter during handheld scanning introduces significant inaccuracies, making current methods ineffective in precisely predicting the spatial motions of ultrasound image frames. This leads to substantial cumulative errors over long sequence modeling, resulting in deformations or artifacts in the reconstructed volume.

To address these challenges, we proposed UltrasOM, a 3D ultrasound reconstruction network designed for spatial relative motion estimation. Initially, we designed a video embedding module that integrates optical flow dynamics with original static information to enhance motion change features between frames. Next, we developed a Mamba-based spatiotemporal attention module, utilizing multi-layer stacked Space-Time Blocks to effectively capture global spatiotemporal correlations within video frame sequences. Finally, we incorporated correlation loss and motion speed loss to prevent overfitting related to scanning speed and pose, enhancing the model's generalization capability.

Experimental results on a dataset of 200 forearm cases, comprising 58,011 frames, demonstrated that the proposed method achieved a final drift rate (FDR) of 10.24 %, a frame-to-frame distance error (DE) of 7.34 mm, a symmetric Hausdorff distance error (HD) of 10.81 mm, and a mean angular error (MEA) of 2.05°, outperforming state-of-the-art methods by 13.24 %, 15.11 %, 3.57 %, and 6.32 %, respectively. By integrating optical flow features and deeply exploring contextual spatiotemporal dependencies, the proposed network can directly predict the relative motions between multiple frames of ultrasound images without the need for tracking, surpassing the accuracy of existing methods.
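Two ideas in the summary above lend themselves to brief illustration: a video embedding that fuses optical-flow dynamics with the raw frames, and the pair of auxiliary losses (correlation and motion speed). The PyTorch sketches below are minimal approximations of those ideas under stated assumptions; all names and sizes (FlowAwareEmbedding, the 3-channel frame-plus-flow layout, the 16-pixel patch size) are illustrative and are not taken from the paper or its code.

```python
# Minimal sketch (PyTorch) of a flow-aware video embedding: each grayscale US
# frame is stacked with a 2-channel dense optical-flow field and projected into
# patch tokens. Layer sizes and channel layout are assumptions for illustration.
import torch
import torch.nn as nn

class FlowAwareEmbedding(nn.Module):  # hypothetical name
    def __init__(self, embed_dim=256, patch=16):
        super().__init__()
        # 1 image channel + 2 flow channels (dx, dy) -> patch tokens
        self.proj = nn.Conv2d(3, embed_dim, kernel_size=patch, stride=patch)

    def forward(self, frames, flows):
        # frames: (B, T, 1, H, W) B-mode frames; flows: (B, T, 2, H, W)
        b, t = frames.shape[:2]
        x = torch.cat([frames, flows], dim=2).flatten(0, 1)    # (B*T, 3, H, W)
        tokens = self.proj(x).flatten(2).transpose(1, 2)        # (B*T, N, C)
        return tokens.unflatten(0, (b, t))                      # (B, T, N, C)

emb = FlowAwareEmbedding()
tokens = emb(torch.randn(1, 8, 1, 224, 224), torch.randn(1, 8, 2, 224, 224))
print(tokens.shape)  # torch.Size([1, 8, 196, 256])
```

These tokens would then pass through the stacked Space-Time Blocks; a faithful Mamba implementation is beyond the scope of this sketch. The second sketch shows one plausible reading of the two auxiliary losses named in the abstract: a correlation term between predicted and ground-truth inter-frame motion parameters, and a speed term that matches per-pair translation magnitude so the network does not latch onto a particular scanning pace. The exact formulations in the paper may differ.

```python
# Hedged sketch of correlation and motion-speed losses; pred/gt are (N-1, 6)
# relative motions per frame pair, assumed ordered as (tx, ty, tz, rx, ry, rz).
import torch
import torch.nn.functional as F

def correlation_loss(pred, gt, eps=1e-8):
    # 1 - Pearson correlation per motion parameter, averaged over parameters
    pred_c = pred - pred.mean(dim=0, keepdim=True)
    gt_c = gt - gt.mean(dim=0, keepdim=True)
    corr = (pred_c * gt_c).sum(dim=0) / (pred_c.norm(dim=0) * gt_c.norm(dim=0) + eps)
    return (1.0 - corr).mean()

def motion_speed_loss(pred, gt):
    # match per-pair translation magnitude (a proxy for scanning speed)
    return F.l1_loss(pred[:, :3].norm(dim=1), gt[:, :3].norm(dim=1))

pred, gt = torch.randn(31, 6), torch.randn(31, 6)
total = F.mse_loss(pred, gt) + 0.1 * correlation_loss(pred, gt) \
        + 0.1 * motion_speed_loss(pred, gt)   # weights are placeholders
```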