Self Pre-Training with Adaptive Mask Autoencoders for Variable-Contrast 3D Medical Imaging

Bibliographic Details
Published in Proceedings (International Symposium on Biomedical Imaging) pp. 1-5
Main Authors Das, Badhan Kumar; Zhao, Gengyan; Liu, Han; Re, Thomas J.; Comaniciu, Dorin; Gibson, Eli; Maier, Andreas
Format Conference Proceeding
Language English
Published IEEE 14.04.2025
ISSN 1945-8452
DOI 10.1109/ISBI60581.2025.10981097

Abstract The Masked Autoencoder (MAE) has recently demonstrated effectiveness in pre-training Vision Transformers (ViT) for analyzing natural images. By reconstructing complete images from partially masked inputs, the ViT encoder gathers contextual information to predict the missing regions. This capability to aggregate context is especially important in medical imaging, where anatomical structures are functionally and mechanically linked to surrounding regions. However, current methods do not consider variations in the number of input images, which are typical of real-world Magnetic Resonance (MR) studies. To address this limitation, we propose a 3D Adaptive Masked Autoencoder (AMAE) architecture that accommodates a variable number of 3D input contrasts per subject. A magnetic resonance imaging (MRI) dataset of 45,364 subjects was used for pre-training, and a subset of 1,648 training, 193 validation and 215 test subjects was used for fine-tuning. The results demonstrate that self pre-training with this adaptive masked autoencoder can improve infarct segmentation performance by 2.8%-3.7% for ViT-based segmentation models.
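The masking step the abstract describes can be illustrated with a minimal sketch. This is not the authors' AMAE implementation, which this record does not describe in detail; it only assumes that each 3D contrast is split into cubic patch tokens, that tokens from however many contrasts a subject has are pooled, and that a random fraction is hidden from the encoder. The function names (patchify_3d, mask_tokens), the patch size, and the 75% mask ratio are hypothetical choices made for illustration.

```python
import numpy as np

def patchify_3d(volume, patch):
    """Split a (D, H, W) volume into non-overlapping cubic patches, one flattened token each."""
    d, h, w = volume.shape
    assert d % patch == 0 and h % patch == 0 and w % patch == 0
    v = volume.reshape(d // patch, patch, h // patch, patch, w // patch, patch)
    v = v.transpose(0, 2, 4, 1, 3, 5)  # bring the three patch-grid axes to the front
    return v.reshape(-1, patch ** 3)

def mask_tokens(contrasts, patch=4, mask_ratio=0.75, seed=0):
    """Pool tokens from a variable number of contrasts and hide a random subset.

    Hypothetical helper: returns the visible tokens, a boolean mask
    (True = hidden, to be reconstructed), and all tokens as the target.
    """
    rng = np.random.default_rng(seed)
    tokens = np.concatenate([patchify_3d(c, patch) for c in contrasts], axis=0)
    n = tokens.shape[0]
    n_keep = int(round(n * (1.0 - mask_ratio)))
    keep_idx = np.sort(rng.permutation(n)[:n_keep])
    mask = np.ones(n, dtype=bool)
    mask[keep_idx] = False
    return tokens[keep_idx], mask, tokens

# Two subjects with different numbers of contrasts go through the same code path.
subject_a = [np.random.rand(8, 8, 8) for _ in range(2)]  # 2 contrasts
subject_b = [np.random.rand(8, 8, 8) for _ in range(3)]  # 3 contrasts
visible_a, mask_a, all_a = mask_tokens(subject_a)
visible_b, mask_b, all_b = mask_tokens(subject_b)
```

In this sketch the token count simply scales with the number of contrasts, so the encoder input length varies per subject; how the published architecture handles that variation is not stated in this record.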
Author Re, Thomas J.
Liu, Han
Das, Badhan Kumar
Maier, Andreas
Gibson, Eli
Zhao, Gengyan
Comaniciu, Dorin
Author_xml – sequence: 1
  givenname: Badhan Kumar
  surname: Das
  fullname: Das, Badhan Kumar
  organization: Siemens Healthineers AG
– sequence: 2
  givenname: Gengyan
  surname: Zhao
  fullname: Zhao, Gengyan
  organization: Siemens Medical Solutions USA, Inc
– sequence: 3
  givenname: Han
  surname: Liu
  fullname: Liu, Han
  organization: Siemens Medical Solutions USA, Inc
– sequence: 4
  givenname: Thomas J.
  surname: Re
  fullname: Re, Thomas J.
  organization: Siemens Medical Solutions USA, Inc
– sequence: 5
  givenname: Dorin
  surname: Comaniciu
  fullname: Comaniciu, Dorin
  organization: Siemens Medical Solutions USA, Inc
– sequence: 6
  givenname: Eli
  surname: Gibson
  fullname: Gibson, Eli
  organization: Siemens Medical Solutions USA, Inc
– sequence: 7
  givenname: Andreas
  surname: Maier
  fullname: Maier, Andreas
  organization: FAU Erlangen-Nuremberg
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/ISBI60581.2025.10981097
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
EISBN 9798331520526
EISSN 1945-8452
EndPage 5
ExternalDocumentID 10981097
Genre orig-research
GrantInformation_xml – fundername: Siemens Healthineers
  funderid: 10.13039/501100011699
IEDL.DBID RIE
IngestDate Wed Aug 27 01:53:17 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 5
ParticipantIDs ieee_primary_10981097
PublicationCentury 2000
PublicationDate 2025-April-14
PublicationDateYYYYMMDD 2025-04-14
PublicationDate_xml – month: 04
  year: 2025
  text: 2025-April-14
  day: 14
PublicationDecade 2020
PublicationTitle Proceedings (International Symposium on Biomedical Imaging)
PublicationTitleAbbrev ISBI
PublicationYear 2025
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0000744304
Score 2.297934
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Adaptation models
Autoencoders
Biomedical imaging
Computer vision
Image segmentation
Magnetic resonance imaging
Masked Autoencoders
Self Pre-training
Solid modeling
Three-dimensional displays
Training
Transformers
Variable Inputs
Vision Transformer
Title Self Pre-Training with Adaptive Mask Autoencoders for Variable-Contrast 3D Medical Imaging
URI https://ieeexplore.ieee.org/document/10981097