MGRW-Transformer: Multigranularity Random Walk Transformer Model for Interpretable Learning

Deep-learning models have been widely used in image recognition tasks due to their strong feature-learning ability. However, most of the current deep-learning models are "black box" systems that lack a semantic explanation of how they reached their conclusions. This makes it difficult to a...

Full description

Saved in:

Bibliographic Details
Published in	IEEE transaction on neural networks and learning systems Vol. 36; no. 1; pp. 1104 - 1118
Main Authors	Ding, Weiping, Geng, Yu, Huang, Jiashuang, Ju, Hengrong, Wang, Haipeng, Lin, Chin-Teng
Format	Journal Article
Language	English
Published	United States IEEE 01.01.2025
Subjects	Classification algorithms Computational modeling Graph random walk interpretable method Lesions Medical diagnostic imaging multigranularity formal analysis Prediction algorithms self-attention mechanism Task analysis Transformers vision transformer (ViT)
Online Access	Get full text
ISSN	2162-237X 2162-2388 2162-2388
DOI	10.1109/TNNLS.2023.3326283

Cover

More Information
Summary:	Deep-learning models have been widely used in image recognition tasks due to their strong feature-learning ability. However, most of the current deep-learning models are "black box" systems that lack a semantic explanation of how they reached their conclusions. This makes it difficult to apply these methods to complex medical image recognition tasks. The vision transformer (ViT) model is the most commonly used deep-learning model with a self-attention mechanism that shows the region of influence as compared to traditional convolutional networks. Thus, ViT offers greater interpretability. However, medical images often contain lesions of variable size in different locations, which makes it difficult for a deep-learning model with a self-attention module to reach correct and explainable conclusions. We propose a multigranularity random walk transformer (MGRW-Transformer) model guided by an attention mechanism to find the regions that influence the recognition task. Our method divides the image into multiple subimage blocks and transfers them to the ViT module for classification. Simultaneously, the attention matrix output from the multiattention layer is fused with the multigranularity random walk module. Within the multigranularity random walk module, the segmented image blocks are used as nodes to construct an undirected graph using the attention node as a starting node and guiding the coarse-grained random walk. We appropriately divide the coarse blocks into finer ones to manage the computational cost and combine the results based on the importance of the discovered features. The result is that the model offers a semantic interpretation of the input image, a visualization of the interpretation, and insight into how the decision was reached. Experimental results show that our method improves classification performance with medical images while presenting an understandable interpretation for use by medical professionals.
Bibliography:	ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23
ISSN:	2162-237X 2162-2388 2162-2388
DOI:	10.1109/TNNLS.2023.3326283