Enhanced tuberculosis detection using Vision Transformers and explainable AI with a Grad-CAM approach on chest X-rays
| Published in | BMC Medical Imaging, Vol. 25, No. 1, Art. 96 (16 pages) |
|---|---|
| Main Authors | , , , |
| Format | Journal Article |
| Language | English |
| Published | London: BioMed Central (BioMed Central Ltd; Springer Nature B.V.; BMC), 24.03.2025 |
| Subjects | |
| ISSN | 1471-2342 |
| DOI | 10.1186/s12880-025-01630-3 |
| Summary: | Tuberculosis (TB), caused by Mycobacterium tuberculosis, remains a leading global health challenge, especially in low-resource settings. Accurate diagnosis from chest X-rays is critical yet challenging because TB manifestations can be subtle, particularly in the early stages. Traditional computational methods, primarily basic convolutional neural networks (CNNs), often require extensive pre-processing and struggle to generalize across diverse clinical environments. This study introduces a Vision Transformer (ViT) model augmented with Gradient-weighted Class Activation Mapping (Grad-CAM) to enhance both diagnostic accuracy and interpretability. The ViT model uses self-attention to capture long-range dependencies and complex patterns directly from raw pixel information, while Grad-CAM provides visual explanations of model decisions by highlighting the regions of the X-ray that drive them. The model combines a Conv2D stem for initial feature extraction with a stack of transformer encoder blocks, allowing it to learn discriminative features without any pre-processing. On the validation set, the model achieved an accuracy of 0.97, a recall of 0.99, and an F1-score of 0.98 for TB cases. On the test set, it achieved an accuracy of 0.98, a recall of 0.97, and an F1-score of 0.98, outperforming existing methods. The Grad-CAM visualizations not only improve the transparency of the model but also assist radiologists in assessing and verifying AI-driven diagnoses. These results demonstrate the model's high diagnostic precision and its potential for clinical application in real-world settings, representing a substantial advance in automated TB detection. (Minimal sketches of such a ViT backbone and of the Grad-CAM step are given below.) |
|---|---|
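The abstract describes a ViT backbone built from a Conv2D stem followed by transformer encoder blocks and a classification head. The following is a minimal PyTorch sketch of that idea, not the authors' implementation: depth, embedding width, head count, patch size, and input resolution are illustrative assumptions.

```python
# Minimal sketch: Conv2D stem tokenizes the raw chest X-ray, a stack of
# transformer encoder blocks models long-range dependencies via self-attention,
# and a linear head classifies TB vs. normal. All hyperparameters are assumed.
import torch
import torch.nn as nn


class ViTForTB(nn.Module):
    def __init__(self, img_size=224, patch=16, dim=256, depth=6, heads=8, num_classes=2):
        super().__init__()
        # Conv2D stem: initial feature extraction and patch tokenization.
        self.stem = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)  # grayscale input
        num_patches = (img_size // patch) ** 2
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        # Stack of transformer encoder blocks with multi-head self-attention.
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, dim_feedforward=4 * dim,
            batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_classes)  # TB vs. normal

    def forward(self, x):                                   # x: (B, 1, 224, 224) raw pixels
        tokens = self.stem(x).flatten(2).transpose(1, 2)    # (B, N, dim) patch tokens
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        tokens = self.encoder(tokens)                       # self-attention over all patches
        return self.head(self.norm(tokens[:, 0]))           # classify from the [CLS] token


model = ViTForTB()
logits = model(torch.randn(2, 1, 224, 224))                 # -> shape (2, 2)
```

Feeding raw pixels straight into the conv stem mirrors the abstract's claim that no separate pre-processing stage is required; the stem plays the role that hand-crafted feature extraction plays in older CNN pipelines.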
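The Grad-CAM step can be sketched as follows, building on the `ViTForTB` sketch above. The choice of the Conv2D stem as the target layer is a hypothetical one for illustration; the record does not specify which layer the authors hook.

```python
# Hedged Grad-CAM sketch: capture activations and gradients at the Conv2D stem,
# weight each channel by its spatially averaged gradient, ReLU the weighted sum,
# and upsample the map to the X-ray resolution as a heatmap.
import torch
import torch.nn.functional as F


def grad_cam(model, x, target_class):
    feats, grads = {}, {}

    def fwd_hook(_, __, output):
        feats["a"] = output                        # (B, C, H', W') stem activations

    def bwd_hook(_, __, grad_output):
        grads["g"] = grad_output[0]                # gradients w.r.t. those activations

    h1 = model.stem.register_forward_hook(fwd_hook)
    h2 = model.stem.register_full_backward_hook(bwd_hook)
    try:
        logits = model(x)
        model.zero_grad()
        logits[:, target_class].sum().backward()   # gradient of the TB class score
    finally:
        h1.remove(); h2.remove()

    weights = grads["g"].mean(dim=(2, 3), keepdim=True)          # channel importance
    cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
    return (cam - cam.amin()) / (cam.amax() - cam.amin() + 1e-8)  # normalized heatmap


# Usage with the sketch model above: heatmap highlights regions driving the TB score.
heatmap = grad_cam(model, torch.randn(1, 1, 224, 224), target_class=1)
```

Overlaying the normalized heatmap on the original X-ray gives the kind of visual explanation the abstract describes, letting a radiologist check whether the highlighted regions are clinically plausible.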