Self-supervised learning framework for efficient classification of endoscopic images using pretext tasks

Bibliographic Details
Published in	PloS one Vol. 20; no. 5; p. e0322028
Main Authors	Nezhad, Shima Ayyoubi, Tajeddin, Golnaz, Khatibi, Toktam, Sohrabi, Masoudreza
Format	Journal Article
Language	English
Published	United States Public Library of Science 08.05.2025
Subjects	Accuracy; Algorithms; Artificial intelligence; Attention; Attention task; Biology and Life Sciences; Classification; Cognition & reasoning; Colorization; Computer and Information Sciences; Computer vision; Curricula; Datasets; Decision making; Deep learning; Diagnostic imaging; Endoscopy; Endoscopy - methods; Endoscopy, Gastrointestinal - methods; Feature extraction; Frames (data processing); Gastrointestinal diseases; Health aspects; Humans; Image analysis; Image classification; Image processing; Image Processing, Computer-Assisted - methods; Jigsaw puzzles; Learning; Machine learning; Medical imaging; Medical imaging equipment; Medicine and Health Sciences; Methods; Neural networks; Puzzles; Real time; Recall; Research and Analysis Methods; Self-supervised learning; Social Sciences; Success; Supervised Machine Learning; Ultrasonic imaging; Iran; Taiwan
ISSN	1932-6203
DOI	10.1371/journal.pone.0322028

Summary:	Identifying anatomical landmarks in endoscopic video frames is essential for the early diagnosis of gastrointestinal diseases. However, this task remains challenging due to variability in visual characteristics across different regions and the limited availability of annotated data. In this study, we propose a novel self-supervised learning (SSL) framework that integrates three complementary pretext tasks (colorization, jigsaw puzzle solving, and patch prediction) to enhance feature learning from unlabeled endoscopic images. By leveraging these tasks, our model extracts rich, meaningful representations, improving the downstream classification of Z-line, esophageal, and antrum/pylorus regions. To further enhance feature extraction and model interpretability, we incorporate attention mechanisms, transformer-based architectures, and Grad-CAM visualization. The integration of attention layers and transformers strengthens the model’s ability to learn discriminative and generalizable features, while Grad-CAM improves explainability by highlighting critical decision-making regions. These enhancements make our approach more suitable for clinical deployment, ensuring both high accuracy and interpretability. We evaluate our proposed framework on a comprehensive dataset, demonstrating substantial improvements in classification accuracy, precision, recall, and F1-score compared to conventional models trained without SSL. Specifically, our combined model achieves a classification accuracy of 98%, with high precision and recall across all classes, as reflected in ROC curves and confusion matrices. These results underscore the effectiveness of pretext-task-based SSL, attention mechanisms, and transformers for anatomical landmark identification in endoscopic video frames.
Our work introduces a scalable and interpretable methodology for improving endoscopic image classification, reducing reliance on large annotated datasets while enhancing model performance in real-world clinical applications. Future research will explore validation on diverse datasets, real-time diagnostic integration, and scalability to further advance medical image analysis using SSL.
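
As a rough illustration of what "pretext tasks" means in the summary above, the sketch below shows how training pairs for two of the three named tasks (jigsaw puzzle solving and colorization) can be generated from unlabeled images without any manual annotation. It is not the authors' implementation: the function names, the 3x3 grid, the luma weights used for grayscale conversion, and the use of NumPy are all assumptions made for this example.

```python
import numpy as np


def make_jigsaw_sample(image, grid=3, rng=None):
    """Shuffle the tiles of an image; the permutation is the free label.

    Hypothetical helper for illustration only. `image` is an (H, W, C)
    array; rows/columns that do not divide evenly by `grid` are cropped.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    th, tw = h // grid, w // grid  # tile height and width
    tiles = [
        image[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
        for r in range(grid)
        for c in range(grid)
    ]
    # perm[i] is the index of the original tile placed at slot i.
    perm = rng.permutation(grid * grid)
    rows = [
        np.concatenate([tiles[perm[r * grid + c]] for c in range(grid)], axis=1)
        for r in range(grid)
    ]
    return np.concatenate(rows, axis=0), perm


def make_colorization_sample(image):
    """Return (grayscale_input, color_target) for an RGB image in [0, 1].

    Hypothetical helper; uses the common BT.601 luma weights as an
    assumed grayscale conversion.
    """
    weights = np.array([0.299, 0.587, 0.114])
    gray = image @ weights  # (H, W, 3) @ (3,) -> (H, W)
    return gray, image


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.random((96, 96, 3))  # stand-in for an unlabeled endoscopic frame
    shuffled, perm = make_jigsaw_sample(frame, grid=3, rng=rng)
    gray, target = make_colorization_sample(frame)
    print(shuffled.shape, perm.shape, gray.shape, target.shape)
```

In a setup like this, a network would be trained to predict `perm` from the shuffled image, or the color target from the grayscale input, and its learned features would then be reused for the downstream landmark classifier.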
Bibliography:	Competing Interests: The authors have declared that no competing interests exist.