X-SSL: Self-supervised X-ray threat detection with zero-shot and multi-modal learning

Bibliographic Details
Published in: Information Processing & Management, Vol. 63, no. 1, p. 104339
Main Authors: Michael, Yonathan; Alansari, Mohamad; Ahmed, Abdelfatah; Werghi, Naoufel; Henschel, Andreas
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.01.2026
ISSN: 0306-4573
DOI: 10.1016/j.ipm.2025.104339

Summary: Automated X-ray threat detection is challenged by cluttered baggage scans, severe object occlusions, and the scarcity of annotated datasets. Traditional supervised approaches are impractical because they require large amounts of labeled data, which is difficult to obtain given the rarity of threat objects. Unsupervised methods, on the other hand, often fail to differentiate between threat and nonthreat items because of the complex grayscale nature and high object overlap inherent in X-ray imagery. To overcome these limitations, we propose X-SSL, a novel self-supervised learning framework for threat localization that eliminates the need for manual annotations. Our approach integrates three components: spatial region extraction using MaskCut for zero-shot object proposal generation; contrastive multi-modal clustering, which leverages both image and text encoders to cluster the proposals and label them as threat or nonthreat; and self-supervised knowledge distillation, in which a teacher–student model refines multiscale features from global and local image crops to improve representation learning. We evaluated X-SSL on two benchmark datasets, PIDray (39,000+ images) and CLCXray (14,000+ images), where it shows significant improvements over previous state-of-the-art (SOTA) methods. On the hidden PIDray subset, X-SSL improves detection AP to 18.16 (+4.41 AP) and segmentation AP to 13.76 (+1.01 AP) over previous methods. On CLCXray, it achieves 39.26 detection AP (+5.93 AP) and 38.67 segmentation AP (+10.20 AP). For classification, X-SSL achieves an accuracy of 43% on PIDray Hidden and 65% on CLCXray, outperforming existing weakly supervised and unsupervised methods. Code will be available at: https://github.com/yonathan-kiflom/X-SSL.
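The summary says that object proposals are clustered and then labeled as threat or nonthreat using image and text encoders, but it gives no algorithmic detail. The following is a minimal, hypothetical sketch of one way such a "cluster, then name the clusters with text" step could work. It is not the X-SSL implementation: plain k-means on cosine similarity stands in for the authors' contrastive clustering, the 3-D vectors are toy stand-ins for real image-encoder and text-encoder embeddings, and the function names `kmeans` and `label_clusters` and the prompt sets are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def kmeans(points: List[Vector], k: int, iters: int = 10) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means using cosine similarity (a stand-in, not the paper's clustering).

    The first k points seed the centroids, so the result is deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its most similar centroid.
        assign = [max(range(k), key=lambda c: cosine(p, centroids[c])) for p in points]
        # Recompute each centroid as the mean of its members (keep the old one if empty).
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


def label_clusters(centroids: List[List[float]], prompt_embs: Dict[str, List[Vector]]) -> List[str]:
    """Name each cluster after the class whose prompt embeddings its centroid matches best on average."""
    labels = []
    for cen in centroids:
        scores = {cls: sum(cosine(cen, t) for t in texts) / len(texts)
                  for cls, texts in prompt_embs.items()}
        labels.append(max(scores, key=scores.get))
    return labels


# Hypothetical toy embeddings standing in for image-encoder outputs of MaskCut proposals.
proposals = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.2], [0.8, 0.2, 0.1], [0.0, 0.9, 0.1]]
# Hypothetical toy embeddings standing in for text-encoder outputs of class prompts.
prompt_embs = {
    "threat": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.2]],
    "nonthreat": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
}

assign, centroids = kmeans(proposals, k=2)
cluster_labels = label_clusters(centroids, prompt_embs)
proposal_labels = [cluster_labels[a] for a in assign]
print(proposal_labels)  # ['threat', 'nonthreat', 'threat', 'nonthreat']
```

Labeling cluster centroids rather than individual proposals means the text encoder only has to resolve one decision per cluster, which is one plausible reason to cluster before labeling; whether X-SSL makes this trade-off in the same way is not stated in the summary.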