X-SSL: Self-supervised X-ray threat detection with zero-shot and multi-modal learning
Published in | Information processing & management Vol. 63; no. 1; p. 104339 |
---|---|
Main Authors | , , , , |
Format | Journal Article |
Language | English |
Published | Elsevier Ltd, 01.01.2026 |
ISSN | 0306-4573 |
DOI | 10.1016/j.ipm.2025.104339 |
Summary: | Automated X-ray threat detection is challenged by cluttered baggage scans, severe object occlusions, and the scarcity of annotated datasets. Traditional supervised approaches are impractical because they require large amounts of labeled data, which are difficult to obtain given the rarity of threat objects. Unsupervised methods, on the other hand, often fail to differentiate between threat and nonthreat items due to the complex grayscale nature and high object overlap inherent in X-ray imagery. To overcome these limitations, we propose X-SSL, a novel self-supervised learning framework that performs threat localization without manual annotations. Our approach integrates three components: spatial region extraction using MaskCut for zero-shot object proposal generation; contrastive multi-modal clustering that leverages both image and text encoders to cluster and label proposals into threat and nonthreat categories; and self-supervised knowledge distillation, in which a teacher–student model refines multiscale features from global and local image crops for improved representation learning. We evaluated X-SSL on two benchmark datasets, PIDray (39,000+ images) and CLCXray (14,000+ images), demonstrating significant improvements over previous state-of-the-art (SOTA) methods. On the hidden PIDray subset, X-SSL improves detection AP to 18.16 (+4.41 AP) and segmentation AP to 13.76 (+1.01 AP) over previous methods. On CLCXray, it achieves 39.26 detection AP (+5.93 AP) and 38.67 segmentation AP (+10.20 AP). For classification, X-SSL achieves an accuracy of 43% on PIDray Hidden and 65% on CLCXray, outperforming existing weakly supervised and unsupervised methods. Code will be available here: https://github.com/yonathan-kiflom/X-SSL. |
---|---|
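
The summary describes labeling object proposals as threat or nonthreat by comparing image and text encoder outputs. The following is a minimal sketch of that general idea only, assuming cosine similarity between embeddings as the matching rule; it is not the authors' implementation. The function names `cosine` and `label_proposals` and the toy three-dimensional vectors are hypothetical stand-ins for real encoder outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def label_proposals(proposal_embs, text_embs):
    """Give each proposal the label of its most similar text embedding.

    Assumption: image and text embeddings live in a shared space, as with
    contrastively trained image/text encoders.
    """
    labels = []
    for emb in proposal_embs:
        scores = {name: cosine(emb, t) for name, t in text_embs.items()}
        labels.append(max(scores, key=scores.get))
    return labels


# Hypothetical toy embeddings; real ones would come from the encoders.
text_embs = {"threat": [1.0, 0.0, 0.2], "nonthreat": [0.0, 1.0, 0.2]}
proposal_embs = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.3]]

labels = label_proposals(proposal_embs, text_embs)
print(labels)
```

In practice the proposals would come from a region extractor such as MaskCut, and the paper's clustering step is likely more involved than a per-proposal nearest-text lookup.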