An improved you only look once model for the multi-scale steel surface defect detection with multi-level alignment and cross-layer redistribution features

Bibliographic Details
Published in	Engineering applications of artificial intelligence Vol. 145; p. 110214
Main Authors	Huang, Jianhang; Zhang, Xinliang; Jia, Lijie; Zhou, Yitian
Format	Journal Article
Language	English
Published	Elsevier Ltd 01.04.2025
Subjects	Feature misalignment; Feature redistribution; Multi-scale objects; Steel surface defect; You Only Look Once
ISSN	0952-1976
DOI	10.1016/j.engappai.2025.110214

Summary:	Steel surface defects vary widely in size and are often irregular in shape, so the performance of a detection model depends greatly on effective extraction of cross-scale features. The sequential feature fusion in the traditional Feature Pyramid Network (FPN) loses information during aggregation and transition between shallow and deep features. Classifying and localizing steel surface defects with a common You Only Look Once (YOLO) detection model may therefore give sub-optimal performance when multi-scale objects are involved. To obtain a complete shallow-deep feature representation for steel surface defects of different scales, an aggregation-redistribution network is introduced into the YOLO detection model to aggregate and refine features across levels. In the aggregation sub-network, a Multi-level Alignment Module (MAM) addresses the feature misalignment in FPN by aligning the feature maps from the level-wise extractors; the pixel-level scale deviation is compensated by multiple parallel dilated convolutions. In the redistribution sub-network, a Fusion-Redistribution Module (FRM) imposes global information on the fused multi-level features to steer, across layers, the generation of the prediction feature maps for YOLO. The global feature provides a semantically adaptive weight and complements the multi-level features through an attention mechanism. Incorporating the aggregation-redistribution network into YOLOv5 yields an improved YOLO detection model, Aggregation-Redistribution YOLO (ARYOLO), for steel surface defects. Validation results indicate that ARYOLO performs well on defects with large variations in scale and shape: it achieves a mean average precision of 80.7% on the Northeastern University Surface Defect Dataset (NEU-DET) and 71.2% on the Global 10 Class Metallic Surface Defect Dataset (GC10-DET), indicating its potential for detection and localization tasks.
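
The summary states that scale deviation is compensated by multiple parallel dilated convolutions. The following is a minimal sketch of that general idea only, not the authors' MAM: it uses a 1D signal, a fixed all-ones kernel, dilation rates (1, 2, 3) and simple averaging of the branches, all of which are assumptions chosen for illustration. The function names are hypothetical. It shows how branches with the same kernel size cover different spans of the input.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded 1D cross-correlation with a dilated kernel (output keeps the input length)."""
    k = len(kernel)
    assert k % 2 == 1, "this sketch assumes an odd kernel size"
    half = (k // 2) * dilation  # padding on each side so the length is preserved
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(signal))
    ]


def receptive_field(kernel_size, dilation):
    """Number of input positions spanned by one dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def parallel_dilated_branches(signal, kernel, dilations=(1, 2, 3)):
    """Apply the same kernel at several dilation rates and average the branch outputs.

    Averaging is an assumption of this sketch; a real module would typically
    use learned weights and a learned fusion step.
    """
    outputs = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(values) / len(outputs) for values in zip(*outputs)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    kernel = [1, 1, 1]
    for d in (1, 2, 3):
        print(d, receptive_field(len(kernel), d), dilated_conv1d(impulse, kernel, d))
    print(parallel_dilated_branches(impulse, kernel))
```

With a kernel of size 3, the three branches span 3, 5 and 7 input positions respectively, so a single response can draw on context at several scales without changing the kernel size.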