Automatic image captioning in Thai for house defect using a deep learning-based approach
This study aims to automate the reporting process of house inspections, enabling prospective buyers to make informed decisions. Currently, an inspector produces the report by inserting all defect images into spreadsheet software and manually captioning each image with the identified defects. To the best of our knowledge, no previous work or dataset has automated this process. This paper therefore proposes a new image-captioning dataset for house defect inspection, benchmarked with three deep learning-based models. The models follow an encoder–decoder architecture in which three image encoders (VGG16, MobileNet, and InceptionV3) and one GRU-based decoder with Bahdanau's additive attention mechanism are evaluated. The experimental results indicate that, despite similar training losses across all models, VGG16 takes the least time to train, while MobileNet achieves the highest BLEU-1 to BLEU-4 scores of 0.866, 0.850, 0.823, and 0.728, respectively. However, InceptionV3 is suggested as the optimal model, since it outperforms the others in the accuracy of its attention plots and its BLEU scores are comparable to the best scores obtained by MobileNet.
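The decoder design described in the abstract, a GRU conditioned on CNN feature maps through Bahdanau's additive attention, is a standard image-captioning setup. The TensorFlow/Keras sketch below shows one plausible implementation of that combination; the class names, layer sizes, and calling convention are illustrative assumptions, not details taken from the paper.

```python
import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    """Additive attention (Bahdanau et al.) over the encoder's spatial features."""
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)  # projects encoder features
        self.W2 = tf.keras.layers.Dense(units)  # projects previous decoder state
        self.V = tf.keras.layers.Dense(1)       # scores each spatial location

    def call(self, features, hidden):
        # features: (batch, num_locations, feat_dim), e.g. InceptionV3's last
        # conv layer reshaped to (batch, 64, 2048); hidden: (batch, units).
        hidden_with_time = tf.expand_dims(hidden, 1)
        scores = self.V(tf.nn.tanh(self.W1(features) + self.W2(hidden_with_time)))
        weights = tf.nn.softmax(scores, axis=1)           # (batch, locations, 1)
        context = tf.reduce_sum(weights * features, axis=1)  # (batch, feat_dim)
        return context, weights

class GRUDecoder(tf.keras.Model):
    """One decoding step: attend over image features, then predict the next token."""
    def __init__(self, vocab_size, embedding_dim, units):
        super().__init__()
        self.attention = BahdanauAttention(units)
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(units, return_sequences=True, return_state=True)
        self.fc = tf.keras.layers.Dense(vocab_size)

    def call(self, token, features, hidden):
        # token: (batch, 1) id of the previously emitted word.
        context, weights = self.attention(features, hidden)
        x = self.embedding(token)                             # (batch, 1, embedding_dim)
        x = tf.concat([tf.expand_dims(context, 1), x], axis=-1)
        output, state = self.gru(x, initial_state=hidden)
        logits = self.fc(tf.squeeze(output, 1))               # (batch, vocab_size)
        return logits, state, weights
```

Because the attention weights sum to one over the spatial locations of the encoder's feature map, they can be reshaped back onto the image grid to produce the attention plots the abstract uses to compare the three encoders.

The reported BLEU-1 to BLEU-4 scores measure n-gram overlap between generated and reference captions. A minimal scoring sketch with NLTK, using made-up English tokens purely for illustration in place of the paper's Thai test captions:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["crack", "on", "the", "bathroom", "wall"]]  # hypothetical tokens
candidate = ["crack", "on", "bathroom", "wall"]

smooth = SmoothingFunction().method1  # avoids zero scores on short captions
for n in range(1, 5):
    weights = tuple(1.0 / n for _ in range(n))
    score = sentence_bleu(reference, candidate, weights=weights,
                          smoothing_function=smooth)
    print(f"BLEU-{n}: {score:.3f}")
```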
Published in | Advances in computational intelligence, Vol. 4, No. 1, p. 1 |
---|---|
Main Authors | |
Format | Journal Article |
Language | English |
Published | Cham: Springer International Publishing, 01.03.2024 (Springer Nature B.V.) |
Subjects | |
ISSN | 2730-7794; 2730-7808 |
DOI | 10.1007/s43674-023-00068-w |