Automatic image captioning in Thai for house defect using a deep learning-based approach
This study aims to automate the reporting process of house inspections, enabling prospective buyers to make informed decisions. Currently, an inspector produces the report by inserting all defect images into spreadsheet software and manually captioning each image with the identified defects. To the best of our knowledge, no previous work or dataset has automated this process. This paper therefore proposes a new image-captioning dataset for house defect inspection, benchmarked with three deep learning-based models. The models follow an encoder–decoder architecture in which three image encoders (VGG16, MobileNet, and InceptionV3) and one GRU-based decoder with Bahdanau's additive attention mechanism are evaluated. The experimental results indicate that, despite similar training losses across all models, VGG16 takes the least time to train, while MobileNet achieves the highest BLEU-1 to BLEU-4 scores of 0.866, 0.850, 0.823, and 0.728, respectively. However, InceptionV3 is suggested as the optimal model, since it outperforms the others in the accuracy of its attention plots and its BLEU scores are comparable to the best scores obtained by MobileNet.
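The decoder design described in the abstract, a GRU conditioned on CNN feature maps through Bahdanau's additive attention, is a standard image-captioning setup. The TensorFlow/Keras sketch below shows one plausible implementation of that combination; the class names, layer sizes, and calling convention are illustrative assumptions, not details taken from the paper.

```python
import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    """Additive attention (Bahdanau et al.) over the encoder's spatial features."""
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)  # projects encoder features
        self.W2 = tf.keras.layers.Dense(units)  # projects previous decoder state
        self.V = tf.keras.layers.Dense(1)       # scores each spatial location

    def call(self, features, hidden):
        # features: (batch, num_locations, feat_dim), e.g. InceptionV3's last
        # conv layer reshaped to (batch, 64, 2048); hidden: (batch, units).
        hidden_with_time = tf.expand_dims(hidden, 1)
        scores = self.V(tf.nn.tanh(self.W1(features) + self.W2(hidden_with_time)))
        weights = tf.nn.softmax(scores, axis=1)           # (batch, locations, 1)
        context = tf.reduce_sum(weights * features, axis=1)  # (batch, feat_dim)
        return context, weights

class GRUDecoder(tf.keras.Model):
    """One decoding step: attend over image features, then predict the next token."""
    def __init__(self, vocab_size, embedding_dim, units):
        super().__init__()
        self.attention = BahdanauAttention(units)
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(units, return_sequences=True, return_state=True)
        self.fc = tf.keras.layers.Dense(vocab_size)

    def call(self, token, features, hidden):
        # token: (batch, 1) id of the previously emitted word.
        context, weights = self.attention(features, hidden)
        x = self.embedding(token)                             # (batch, 1, embedding_dim)
        x = tf.concat([tf.expand_dims(context, 1), x], axis=-1)
        output, state = self.gru(x, initial_state=hidden)
        logits = self.fc(tf.squeeze(output, 1))               # (batch, vocab_size)
        return logits, state, weights
```

Because the attention weights sum to one over the spatial locations of the encoder's feature map, they can be reshaped back onto the image grid to produce the attention plots the abstract uses to compare the three encoders.

The reported BLEU-1 to BLEU-4 scores measure n-gram overlap between generated and reference captions. A minimal scoring sketch with NLTK, using made-up English tokens purely for illustration in place of the paper's Thai test captions:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["crack", "on", "the", "bathroom", "wall"]]  # hypothetical tokens
candidate = ["crack", "on", "bathroom", "wall"]

smooth = SmoothingFunction().method1  # avoids zero scores on short captions
for n in range(1, 5):
    weights = tuple(1.0 / n for _ in range(n))
    score = sentence_bleu(reference, candidate, weights=weights,
                          smoothing_function=smooth)
    print(f"BLEU-{n}: {score:.3f}")
```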
Published in | Advances in computational intelligence, Vol. 4, No. 1, p. 1 |
---|---|
Main Authors | |
Format | Journal Article |
Language | English |
Published | Cham: Springer International Publishing, 01.03.2024 (Springer Nature B.V.) |
Subjects | |
ISSN | 2730-7794; 2730-7808 |
DOI | 10.1007/s43674-023-00068-w |