Text Detection and Language Identification in Natural Scene Images using YOLOv5
| Published in | International Conference on Computer Communication and Informatics (Online), pp. 1-7 |
|---|---|
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 23.01.2023 |
| ISSN | 2473-7577 |
| DOI | 10.1109/ICCCI56745.2023.10128400 |
Summary: Deep learning has evolved immensely since the dawn of the digital era, and feature extraction is one of its facets. Extracting text from a picture is a difficult task, since an image may contain text in a variety of sizes, styles, orientations, and alignments, with low contrast, noise, and a complicated background structure. Transforming an image into a different perspective for feature identification is the first step toward text recognition. Scene text provides rich contextual information that can be applied to several types of vision-based applications; hence, interest in the detection and recognition of scene text has grown over the last few years. This paper proposes a deep learning-based solution to the problem of language identification from multilingual scene text images. The underlying model is a convolutional neural network used to detect objects in real time with high accuracy. The study employs the single-neural-network detector "You Only Look Once" (YOLO), which produces its predictions in a single forward-propagation pass through the network while evaluating the full image. It uses the COCO (Common Objects in Context) dataset, a large-scale object detection, segmentation, and captioning dataset. To evaluate an image, YOLO divides it into a grid of smaller cells and predicts bounding boxes and class probabilities for each cell; these region proposals are weighted by the predicted probabilities, and the detected objects are returned after non-maximum suppression. Evaluation uses the F1-score, which combines precision and recall into a single metric by computing their harmonic mean.
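As a rough, illustrative sketch of the pipeline the summary describes (a single forward pass over the whole image followed by non-maximum suppression), the snippet below loads the publicly available Ultralytics YOLOv5 model pretrained on COCO via torch.hub. The image path `scene.jpg` is a placeholder, and the 0.45 IoU threshold is an assumed value, not one taken from the paper.

```python
import torch
from torchvision.ops import nms

# Load a YOLOv5 model pretrained on the COCO dataset (80 object classes).
# Detection runs in a single forward pass over the full image.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# 'scene.jpg' is a placeholder path for a natural scene image.
results = model('scene.jpg')

# Each detection row: [x1, y1, x2, y2, confidence, class].
detections = results.xyxy[0]

# YOLOv5 already applies non-maximum suppression internally, but the same
# step can be reproduced explicitly: among overlapping candidate boxes
# (IoU above the threshold), keep only the highest-scoring one.
boxes, scores = detections[:, :4], detections[:, 4]
keep = nms(boxes, scores, iou_threshold=0.45)
print(detections[keep])
```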
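For reference, the F1-score mentioned in the summary is the harmonic mean of precision and recall:

$$F_1 = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$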