Script independent text segmentation of document images using graph network based shortest path scheme
| Published in | International journal of information technology (Singapore. Online) Vol. 15; no. 4; pp. 2247 - 2261 |
|---|---|
| Main Authors | , , , , |
| Format | Journal Article |
| Language | English |
| Published | Singapore: Springer Nature Singapore, 01.04.2023; Springer Nature B.V |
| ISSN | 2511-2104 2511-2112 |
| DOI | 10.1007/s41870-023-01230-w |
| Summary: | Document image processing is one of the growing research fields in the digital world, with applications such as database indexing, text recognition, signature verification, and web search engines. Segmenting intermixed texts (handwritten and machine-printed) from documents is a difficult task. In this paper, script-independent text-line and word segmentation techniques are proposed. For text-line segmentation, Dijkstra’s algorithm is employed, whereas for segmenting words, the wavelet transform is used. Text-line segmentation is modeled as a general image segmentation task. Dijkstra’s algorithm is a shortest-path planning method, which is used for the boundary-growing process. This forms potential text-line boundary regions. For word segmentation, an energy map is first calculated using the wavelet transform, and a Gaussian filter is then used to create text blocks. The proposed techniques are evaluated on several databases containing documents in different scripts. In a benchmarking analysis against other approaches, the text-line and word segmentation techniques achieve highest segmentation accuracies of 97.6% and 98.1%, respectively. |
|---|---|
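
The summary describes text-line segmentation as a shortest-path problem solved with Dijkstra’s algorithm. The record does not give the paper's graph construction or cost function, so the following is only a minimal sketch under stated assumptions: the page is a small binary grid (1 = ink, 0 = background), every pixel is a graph node, and entering an ink pixel carries a large penalty so the cheapest left-to-right path tends to run through the gap between two text lines. The names `shortest_boundary`, `ink_penalty`, and `page` are hypothetical and chosen for illustration.

```python
import heapq

def shortest_boundary(ink, start_row, ink_penalty=100):
    """Dijkstra over a pixel grid.

    Returns (path, cost) for the cheapest path from (start_row, 0) to any
    pixel in the last column. Cost model (an assumption, not the paper's):
    1 per visited pixel, plus ink_penalty if the pixel is ink.
    """
    rows, cols = len(ink), len(ink[0])

    def step_cost(r, c):
        return 1 + ink_penalty * ink[r][c]

    start = (start_row, 0)
    dist = {start: step_cost(*start)}
    prev = {}
    heap = [(dist[start], start)]

    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[(r, c)]:
            continue  # stale heap entry, a cheaper route was already found
        if c == cols - 1:
            # Reached the right edge: walk the predecessor links back.
            path = [(r, c)]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return path[::-1], d
        # Allowed moves: right, up, down.
        for dr, dc in ((0, 1), (-1, 0), (1, 0)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + step_cost(nr, nc)
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, (nr, nc)))
    return [], float("inf")

# Toy page: two text lines (rows 0 and 3) with a descender intruding
# into the gap at row 1, column 3.
page = [
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
]
path, cost = shortest_boundary(page, start_row=1)
print(path, cost)
```

On this toy grid the returned path stays on background pixels and detours around the descender rather than cutting through it, which is the behavior a boundary between two text lines needs. A real document image would call for a cost derived from the actual pixel data and a choice of start rows, neither of which is specified in this record.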