Script independent text segmentation of document images using graph network based shortest path scheme


Bibliographic Details
Published in International journal of information technology (Singapore. Online) Vol. 15; no. 4; pp. 2247-2261
Main Authors Sahare, Parul, Tembhurne, Jitendra V., Parate, Mayur R., Diwan, Tausif, Dhok, Sanjay B.
Format Journal Article
Language English
Published Singapore Springer Nature Singapore 01.04.2023
Springer Nature B.V
ISSN 2511-2104
2511-2112
DOI 10.1007/s41870-023-01230-w

More Information
Summary: Document image processing is a growing research field with applications such as database indexing, text recognition, signature verification, and web search engines. Segmenting intermixed handwritten and machine-printed text from documents is a difficult task. In this paper, script-independent text-line and word segmentation techniques are proposed. For text-line segmentation, Dijkstra’s algorithm is employed, whereas for word segmentation, the wavelet transform is used. Text-line segmentation is modeled as a general image segmentation task. Dijkstra’s algorithm, a shortest-path planning method, is used for the boundary-growing process, which forms potential text-line boundary regions. For word segmentation, an energy map is first calculated using the wavelet transform, and a Gaussian filter is then used to create text blocks. The proposed techniques are evaluated on different databases containing documents of different scripts. Benchmarking against other approaches shows highest segmentation accuracies of 97.6% and 98.1% for the text-line and word segmentation techniques, respectively.
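
The summary does not give the paper's cost function or the details of its boundary-growing process, so the following is only a minimal sketch of the general idea: Dijkstra's algorithm finds a cheap left-to-right path through a binarized page, and a high penalty on ink pixels pushes that path into the gap between two text lines. The function name `separating_path`, the `INK_COST` penalty, the allowed moves, and the toy `page` are all assumptions made for illustration, not the authors' method.

```python
import heapq

INK_COST = 100  # assumed penalty for crossing an ink pixel (not from the paper)


def separating_path(image, start_row):
    """Cheapest left-to-right path through a binary image (1 = ink, 0 = background).

    Returns the list of (row, col) pixels from column 0 to the last column,
    or an empty list if the image has no reachable last column.
    """
    rows, cols = len(image), len(image[0])

    def cost(r, c):
        # Every pixel costs 1; ink pixels cost an extra INK_COST.
        return 1 + INK_COST * image[r][c]

    start = (start_row, 0)
    dist = {start: cost(*start)}
    prev = {}
    heap = [(dist[start], start_row, 0)]

    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist.get((r, c), float("inf")):
            continue  # stale heap entry, a cheaper route was already found
        if c == cols - 1:
            # First last-column pixel popped is the cheapest one: rebuild the path.
            path = [(r, c)]
            while (r, c) in prev:
                r, c = prev[(r, c)]
                path.append((r, c))
            path.reverse()
            return path
        # Allowed moves: up, down, right, and the two rightward diagonals.
        for dr, dc in ((-1, 0), (1, 0), (0, 1), (-1, 1), (1, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost(nr, nc)
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, nr, nc))
    return []


# Toy page: two "text lines" (rows 0 and 3) with a descender intruding at (1, 2).
page = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
]
path = separating_path(page, start_row=1)
```

On the toy page the path has to dip around the descender at row 1, column 2, which is the kind of behavior a projection-profile cut along a fixed row would not provide.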