Towards automatic transcription of Syriac handwriting

Bibliographic Details
Published in: Image Analysis and Processing: 12th International Conference on, pp. 664-669
Main Author: Clocksin, W.F.
Format: Conference Proceeding
Language: English
Published: IEEE, 2003
ISBN: 9780769519487, 0769519482
DOI: 10.1109/ICIAP.2003.1234126

Summary: We describe a method implemented for the recognition of Syriac handwriting from historical manuscripts. The Syriac language has been a neglected area for handwriting recognition research, yet it is interesting because the preponderance of scribe-written manuscripts offers a challenging yet tractable medium for OCR research between the extremes of typewritten text and free handwriting. Like Arabic, Syriac is written in a cursive form from right to left, and letter shape depends on the position within the word. The method described does not need to find character strokes or contours. Both whole words and character shapes were used in recognition experiments. After segmentation using a novel probabilistic method, features of these shapes are found that tolerate variation in formation and image quality. Each shape is recognised individually using a discriminative support vector machine with 10-fold cross-validation. We describe experiments using a variety of segmentation methods and combinations of features on characters and words. Images from scribe-written historical manuscripts are used, and the recognition results are compared with those for images taken from clearer 19th-century typeset documents. Recognition rates vary from 61% to 100%, depending on the algorithms used and the size and source of the data set.
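The abstract states that each shape is classified by a support vector machine evaluated with 10-fold cross-validation, but it does not name the kernel, the feature set, or the SVM software. The Python sketch that follows illustrates only that evaluation protocol, under stated assumptions: a linear SVM trained with the Pegasos sub-gradient method stands in for whatever solver the paper used, several classes are handled by one-vs-rest, and the two-dimensional "features" and letter labels are synthetic and hypothetical. It is not the author's implementation.

```python
import random

# Hypothetical sketch: kernel, features and SVM software are NOT given in the
# abstract. A linear SVM trained by Pegasos (a swapped-in technique) is made
# multi-class by one-vs-rest and scored with k-fold cross-validation.


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos training of a binary linear SVM; y holds -1/+1 labels."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)  # extra weight pairs with an appended 1.0 (bias)
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                 # Pegasos step size
            xi = X[i] + [1.0]
            violates = y[i] * dot(w, xi) < 1      # hinge loss active for this sample?
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularisation shrink
            if violates:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w


def train_one_vs_rest(X, labels, **kw):
    """One binary SVM per class: that class is +1, every other class is -1."""
    return {c: train_linear_svm(X, [1 if lab == c else -1 for lab in labels], **kw)
            for c in sorted(set(labels))}


def predict(models, x):
    """Pick the class whose SVM gives the largest decision value."""
    xi = x + [1.0]
    return max(models, key=lambda c: dot(models[c], xi))


def cross_validated_accuracy(X, labels, k=10, seed=0, **kw):
    """k-fold CV: every sample is held out exactly once; returns pooled accuracy."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    folds = [order[f::k] for f in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        models = train_one_vs_rest([X[i] for i in train],
                                   [labels[i] for i in train], **kw)
        correct += sum(predict(models, X[i]) == labels[i] for i in fold)
    return correct / len(X)


# Usage with synthetic 2-D "shape features" for three hypothetical letter classes.
rng = random.Random(1)
X, labels = [], []
for name, (cx, cy) in {"alaph": (0.0, 0.0), "beth": (3.0, 3.0), "gamal": (0.0, 3.0)}.items():
    for _ in range(20):
        X.append([cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3)])
        labels.append(name)

acc = cross_validated_accuracy(X, labels, k=10)
print(f"10-fold accuracy: {acc:.2f}")
```

Because each sample is held out exactly once, the returned figure is the pooled accuracy over all ten folds. A real experiment would replace the synthetic points with the shape features computed after segmentation, and might use a kernel SVM rather than the linear one assumed here.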