Layout Analysis for Arabic Historical Document Images Using Machine Learning
| Published in | 2012 International Conference on Frontiers in Handwriting Recognition, pp. 639-644 |
|---|---|
| Main Authors | , , , | 
| Format | Conference Proceeding | 
| Language | English | 
| Published | IEEE, 01.09.2012 |
| ISBN | 9781467322621 1467322628  | 
| DOI | 10.1109/ICFHR.2012.227 | 
| Summary: | Page layout analysis is a fundamental step in any document image understanding system. We introduce an approach that segments text appearing in page margins (a.k.a. side-notes text) from manuscripts with complex layouts. Simple, discriminative features are extracted at the connected-component level, and robust feature vectors are then generated from them. A multilayer perceptron classifier is used to assign each connected component to the relevant class of text. A voting scheme is then applied to refine the resulting segmentation and produce the final classification. In contrast to state-of-the-art segmentation approaches, this method is independent of block segmentation as well as pixel-level analysis. The proposed method has been trained and tested on a dataset that contains a variety of complex side-notes layout formats, achieving a segmentation accuracy of about 95%. |
|---|---|
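
The summary outlines a pipeline of connected-component feature extraction, per-component classification, and a voting step that refines the labels. The record does not state which features the authors use or how their voting scheme works, so the following is only a minimal sketch under stated assumptions: the features (area, bounding-box size, centroid) are illustrative, the multilayer perceptron is not implemented (its per-component predictions are assumed to arrive as a list of labels), and the voting rule is assumed to be a majority vote over each component and its `k` nearest neighbours. The names `connected_components` and `refine_by_voting` are hypothetical.

```python
from collections import Counter, deque
from math import hypot


def connected_components(img):
    """Find 8-connected foreground (1) regions in a binary image given as a
    list of rows, and return one illustrative feature dict per region."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(y, x)])
            seen[y][x] = True
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            # Assumed features; the paper's actual feature set is not given here.
            comps.append({
                "area": len(pixels),
                "height": max(ys) - min(ys) + 1,
                "width": max(xs) - min(xs) + 1,
                "cx": sum(xs) / len(xs),
                "cy": sum(ys) / len(ys),
            })
    return comps


def refine_by_voting(comps, labels, k=4):
    """Assumed voting rule: each component takes the majority label among
    itself and its k nearest neighbours (by centroid distance)."""
    refined = []
    for i, c in enumerate(comps):
        by_distance = sorted(
            (j for j in range(len(comps)) if j != i),
            key=lambda j: hypot(comps[j]["cx"] - c["cx"], comps[j]["cy"] - c["cy"]),
        )
        votes = Counter([labels[i]] + [labels[j] for j in by_distance[:k]])
        refined.append(votes.most_common(1)[0][0])
    return refined
```

In this sketch, `labels` would hold the classifier's prediction for each component (for example `"main"` or `"side"`), and `refine_by_voting` smooths isolated misclassifications by deferring to nearby components. Because the vote works on components and not on blocks or pixels, it matches the summary's claim of independence from block segmentation and pixel-level analysis, though only in spirit.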