Layout Analysis for Arabic Historical Document Images Using Machine Learning
Page layout analysis is a fundamental step in any document image understanding system. We introduce an approach that segments text appearing in page margins (a.k.a. side-note text) from manuscripts with complex layouts. Simple, discriminative features are extracted at the connected-component...
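
The connected-component step mentioned in the summary can be illustrated with a minimal sketch. The record does not list the features the authors actually use, so everything here is an assumption made for illustration: the function names `connected_components` and `component_features`, the 8-connectivity, and the choice of normalized size, position, and density as features.

```python
from collections import deque


def connected_components(image):
    """Group 8-connected foreground pixels (value 1) into components.

    Returns a list of components, each a list of (row, col) pixels.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def component_features(pixels, rows, cols):
    """Hypothetical per-component features, normalized by page size."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    return {
        "height": height / rows,
        "width": width / cols,
        "center_x": (min(xs) + max(xs)) / 2 / cols,
        "center_y": (min(ys) + max(ys)) / 2 / rows,
        "density": len(pixels) / (height * width),
    }


# Toy binary page: one blob on the left, one thin blob in the right margin.
page = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
]
components = connected_components(page)
features = [component_features(p, len(page), len(page[0])) for p in components]
```

In a sketch like this, the horizontal position (`center_x`) is the kind of cue that could help separate margin text from main-body text, though the feature set used in the paper may differ.
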
| Published in | 2012 International Conference on Frontiers in Handwriting Recognition, pp. 639-644 |
|---|---|
| Main Authors | Bukhari, S. S.; Breuel, T. M.; Asi, A.; El-Sana, J. |
| Format | Conference Proceeding | 
| Language | English | 
| Published | IEEE, 01.09.2012 |
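| Subjects | Accuracy; Context; Feature extraction; historical manuscripts; Image segmentation; Layout; layout analysis; machine learning; Shape; Training |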
| ISBN | 9781467322621 1467322628  | 
| DOI | 10.1109/ICFHR.2012.227 | 
| Abstract | Page layout analysis is a fundamental step in any document image understanding system. We introduce an approach that segments text appearing in page margins (a.k.a. side-note text) from manuscripts with complex layouts. Simple, discriminative features are extracted at the connected-component level, and robust feature vectors are then generated from them. A multilayer perceptron classifier is used to assign each connected component to the relevant text class. A voting scheme is then applied to refine the resulting segmentation and produce the final classification. In contrast to state-of-the-art segmentation approaches, this method is independent of block segmentation as well as pixel-level analysis. The proposed method was trained and tested on a dataset containing a variety of complex side-note layout formats, achieving a segmentation accuracy of about 95%. |
| Author | Bukhari, S. S.; El-Sana, J.; Breuel, T. M.; Asi, A. |
| Author_xml | 1. Bukhari, S. S. (bukhari@informatik.uni-kl.de), Tech. Univ. of Kaiserslautern, Kaiserslautern, Germany; 2. Breuel, T. M. (tmb@informatik.uni-kl.de), Tech. Univ. of Kaiserslautern, Kaiserslautern, Germany; 3. Asi, A. (abedas@cs.bgu.ac.il), Ben-Gurion Univ. of the Negev, Beer-Sheva, Israel; 4. El-Sana, J. (el-sana@cs.bgu.ac.il), Ben-Gurion Univ. of the Negev, Beer-Sheva, Israel |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/ICFHR.2012.227 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP All) 1998-Present |
| Database_xml | sequence: 1; dbid: RIE; name: IEEE Electronic Library (IEL); sourceTypes: Publisher; url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ |
| DeliveryMethod | fulltext_linktorsrc |
| EndPage | 644 |
| ExternalDocumentID | 6424468 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IF 6IK 6IL 6IN AAJGR AAWTH ADFMO ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK IEGSK IERZE OCL RIE RIL |
| ID | FETCH-LOGICAL-i175t-1771b977db2e207a1d4d3b0352964ea01fc87b454eb324af0a814b054cd498453 |
| IEDL.DBID | RIE |
| ISBN | 9781467322621 1467322628 |
| IngestDate | Wed Aug 27 08:34:55 EDT 2025 |
| IsDoiOpenAccess | false |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | true |
| LCCN | 2012939084 |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-i175t-1771b977db2e207a1d4d3b0352964ea01fc87b454eb324af0a814b054cd498453 |
| PageCount | 6 |
| ParticipantIDs | ieee_primary_6424468 |
| PublicationCentury | 2000 |
| PublicationDate | 2012-Sept. |
| PublicationDateYYYYMMDD | 2012-09-01 |
| PublicationDate_xml | month: 09; year: 2012; text: 2012-Sept. |
| PublicationDecade | 2010 |
| PublicationTitle | 2012 International Conference on Frontiers in Handwriting Recognition |
| PublicationTitleAbbrev | icfhr |
| PublicationYear | 2012 |
| Publisher | IEEE |
| Publisher_xml | name: IEEE |
| SSID | ssj0001107010 |
| Score | 2.1492846 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 639 |
| SubjectTerms | Accuracy; Context; Feature extraction; historical manuscripts; Image segmentation; Layout; layout analysis; machine learning; Shape; Training |
| Title | Layout Analysis for Arabic Historical Document Images Using Machine Learning |
| URI | https://ieeexplore.ieee.org/document/6424468 |
| hasFullText | 1 |
| inHoldings | 1 |
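
The abstract describes a voting scheme that refines the per-component classification. The record gives no details of that scheme, so the following is only a sketch under stated assumptions: each component is relabeled with the majority label among itself and its `k` nearest neighbours by center distance. The function name `refine_by_voting`, the neighbourhood definition, and the toy labels (standing in for classifier output) are all hypothetical.

```python
import math
from collections import Counter


def refine_by_voting(centers, labels, k=2):
    """Relabel each component by majority vote over itself and its k nearest
    neighbours (assumed scheme, not taken from the paper).

    Votes are counted on the original labels, so the result does not depend
    on the order in which components are visited.
    """
    refined = []
    for i, (cx, cy) in enumerate(centers):
        # Indices of all other components, nearest first.
        by_distance = sorted(
            (j for j in range(len(centers)) if j != i),
            key=lambda j: math.hypot(centers[j][0] - cx, centers[j][1] - cy),
        )
        votes = Counter([labels[i]] + [labels[j] for j in by_distance[:k]])
        top_count = max(votes.values())
        winners = [label for label, n in votes.items() if n == top_count]
        # On a tie, keep the component's own label when it is among the winners.
        refined.append(labels[i] if labels[i] in winners else winners[0])
    return refined


# Toy data: three components in the main text column, three in the margin.
centers = [(0.40, 0.10), (0.45, 0.12), (0.50, 0.11),
           (0.90, 0.10), (0.92, 0.13), (0.91, 0.16)]
# Hypothetical initial labels with two misclassified components (index 1 and 5).
labels = ["main", "side", "main", "side", "side", "main"]

refined = refine_by_voting(centers, labels, k=2)
print(refined)  # ['main', 'main', 'main', 'side', 'side', 'side']
```

With `k=2` each component receives three votes, so a two-class problem cannot tie; other values of `k` fall back to the tie rule noted in the code.
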