Layout Analysis for Arabic Historical Document Images Using Machine Learning

Bibliographic Details
Published in 2012 International Conference on Frontiers in Handwriting Recognition, pp. 639-644
Main Authors Bukhari, S. S., Breuel, T. M., Asi, A., El-Sana, J.
Format Conference Proceeding
Language English
Published IEEE 01.09.2012
ISBN 9781467322621
1467322628
DOI 10.1109/ICFHR.2012.227

Abstract Page layout analysis is a fundamental step of any document image understanding system. We introduce an approach that segments text appearing in page margins (a.k.a. side-notes text) from manuscripts with complex layout formats. Simple and discriminative features are extracted at the connected-component level, and robust feature vectors are subsequently generated. A multilayer perceptron classifier is used to assign each connected component to the relevant class of text. A voting scheme is then applied to refine the resulting segmentation and produce the final classification. In contrast to state-of-the-art segmentation approaches, this method is independent of block segmentation as well as pixel-level analysis. The proposed method has been trained and tested on a dataset that contains a variety of complex side-notes layout formats, achieving a segmentation accuracy of about 95%.
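The abstract does not say how the voting scheme works, so the following is only an illustrative sketch of one plausible refinement step, not the authors' method. It assumes each connected component already has a centroid and an initial label from some classifier; the function name `refine_by_voting`, the neighbourhood size `k`, and the labels "main" and "side" are all hypothetical.

```python
from collections import Counter
from math import hypot


def refine_by_voting(centroids, labels, k=4):
    """Relabel each component by majority vote over itself and its k nearest neighbours.

    Assumption: this nearest-neighbour vote stands in for the paper's
    unspecified voting scheme. With two classes and an even k, the vote
    count (k + 1) is odd, so no ties occur.
    """
    refined = []
    for i, (xi, yi) in enumerate(centroids):
        # Rank all other components by centroid-to-centroid Euclidean distance.
        neighbours = sorted(
            (j for j in range(len(centroids)) if j != i),
            key=lambda j: hypot(centroids[j][0] - xi, centroids[j][1] - yi),
        )[:k]
        votes = Counter(labels[j] for j in neighbours)
        votes[labels[i]] += 1  # the component's own initial label also votes
        refined.append(votes.most_common(1)[0][0])
    return refined


# Hypothetical input: two spatial clusters, each with one misclassified component.
centroids = [(10, 10), (12, 11), (11, 13), (13, 12), (14, 10),
             (100, 50), (102, 52), (101, 54), (103, 51), (104, 53)]
labels = ["main", "main", "side", "main", "main",
          "side", "side", "side", "main", "side"]
print(refine_by_voting(centroids, labels))
```

In this toy input, the isolated "side" label inside the first cluster and the isolated "main" label inside the second are both outvoted by their neighbours. Votes are always read from the initial labels, so the result does not depend on processing order.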
Author_xml – sequence: 1
  givenname: S. S.
  surname: Bukhari
  fullname: Bukhari, S. S.
  email: bukhari@informatik.uni-kl.de
  organization: Tech. Univ. of Kaiserslautern, Kaiserslautern, Germany
– sequence: 2
  givenname: T. M.
  surname: Breuel
  fullname: Breuel, T. M.
  email: tmb@informatik.uni-kl.de
  organization: Tech. Univ. of Kaiserslautern, Kaiserslautern, Germany
– sequence: 3
  givenname: A.
  surname: Asi
  fullname: Asi, A.
  email: abedas@cs.bgu.ac.il
  organization: Ben-Gurion Univ. of the Negev, Beer-Sheva, Israel
– sequence: 4
  givenname: J.
  surname: El-Sana
  fullname: El-Sana, J.
  email: el-sana@cs.bgu.ac.il
  organization: Ben-Gurion Univ. of the Negev, Beer-Sheva, Israel
CODEN IEEPAD
ContentType Conference Proceeding
EndPage 644
ExternalDocumentID 6424468
Genre orig-research
ISBN 9781467322621
1467322628
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed false
IsScholarly true
LCCN 2012939084
Language English
PageCount 6
PublicationCentury 2000
PublicationDate 2012-Sept.
PublicationDateYYYYMMDD 2012-09-01
PublicationDate_xml – month: 09
  year: 2012
  text: 2012-Sept.
PublicationDecade 2010
PublicationTitle 2012 International Conference on Frontiers in Handwriting Recognition
PublicationTitleAbbrev icfhr
PublicationYear 2012
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 639
SubjectTerms Accuracy
Context
Feature extraction
historical manuscripts
Image segmentation
Layout
layout analysis
machine learning
Shape
Training
Title Layout Analysis for Arabic Historical Document Images Using Machine Learning
URI https://ieeexplore.ieee.org/document/6424468