Browsing Heterogeneous Document Collections by a Segmentation-Free Word Spotting Method

Bibliographic Details
Published in: 2011 International Conference on Document Analysis and Recognition, pp. 63-67
Main Authors: Rusinol, M., Aldavert, D., Toledo, R., Llados, Josep
Format: Conference Proceeding
Language: English
Published: IEEE 01.09.2011
ISBN: 1457713500, 9781457713507
ISSN: 1520-5363
DOI: 10.1109/ICDAR.2011.22

Summary: In this paper, we present a segmentation-free word spotting method that is able to deal with heterogeneous document image collections. We propose a patch-based framework where patches are represented by a bag-of-visual-words model powered by SIFT descriptors. A later refinement of the feature vectors is performed by applying the latent semantic indexing technique. The proposed method performs well on both handwritten and typewritten historical document images. We have also tested our method on documents written in non-Latin scripts.
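
The patch representation described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the 2-D toy descriptors stand in for SIFT descriptors, the three-entry codebook, the function names, and the cosine-similarity ranking are all assumptions made for illustration, and the latent semantic indexing refinement is omitted.

```python
import math

def quantize(descriptor, codebook):
    """Index of the nearest codeword (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((d - c) ** 2 for d, c in zip(descriptor, codebook[k])),
    )

def bovw_histogram(descriptors, codebook):
    """L2-normalised histogram of visual-word counts for one patch."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        hist[quantize(d, codebook)] += 1.0
    norm = math.sqrt(sum(h * h for h in hist))
    return [h / norm for h in hist] if norm > 0 else hist

def cosine(u, v):
    """Cosine similarity of two L2-normalised vectors (plain dot product)."""
    return sum(a * b for a, b in zip(u, v))

def rank_patches(query_hist, patch_hists):
    """Patch indices sorted from most to least similar to the query."""
    return sorted(
        range(len(patch_hists)),
        key=lambda i: cosine(query_hist, patch_hists[i]),
        reverse=True,
    )

# Hypothetical toy data: a 3-word codebook and three document patches.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
patches = [
    [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1)],
    [(0.0, 0.9), (0.1, 1.1), (0.0, 0.1)],
    [(1.0, 0.1), (0.9, 0.0), (0.1, 0.1), (0.0, 0.0)],
]
query = [(0.95, 0.0), (1.05, 0.05), (0.0, 0.05)]

query_hist = bovw_histogram(query, codebook)
patch_hists = [bovw_histogram(p, codebook) for p in patches]
ranking = rank_patches(query_hist, patch_hists)
print(ranking)
```

In a fuller pipeline, the descriptors would come from a SIFT extractor run over a dense grid of page patches, and the histograms would be projected into a lower-dimensional latent space before ranking.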