Browsing Heterogeneous Document Collections by a Segmentation-Free Word Spotting Method
| Published in | 2011 International Conference on Document Analysis and Recognition, pp. 63-67 |
|---|---|
| Main Authors | , , , |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 01.09.2011 |
| ISBN | 1457713500 9781457713507 |
| ISSN | 1520-5363 |
| DOI | 10.1109/ICDAR.2011.22 |
| Summary: | In this paper, we present a segmentation-free word spotting method that is able to deal with heterogeneous document image collections. We propose a patch-based framework where patches are represented by a bag-of-visual-words model powered by SIFT descriptors. A later refinement of the feature vectors is performed by applying the latent semantic indexing technique. The proposed method performs well on both handwritten and typewritten historical document images. We have also tested our method on documents written in non-Latin scripts. |
|---|---|