Inverted files versus signature files for text indexing

Bibliographic Details
Published in ACM Transactions on Database Systems, Vol. 23, no. 4, pp. 453-490
Main Authors Zobel, Justin; Moffat, Alistair; Ramamohanarao, Kotagiri
Format Journal Article
Language English
Published New York, NY, USA: Association for Computing Machinery (ACM), 01.12.1998
ISSN 0362-5915, 1557-4644
DOI 10.1145/296854.277632

Summary: Two well-known indexing methods are inverted files and signature files. We have undertaken a detailed comparison of these two approaches in the context of text indexing, paying particular attention to query evaluation speed and space requirements. We have examined their relative performance using both experimentation and a refined approach to modeling of signature files, and demonstrate that inverted files are distinctly superior to signature files. Not only can inverted files be used to evaluate typical queries in less time than can signature files, but inverted files require less space and provide greater functionality. Our results also show that a synthetic text database can provide a realistic indication of the behavior of an actual text database. The tools used to generate the synthetic database have been made publicly available.