Implementing agglomerative hierarchic clustering algorithms for use in document retrieval

Bibliographic Details
Published in	Information Processing & Management, Vol. 22, no. 6, pp. 465-476
Main Author	Voorhees, Ellen M.
Format	Journal Article
Language	English
Published	Oxford: Elsevier Ltd; Elsevier Science; Pergamon Press; Elsevier Science Ltd, 1986
Subjects	Algorithms; Clustering; Clusters; Documents; Efficiency; Exact sciences and technology; File organization; Hierarchical clustering; Hierarchies; Implementations; Information and communication sciences; Information processing and retrieval; Information retrieval; Information retrieval. Man machine relationship; Information science. Documentation; Information storage and retrieval; Information work; Methods; Research process. Evaluation; Sciences and techniques of general use; Searching; Subject indexing; Technical services; Cluster analysis; Document retrieval; Evaluation; Algorithm
ISSN	0306-4573 1873-5371
DOI	10.1016/0306-4573(86)90097-X

More Information
Summary:	Searching hierarchically clustered document collections can be effective[6], but creating the cluster hierarchies is expensive, since there are both many documents and many terms. However, the information in the document-term matrix is sparse: documents are usually indexed by relatively few terms. This paper describes implementations of three agglomerative hierarchic clustering algorithms that exploit this sparsity, so that collections much larger than the algorithms' worst-case running times would suggest can be clustered. The implementations described in the paper have been used to cluster a collection of 12,000 documents.
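
The sparsity argument in the summary can be illustrated with a small sketch. This is not the paper's implementation, and the record does not say which three algorithms are covered; single-link merging is used here only as an assumed example. The function names (`sparse_similarities`, `single_link_merges`), the shared-term-count similarity, and the toy documents are all hypothetical. The point it shows is that an inverted file lets similarities be computed only for document pairs that share at least one term, so pairs with zero similarity are never touched.

```python
from collections import defaultdict
from itertools import combinations


def sparse_similarities(docs):
    """Return {(i, j): shared-term count} for pairs sharing at least one term."""
    inverted = defaultdict(list)  # term -> ids of documents containing it
    for doc_id, terms in enumerate(docs):
        for term in set(terms):
            inverted[term].append(doc_id)

    sims = defaultdict(int)
    for postings in inverted.values():
        # Only documents on the same postings list can have nonzero similarity.
        for i, j in combinations(postings, 2):
            sims[(i, j)] += 1
    return dict(sims)


def single_link_merges(n_docs, sims):
    """Merge clusters in order of decreasing similarity (assumed single link)."""
    parent = list(range(n_docs))  # union-find forest over document ids

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    merges = []
    for (i, j), s in sorted(sims.items(), key=lambda kv: (-kv[1], kv[0])):
        ri, rj = find(i), find(j)
        if ri != rj:  # the pair links two different clusters
            parent[rj] = ri
            merges.append((s, i, j))
    return merges


# Hypothetical toy collection: each document is its list of index terms.
docs = [
    ["cluster", "hierarchy", "retrieval"],
    ["cluster", "hierarchy", "sparse"],
    ["sparse", "matrix"],
    ["poetry"],
]
sims = sparse_similarities(docs)
merges = single_link_merges(len(docs), sims)
```

In this toy collection only two of the six possible document pairs share a term, so only those two similarities are ever stored. The document with no shared terms is never merged and remains a singleton.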