Implementing agglomerative hierarchic clustering algorithms for use in document retrieval
| Published in | Information processing & management Vol. 22; no. 6; pp. 465 - 476 |
|---|---|
| Main Author | |
| Format | Journal Article |
| Language | English |
| Published | Oxford, Elsevier Ltd, 1986; Elsevier Science; Pergamon Press; Elsevier Science Ltd |
| Subjects | |
| ISSN | 0306-4573, 1873-5371 |
| DOI | 10.1016/0306-4573(86)90097-X |
| Summary: | Searching hierarchically clustered document collections can be effective[6], but creating the cluster hierarchies is expensive, since there are both many documents and many terms. However, the information in the document-term matrix is sparse: documents are usually indexed by relatively few terms. This paper describes implementations of three agglomerative hierarchic clustering algorithms that exploit this sparsity, so that collections much larger than the algorithms' worst-case running times would suggest can be clustered. The implementations described in the paper have been used to cluster a collection of 12,000 documents. |
|---|---|