Improving Large-Scale k-Nearest Neighbor Text Categorization with Label Autoencoders
In this paper, we introduce a multi-label lazy learning approach to deal with automatic semantic indexing in large document collections in the presence of complex and structured label vocabularies with high inter-label correlation. The proposed method is an evolution of the traditional k-Nearest Nei...
Saved in:
| Published in | Mathematics (Basel) Vol. 10; no. 16; p. 2867 |
|---|---|
| Main Authors | , , |
| Format | Journal Article |
| Language | English |
| Published |
Basel
MDPI AG
01.08.2022
|
| Subjects | |
| Online Access | Get full text |
| ISSN | 2227-7390 2227-7390 |
| DOI | 10.3390/math10162867 |
Cover
| Summary: | In this paper, we introduce a multi-label lazy learning approach to deal with automatic semantic indexing in large document collections in the presence of complex and structured label vocabularies with high inter-label correlation. The proposed method is an evolution of the traditional k-Nearest Neighbors algorithm which uses a large autoencoder trained to map the large label space to a reduced size latent space and to regenerate the predicted labels from this latent space. We have evaluated our proposal in a large portion of the MEDLINE biomedical document collection which uses the Medical Subject Headings (MeSH) thesaurus as a controlled vocabulary. In our experiments we propose and evaluate several document representation approaches and different label autoencoder configurations. |
|---|---|
| Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
| ISSN: | 2227-7390 2227-7390 |
| DOI: | 10.3390/math10162867 |