Improving Large-Scale k-Nearest Neighbor Text Categorization with Label Autoencoders

Bibliographic Details
Published in	Mathematics (Basel) Vol. 10; no. 16; p. 2867
Main Authors	Ribadas-Pena, Francisco J., Cao, Shuyuan, Darriba Bilbao, Víctor M.
Format	Journal Article
Language	English
Published	Basel: MDPI AG, 01.08.2022
Subjects	Adaptation; Algorithms; autoencoders; Classification; Data compression; Documents; Learning; Medical Subject Headings-MeSH; MeSH indexing; multi-label categorization; nearest neighbors; Neural networks; semantic indexing; Semantics; Text categorization
ISSN	2227-7390
DOI	10.3390/math10162867

Summary:	In this paper, we introduce a multi-label lazy learning approach to automatic semantic indexing in large document collections in the presence of complex, structured label vocabularies with high inter-label correlation. The proposed method is an evolution of the traditional k-Nearest Neighbors algorithm: it uses a large autoencoder trained to map the large label space to a reduced-size latent space and to regenerate the predicted labels from this latent space. We have evaluated our proposal on a large portion of the MEDLINE biomedical document collection, which uses the Medical Subject Headings (MeSH) thesaurus as a controlled vocabulary. In our experiments, we propose and evaluate several document representation approaches and different label autoencoder configurations.
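The general idea in the summary (kNN retrieval in document space, with label prediction routed through an autoencoder's latent space) can be illustrated with a small sketch. This is not the authors' implementation. It assumes cosine similarity between document vectors, a similarity-weighted average of the neighbors' latent label codes, and a fixed 0.5 decision threshold. The "autoencoder" is a hypothetical hand-set linear encoder/decoder over 4 labels and 2 latent dimensions, standing in for a trained network; all function names are illustrative.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]

# Hypothetical toy "autoencoder": labels 0-1 and labels 2-3 are assumed to be
# strongly correlated, so each pair is compressed into one latent dimension.
# A real system would learn these mappings from the training label sets.
ENCODER = [[0.5, 0.5, 0.0, 0.0],
           [0.0, 0.0, 0.5, 0.5]]
DECODER = [[1.0, 0.0],
           [1.0, 0.0],
           [0.0, 1.0],
           [0.0, 1.0]]


def matvec(m: List[List[float]], v: Vector) -> List[float]:
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def encode(labels: Vector) -> List[float]:
    """Map a binary label vector to the (toy) latent space."""
    return matvec(ENCODER, labels)


def decode(z: Vector) -> List[float]:
    """Map a latent vector back to per-label scores."""
    return matvec(DECODER, z)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_autoencoder_predict(query: Vector, train_docs: List[Vector],
                            train_labels: List[List[int]], k: int = 3,
                            threshold: float = 0.5) -> List[int]:
    """Predict labels by averaging neighbors' latent codes and decoding."""
    # 1. Retrieve the k most similar training documents.
    sims = sorted(((cosine(query, d), i) for i, d in enumerate(train_docs)),
                  reverse=True)[:k]
    total = sum(s for s, _ in sims)
    if total <= 0:
        return [0] * len(train_labels[0])
    # 2. Similarity-weighted average of the neighbors' latent label codes.
    latent = [0.0] * len(ENCODER)
    for s, i in sims:
        z = encode(train_labels[i])
        latent = [acc + s * zj for acc, zj in zip(latent, z)]
    latent = [x / total for x in latent]
    # 3. Decode back to label space and threshold the scores.
    scores = decode(latent)
    return [1 if sc >= threshold else 0 for sc in scores]


train_docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]
train_labels = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
print(knn_autoencoder_predict([1.0, 0.05, 0.0], train_docs, train_labels, k=2))
```

With k=2, the two nearest neighbors have label sets {0, 1} and {0}; their averaged latent code decodes to scores of about 0.75 for labels 0 and 1, so the sketch prints `[1, 1, 0, 0]`. Averaging in the latent space rather than voting label by label is what lets the decoder reintroduce correlated labels; with a learned autoencoder, that correlation structure would come from the training data instead of hand-set weights.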