Ad-hoc Information Retrieval based on Boosted Latent Dirichlet Allocated Topics

Bibliographic Details
Published in: 2018 37th International Conference of the Chilean Computer Science Society (SCCC), pp. 1-7
Main Authors: Mendoza, Marcelo; Ormeno, Pablo; Valle, Carlos
Format: Conference Proceeding
Language: English
Published: IEEE, 01.11.2018
DOI: 10.1109/SCCC.2018.8705252

Summary: Latent Dirichlet Allocation (LDA) is a fundamental method in the text mining field. We propose strategies for topic and model selection based on LDA that exploit the semantic coherence of the inferred topics, boosting the quality of the models found. We then study how our boosted topic models perform in ad-hoc information retrieval tasks. Experimental results on four datasets show that our proposal improves the quality of the topics found, which favors document retrieval tasks. Our method outperforms traditional LDA-based methods, showing that model selection based on semantic coherence is useful for document modeling and information retrieval tasks.
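
The summary does not say which coherence measure or selection rule the authors use. As a rough illustration of coherence-based model selection in general, the sketch below scores each candidate model by the mean UMass coherence of its topics and keeps the best one. The UMass measure, the function names (`umass_coherence`, `mean_coherence`, `select_model`), and the toy corpus are all assumptions made for this example, not the paper's method.

```python
import math

def umass_coherence(top_words, docs):
    """UMass coherence of one topic (assumed measure, not taken from the paper).

    top_words: the topic's top words, ordered by decreasing probability.
    docs: reference corpus as a list of token lists.
    """
    doc_sets = [set(d) for d in docs]

    def df(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for s in doc_sets if all(w in s for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            denom = df(top_words[l])
            if denom == 0:
                continue  # skip words absent from the reference corpus
            score += math.log((df(top_words[m], top_words[l]) + 1) / denom)
    return score

def mean_coherence(topics, docs):
    """Average coherence over all topics of one candidate model."""
    return sum(umass_coherence(t, docs) for t in topics) / len(topics)

def select_model(candidates, docs):
    """Return the name of the candidate with the highest mean coherence.

    candidates: dict mapping a (hypothetical) model name to its topics,
    each topic being a list of top words.
    """
    return max(candidates, key=lambda name: mean_coherence(candidates[name], docs))

# Toy corpus and two hypothetical candidate models.
docs = [
    ["query", "document", "retrieval"],
    ["query", "retrieval", "ranking"],
    ["topic", "model", "dirichlet"],
    ["topic", "dirichlet", "inference"],
]
candidates = {
    "coherent": [["query", "retrieval"], ["topic", "dirichlet"]],
    "mixed": [["query", "dirichlet"], ["topic", "retrieval"]],
}
best = select_model(candidates, docs)
```

In practice the candidates would be LDA models trained with different numbers of topics or random seeds, and their top words would be read off the fitted topic-word distributions.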