A frequent keyword-set based algorithm for topic modeling and clustering of research papers
| Published in | 2011 3rd Conference on Data Mining and Optimization (DMO) pp. 96 - 102 |
|---|---|
| Main Authors | , , |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 01.06.2011 |
| Subjects | |
| ISBN | 9781612842110 1612842119 |
| ISSN | 2155-6938 |
| DOI | 10.1109/DMO.2011.5976511 |
| Summary | In this paper we introduce a novel and efficient approach to detect topics in a large corpus of research papers. With the rapidly growing size of the academic literature, topic detection has become a very challenging task. We present a unique approach that uses closed frequent keyword-sets to form topics. Our approach also provides a natural method to cluster the research papers into hierarchical, overlapping clusters using topic as the similarity measure. To rank the research papers in a topic cluster, we devise a modified PageRank algorithm that assigns an authoritative score to each research paper by considering the sub-graph in which the research paper appears. We test our algorithms on the DBLP dataset and experimentally show that they are fast, effective and scalable. |
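
The summary describes forming topics from closed frequent keyword-sets and grouping papers into overlapping clusters by topic. The following is a minimal sketch of that general idea, not the authors' implementation: the function names, the input format (one keyword set per paper), the sample data and the `min_support` threshold are all assumptions made for illustration. It enumerates keyword subsets by brute force, which is only practical for tiny inputs; it does not attempt the hierarchy or the modified PageRank ranking mentioned in the summary.

```python
from itertools import combinations


def closed_frequent_keyword_sets(papers, min_support):
    """Return {keyword_set: support} for closed frequent keyword-sets.

    papers: list of sets of keywords (hypothetical input format).
    A keyword-set is frequent if at least min_support papers contain it,
    and closed if no proper superset has the same support.
    """
    # Count how many papers contain each keyword subset (brute force).
    support = {}
    for keywords in papers:
        items = sorted(keywords)
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                key = frozenset(combo)
                support[key] = support.get(key, 0) + 1

    frequent = {k: s for k, s in support.items() if s >= min_support}

    # Keep only sets with no proper superset of equal support.
    closed = {}
    for k, s in frequent.items():
        if not any(k < other and s == s2 for other, s2 in frequent.items()):
            closed[k] = s
    return closed


def cluster_by_topic(papers, topics):
    """Map each topic to the indices of papers containing all its keywords.

    A paper may appear under several topics, so clusters can overlap.
    """
    return {
        topic: [i for i, keywords in enumerate(papers) if topic <= keywords]
        for topic in topics
    }


# Made-up sample data: each paper is represented by its keyword set.
papers = [
    {"clustering", "topic model", "pagerank"},
    {"clustering", "topic model"},
    {"clustering", "pagerank"},
    {"topic model"},
]

topics = closed_frequent_keyword_sets(papers, min_support=2)
clusters = cluster_by_topic(papers, topics)

for topic, members in clusters.items():
    print(sorted(topic), "support:", topics[topic], "papers:", members)
```

In this sample, `{"pagerank"}` is frequent but not closed, because every paper containing it also contains `"clustering"`, so the larger set `{"clustering", "pagerank"}` has the same support and is the one kept as a topic.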