A frequent keyword-set based algorithm for topic modeling and clustering of research papers

Bibliographic Details
Published in	2011 3rd Conference on Data Mining and Optimization (DMO), pp. 96-102
Main Authors	Shubankar, Kumar; Singh, Aditya Pratap; Pudi, Vikram
Format	Conference Proceeding
Language	English
Published	IEEE 01.06.2011
Subjects	Authoritative Score; Citation Network; Closed Frequent Keyword-set; Clustering algorithms; Graph Mining; Hands; Itemsets; Noise; Optimization; Recommender systems; Semantics; Text analysis; Topic Detection; Trend analysis
ISBN	9781612842110 1612842119
ISSN	2155-6938
DOI	10.1109/DMO.2011.5976511

Summary:	In this paper we introduce a novel and efficient approach to detect topics in a large corpus of research papers. With the rapidly growing size of the academic literature, topic detection has become a very challenging task. We present an approach that uses closed frequent keyword-sets to form topics. Our approach also provides a natural method to cluster the research papers into hierarchical, overlapping clusters, using the topic as the similarity measure. To rank the research papers in a topic cluster, we devise a modified PageRank algorithm that assigns an authoritative score to each research paper by considering the sub-graph in which the research paper appears. We test our algorithms on the DBLP dataset and show experimentally that they are fast, effective and scalable.
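
To illustrate the idea of closed frequent keyword-sets named in the summary, the following is a minimal sketch and not the authors' implementation. It assumes each paper is reduced to a set of keywords, and it uses a brute-force search that is only practical for toy inputs. The paper ids, the keywords, the `min_support` value and both function names are hypothetical. A keyword-set is frequent when at least `min_support` papers contain it, and closed when no proper superset is contained in exactly the same number of papers. Each closed set is treated as a topic, and the papers that contain it form that topic's cluster.

```python
from itertools import combinations

def frequent_keyword_sets(papers, min_support):
    """Map each frequent keyword-set to the ids of the papers containing it.

    Brute-force enumeration by set size; stops at the first size that
    yields no frequent set (no larger set can be frequent either).
    """
    vocabulary = sorted({kw for kws in papers.values() for kw in kws})
    frequent = {}
    for size in range(1, len(vocabulary) + 1):
        found_any = False
        for combo in combinations(vocabulary, size):
            ids = {pid for pid, kws in papers.items() if set(combo) <= kws}
            if len(ids) >= min_support:
                frequent[frozenset(combo)] = ids
                found_any = True
        if not found_any:
            break
    return frequent

def closed_keyword_sets(frequent):
    """Keep only the sets with no proper superset of equal support."""
    return {
        ks: ids
        for ks, ids in frequent.items()
        if not any(
            ks < other and len(other_ids) == len(ids)
            for other, other_ids in frequent.items()
        )
    }

# Hypothetical toy corpus: paper id -> keyword set.
papers = {
    "p1": {"topic", "clustering", "pagerank"},
    "p2": {"topic", "clustering"},
    "p3": {"topic", "pagerank"},
    "p4": {"graph", "mining"},
}

closed = closed_keyword_sets(frequent_keyword_sets(papers, min_support=2))

# Each closed keyword-set acts as a topic; its papers form the cluster.
for topic, cluster in sorted(closed.items(), key=lambda item: len(item[0])):
    print(sorted(topic), "->", sorted(cluster))
```

In this toy corpus, paper `p1` falls into all three resulting clusters, which shows the overlap, and the cluster for `{"topic"}` contains the clusters for `{"clustering", "topic"}` and `{"pagerank", "topic"}`, which shows the hierarchy. Ranking papers inside a cluster with the modified PageRank is not covered by this sketch.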