A frequent keyword-set based algorithm for topic modeling and clustering of research papers

Bibliographic Details
Published in	2011 3rd Conference on Data Mining and Optimization (DMO) pp. 96 - 102
Main Authors	Shubankar, Kumar; Singh, Aditya Pratap; Pudi, Vikram
Format	Conference Proceeding
Language	English
Published	IEEE 01.06.2011
Subjects	Authoritative Score; Citation Network; Closed Frequent Keyword-set; Clustering algorithms; Graph Mining; Hands; Itemsets; Noise; Optimization; Recommender systems; Semantics; Text analysis; Topic Detection; Trend analysis
ISBN	9781612842110 1612842119
ISSN	2155-6938
DOI	10.1109/DMO.2011.5976511

Abstract	In this paper we introduce a novel and efficient approach to detecting topics in a large corpus of research papers. With the rapid growth of academic literature, topic detection has become a very challenging task. We present a unique approach that uses closed frequent keyword-sets to form topics. Our approach also provides a natural method for clustering the research papers into hierarchical, overlapping clusters, using topics as the similarity measure. To rank the research papers within a topic cluster, we devise a modified PageRank algorithm that assigns an authoritative score to each research paper by considering the sub-graph in which the paper appears. We test our algorithms on the DBLP dataset and show experimentally that they are fast, effective, and scalable.
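As a rough illustration of the idea summarized in the abstract, the sketch below mines closed frequent keyword-sets from a toy corpus and uses each one as a topic that collects every paper containing it. This is not the authors' implementation: the function names, the brute-force enumeration, the sample papers, and the `min_support` value are all assumptions made for the example, and the paper's modified PageRank ranking step is not shown.

```python
from itertools import combinations


def frequent_keyword_sets(papers, min_support):
    """Map each keyword-set found in >= min_support papers to its support.

    Brute-force enumeration by set size; fine for a toy corpus only.
    """
    vocab = sorted(set().union(*papers.values()))
    frequent = {}
    for size in range(1, len(vocab) + 1):
        found = False
        for combo in combinations(vocab, size):
            items = frozenset(combo)
            support = sum(1 for kws in papers.values() if items <= kws)
            if support >= min_support:
                frequent[items] = support
                found = True
        if not found:
            break  # no frequent set of this size, so none of a larger size
    return frequent


def closed_sets(frequent):
    """Keep only sets with no proper superset of equal support."""
    return {
        s: sup
        for s, sup in frequent.items()
        if not any(s < t and sup == frequent[t] for t in frequent)
    }


def clusters(papers, topics):
    """Assign each paper to every topic whose keywords it contains."""
    return {
        topic: sorted(pid for pid, kws in papers.items() if topic <= kws)
        for topic in topics
    }


# Hypothetical toy corpus: paper id -> set of keywords.
papers = {
    "p1": {"topic", "detection", "clustering"},
    "p2": {"topic", "detection"},
    "p3": {"topic", "detection", "pagerank"},
    "p4": {"graph", "mining"},
    "p5": {"graph", "mining", "pagerank"},
}

topics = closed_sets(frequent_keyword_sets(papers, min_support=2))
groups = clusters(papers, topics)
for topic, members in groups.items():
    print(sorted(topic), members)
```

In this toy run, paper `p3` lands in two clusters, which is the overlapping behaviour the abstract mentions. A realistic corpus would need a proper closed-itemset miner rather than exhaustive enumeration.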
Author_xml	1. Shubankar, Kumar (shubankar@students.iiit.ac.in), Centre for Data Engineering, IIIT Hyderabad, India
	2. Singh, Aditya Pratap (aditya_pratap@students.iiit.ac.in), Centre for Data Engineering, IIIT Hyderabad, India
	3. Pudi, Vikram (vikram@iiit.ac.in), Centre for Data Engineering, IIIT Hyderabad, India
Discipline	Computer Science
EISBN	9781612842127 1612842127
EndPage	102
ExternalDocumentID	5976511
Genre	orig-research
IsPeerReviewed	false
IsScholarly	false
Language	English
PageCount	7
PublicationCentury	2000
PublicationDate	2011-June
PublicationDateYYYYMMDD	2011-06-01
PublicationDecade	2010
PublicationTitle	2011 3rd Conference on Data Mining and Optimization (DMO)
PublicationTitleAbbrev	DMO
PublicationYear	2011
Publisher	IEEE
StartPage	96
URI	https://ieeexplore.ieee.org/document/5976511