A frequent keyword-set based algorithm for topic modeling and clustering of research papers

Bibliographic Details
Published in 2011 3rd Conference on Data Mining and Optimization (DMO), pp. 96-102
Main Authors Shubankar, Kumar; Singh, AdityaPratap; Pudi, Vikram
Format Conference Proceeding
Language English
Published IEEE 01.06.2011
ISBN 9781612842110
1612842119
ISSN 2155-6938
DOI 10.1109/DMO.2011.5976511

Abstract In this paper we introduce a novel and efficient approach to detecting topics in a large corpus of research papers. With the rapidly growing size of the academic literature, topic detection has become a very challenging task. We present a unique approach that uses closed frequent keyword-sets to form topics. Our approach also provides a natural method for clustering the research papers into hierarchical, overlapping clusters, using topics as the similarity measure. To rank the research papers in a topic cluster, we devise a modified PageRank algorithm that assigns an authoritative score to each research paper by considering the sub-graph in which the research paper appears. We test our algorithms on the DBLP dataset and show experimentally that they are fast, effective, and scalable.
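The abstract's central idea, forming topics from closed frequent keyword-sets, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy papers, and the brute-force enumeration are all assumptions made for illustration, and the paper's own mining procedure and keyword extraction are not described in this record. A keyword-set is taken to be frequent if at least `min_support` papers contain all of its keywords, and closed if no proper superset has the same support. Each closed set is then read as a topic, and the papers supporting it as that topic's cluster.

```python
from itertools import combinations


def frequent_keyword_sets(papers, min_support):
    """Map each frequent keyword-set to the ids of the papers containing it.

    Brute-force enumeration: adequate for toy data only, not for a
    DBLP-sized corpus.
    """
    vocab = sorted({kw for kws in papers.values() for kw in kws})
    frequent = {}
    for size in range(1, len(vocab) + 1):
        found = False
        for combo in combinations(vocab, size):
            items = frozenset(combo)
            support = {pid for pid, kws in papers.items() if items <= kws}
            if len(support) >= min_support:
                frequent[items] = support
                found = True
        if not found:
            # No frequent set of this size, so none of any larger size.
            break
    return frequent


def closed_sets(frequent):
    """Keep only the sets with no proper superset of equal support."""
    return {
        items: support
        for items, support in frequent.items()
        if not any(
            items < other and len(frequent[other]) == len(support)
            for other in frequent
        )
    }


# Hypothetical toy corpus: paper id -> keywords taken from its title.
papers = {
    "p1": {"topic", "detection", "clustering"},
    "p2": {"topic", "detection"},
    "p3": {"topic", "detection", "pagerank"},
    "p4": {"graph", "pagerank"},
}

# Each closed frequent keyword-set is a topic; its supporting papers
# form the (possibly overlapping) cluster for that topic.
topics = closed_sets(frequent_keyword_sets(papers, min_support=2))
for keywords, cluster in topics.items():
    print(sorted(keywords), sorted(cluster))
```

In this toy corpus the sets {detection} and {topic} are frequent but not closed, because {detection, topic} has the same support, so they are absorbed into the larger topic. Paper p3 supports both remaining topics, which shows how overlapping clusters arise from this construction.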
Author Shubankar, Kumar
Pudi, Vikram
Singh, AdityaPratap
Author_xml – sequence: 1
  givenname: Kumar
  surname: Shubankar
  fullname: Shubankar, Kumar
  email: shubankar@students.iiit.ac.in
  organization: Centre for Data Engineering, IIIT Hyderabad, India
– sequence: 2
  givenname: AdityaPratap
  surname: Singh
  fullname: Singh, AdityaPratap
  email: aditya_pratap@students.iiit.ac.in
  organization: Centre for Data Engineering, IIIT Hyderabad, India
– sequence: 3
  givenname: Vikram
  surname: Pudi
  fullname: Pudi, Vikram
  email: vikram@iiit.ac.in
  organization: Centre for Data Engineering, IIIT Hyderabad, India
ContentType Conference Proceeding
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
Discipline Computer Science
EISBN 9781612842127
1612842127
EndPage 102
ExternalDocumentID 5976511
Genre orig-research
IsPeerReviewed false
IsScholarly false
Language English
PageCount 7
PublicationCentury 2000
PublicationDate 2011-June
PublicationDateYYYYMMDD 2011-06-01
PublicationDate_xml – month: 06
  year: 2011
  text: 2011-June
PublicationDecade 2010
PublicationTitle 2011 3rd Conference on Data Mining and Optimization (DMO)
PublicationTitleAbbrev DMO
PublicationYear 2011
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 96
SubjectTerms Authoritative Score
Citation Network
Closed Frequent Keyword-set
Clustering algorithms
Graph Mining
Hands
Itemsets
Noise
Optimization
Recommender systems
Semantics
Text analysis
Topic Detection
Trend analysis
Title A frequent keyword-set based algorithm for topic modeling and clustering of research papers
URI https://ieeexplore.ieee.org/document/5976511