A frequent keyword-set based algorithm for topic modeling and clustering of research papers
| Published in | 2011 3rd Conference on Data Mining and Optimization (DMO) pp. 96 - 102 |
|---|---|
| Main Authors | Kumar Shubankar, AdityaPratap Singh, Vikram Pudi |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 01.06.2011 |
| ISBN | 9781612842110, 1612842119 |
| ISSN | 2155-6938 |
| DOI | 10.1109/DMO.2011.5976511 |
| Abstract | In this paper we introduce a novel and efficient approach to detect topics in a large corpus of research papers. With the rapidly growing size of academic literature, topic detection has become a very challenging task. We present a unique approach that uses closed frequent keyword-sets to form topics. Our approach also provides a natural method to cluster the research papers into hierarchical, overlapping clusters, using topics as the similarity measure. To rank the research papers in a topic cluster, we devise a modified PageRank algorithm that assigns an authoritative score to each research paper by considering the sub-graph in which the research paper appears. We test our algorithms on the DBLP dataset and experimentally show that they are fast, effective and scalable. |
|---|---|
| Author | Shubankar, Kumar; Pudi, Vikram; Singh, AdityaPratap |
| Author affiliations | Kumar Shubankar (shubankar@students.iiit.ac.in), AdityaPratap Singh (aditya_pratap@students.iiit.ac.in) and Vikram Pudi (vikram@iiit.ac.in), all at the Centre for Data Engineering, IIIT Hyderabad, India |
| Discipline | Computer Science |
| EISBN | 9781612842127, 1612842127 |
| EndPage | 102 |
| ExternalDocumentID | 5976511 |
| Genre | orig-research |
| IsPeerReviewed | false |
| IsScholarly | false |
| PageCount | 7 |
| PublicationDateYYYYMMDD | 2011-06-01 |
| PublicationTitleAbbrev | DMO |
| Publisher | IEEE |
| StartPage | 96 |
| SubjectTerms | Authoritative Score; Citation Network; Closed Frequent Keyword-set; Clustering algorithms; Graph Mining; Hands; Itemsets; Noise; Optimization; Recommender systems; Semantics; Text analysis; Topic Detection; Trend analysis |
| URI | https://ieeexplore.ieee.org/document/5976511 |
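
The abstract names two techniques without giving their details: topics formed from closed frequent keyword-sets (a frequent set is closed when no superset occurs in exactly the same papers), with each topic's papers forming one possibly overlapping cluster, and a PageRank-style authority score computed over the sub-graph in which a paper appears. The Python below is a minimal sketch under stated assumptions, not the authors' algorithm. It assumes each paper is represented as a set of keywords and the citation network as a dict of cited-paper sets, finds closed keyword-sets by brute-force Apriori-style enumeration, and ranks each cluster with plain PageRank over the citation sub-graph induced by the cluster's papers, because the paper's specific PageRank modification is not described here. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def closed_frequent_keyword_sets(papers, min_support):
    """Return {closed keyword-set: ids of papers containing it}.

    papers: dict mapping a paper id to its set of keywords (assumed input format).
    Brute-force Apriori-style search; suitable for toy data only.
    """
    # Level 1: every single keyword with the set of papers that contain it.
    level = {}
    for pid, keywords in papers.items():
        for kw in keywords:
            level.setdefault(frozenset([kw]), set()).add(pid)
    level = {s: ids for s, ids in level.items() if len(ids) >= min_support}
    frequent = dict(level)

    # Grow k-sets into (k+1)-sets; the papers containing a union are the
    # intersection of the papers containing each part.
    while level:
        next_level = {}
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) != len(a) + 1 or union in next_level:
                continue
            ids = level[a] & level[b]
            if len(ids) >= min_support:
                next_level[union] = ids
        frequent.update(next_level)
        level = next_level

    # Closed: no frequent proper superset is contained in the same papers.
    return {
        s: ids
        for s, ids in frequent.items()
        if not any(s < t and len(frequent[t]) == len(ids) for t in frequent)
    }


def pagerank_subgraph(citations, nodes, damping=0.85, iterations=50):
    """Plain (unmodified) PageRank on the citation sub-graph induced by `nodes`.

    citations: dict mapping a paper id to the set of paper ids it cites.
    """
    nodes = set(nodes)
    if not nodes:
        return {}
    n = len(nodes)
    # Keep only citation edges whose both ends lie inside the sub-graph.
    out = {u: citations.get(u, set()) & nodes for u in nodes}
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Rank held by papers with no out-links in the sub-graph is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not out[u])
        new_rank = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            for v in out[u]:
                new_rank[v] += damping * rank[u] / len(out[u])
        rank = new_rank
    return rank


# Hypothetical toy corpus: keywords per paper and who cites whom.
papers = {
    "p1": {"topic", "detection", "clustering"},
    "p2": {"topic", "detection"},
    "p3": {"topic", "detection", "pagerank"},
    "p4": {"clustering", "pagerank"},
}
citations = {"p2": {"p1"}, "p3": {"p1", "p2"}, "p4": {"p3"}}

topics = closed_frequent_keyword_sets(papers, min_support=2)
for keywords, members in topics.items():
    # Each closed keyword-set is a topic; its papers form one cluster.
    ranks = pagerank_subgraph(citations, members)
    print(sorted(keywords), sorted(members, key=lambda p: (-ranks[p], p)))
```

Because every paper containing a larger keyword-set also contains each of its subsets, the cluster of a keyword-set is nested inside the cluster of any of its subsets, so the subset relation between closed keyword-sets gives one possible way to arrange the clusters hierarchically. For a corpus the size of DBLP, a dedicated closed-itemset miner such as CHARM would replace the brute-force search.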