Finding Core Topics: Topic Extraction with Clustering on Tweet

Bibliographic Details
Published in: 2012 International Conference on Cloud and Green Computing, pp. 777-782
Main Authors: Sungchul Kim, Sungho Jeon, Jinha Kim, Young-Ho Park, Hwanjo Yu
Format: Conference Proceeding
Language: English
Published: IEEE, November 2012
ISBN: 1467330272 / 9781467330275
DOI: 10.1109/CGC.2012.120

Abstract: Twitter is one of the most popular microblogging services; it lets users post short texts called tweets. Tweets differ from conventional text data in that they are typically short, informal messages, which causes typical text analysis methods to perform poorly on them. Accordingly, extracting meaningful topics from tweets poses new challenges. In this work, we propose a simple and novel method called Core-Topic-based Clustering (CTC), which extracts topics and clusters tweets simultaneously based on two clustering principles: minimizing inter-cluster similarity and maximizing intra-cluster similarity. Experimental results show that our method efficiently extracts meaningful topics and that its clustering performance is better than that of the K-means algorithm.
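
The abstract names the two clustering principles CTC is built on but not how they are scored. As a minimal, hedged illustration of those principles only (this is not the authors' CTC algorithm), the Python sketch below represents each tweet as a bag-of-words vector, compares tweets with cosine similarity, and reports the mean within-cluster and across-cluster similarity for a given cluster assignment, plus each cluster's most frequent terms as a crude stand-in for a "topic". The tokenizer, the choice of cosine similarity, the toy tweets, and all function names (`vectorize`, `cosine`, `intra_inter_similarity`, `top_terms`) are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter
from itertools import combinations


def vectorize(tweet: str) -> Counter:
    """Bag-of-words term counts (hypothetical, deliberately simple tokenizer)."""
    return Counter(re.findall(r"[a-z0-9#@']+", tweet.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def _mean(values: list[float]) -> float:
    return sum(values) / len(values) if values else 0.0


def intra_inter_similarity(vectors: list[Counter], labels: list[int]) -> tuple[float, float]:
    """Mean pairwise similarity within clusters (to maximize) and across clusters (to minimize)."""
    intra, inter = [], []
    for i, j in combinations(range(len(vectors)), 2):
        sim = cosine(vectors[i], vectors[j])
        # Route each pair to the within-cluster or across-cluster bucket.
        (intra if labels[i] == labels[j] else inter).append(sim)
    return _mean(intra), _mean(inter)


def top_terms(vectors: list[Counter], labels: list[int], k: int = 2) -> dict[int, list[str]]:
    """Most frequent terms per cluster, a crude stand-in for a cluster's topic."""
    totals: dict[int, Counter] = {}
    for vec, label in zip(vectors, labels):
        totals.setdefault(label, Counter()).update(vec)
    return {label: [t for t, _ in c.most_common(k)] for label, c in totals.items()}


if __name__ == "__main__":
    tweets = [
        "new phone battery life is great",
        "phone battery died again",
        "great goal in the football match",
        "football match tonight goal",
    ]
    labels = [0, 0, 1, 1]  # hypothetical cluster assignment
    vecs = [vectorize(t) for t in tweets]
    intra, inter = intra_inter_similarity(vecs, labels)
    print(f"intra={intra:.3f} inter={inter:.3f}")  # prints "intra=0.510 inter=0.042"
    print(top_terms(vecs, labels))
```

For the toy assignment above, the within-cluster mean (about 0.51) is well above the across-cluster mean (about 0.04); putting one phone tweet and one football tweet in each cluster reverses that ordering, which is the kind of contrast the two principles are meant to reward. Most cross-pair similarities here are exactly 0 because the short tweets share few terms, which hints at why short, informal text is hard for typical text analysis methods.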

Author affiliations
Sungchul Kim (subright@postech.ac.kr): Dept. of Computer Science & Engineering, POSTECH, Pohang, South Korea
Sungho Jeon (sdeva@postech.ac.kr): Dept. of Computer Science & Engineering, POSTECH, Pohang, South Korea
Jinha Kim (goldbar@postech.ac.kr): Dept. of Computer Science & Engineering, POSTECH, Pohang, South Korea
Young-Ho Park (yhpark@sm.ac.kr): Dept. of Multimedia Science, Sookmyung Women's University, Seoul, South Korea
Hwanjo Yu (hwanjoyu@postech.ac.kr): Dept. of Creative IT Excellence Engineering, POSTECH, Pohang, South Korea

EISBN: 0769548644 / 9780769548647
Subject terms: Clustering algorithms; document clustering; Encyclopedias; Internet; social network; topic extraction; Twitter; Vectors
URL: https://ieeexplore.ieee.org/document/6382905