TopicLens: Efficient Multi-Level Visual Topic Exploration of Large-Scale Document Collections

Topic modeling, which reveals underlying topics of a document corpus, has been actively adopted in visual analytics for large-scale document collections. However, due to its significant processing time and non-interactive nature, topic modeling has so far not been tightly integrated into a visual an...

Full description

Saved in:
Bibliographic Details
Published inIEEE transactions on visualization and computer graphics Vol. 23; no. 1; pp. 151 - 160
Main Authors Minjeong Kim, Kyeongpil Kang, Deokgun Park, Jaegul Choo, Elmqvist, Niklas
Format Journal Article
LanguageEnglish
Published United States IEEE 01.01.2017
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects
Online AccessGet full text
ISSN1077-2626
1941-0506
1941-0506
DOI10.1109/TVCG.2016.2598445

Cover

More Information
Summary:Topic modeling, which reveals underlying topics of a document corpus, has been actively adopted in visual analytics for large-scale document collections. However, due to its significant processing time and non-interactive nature, topic modeling has so far not been tightly integrated into a visual analytics workflow. Instead, most such systems are limited to utilizing a fixed, initial set of topics. Motivated by this gap in the literature, we propose a novel interaction technique called TopicLens that allows a user to dynamically explore data through a lens interface where topic modeling and the corresponding 2D embedding are efficiently computed on the fly. To support this interaction in real time while maintaining view consistency, we propose a novel efficient topic modeling method and a semi-supervised 2D embedding algorithm. Our work is based on improving state-of-the-art methods such as nonnegative matrix factorization and t-distributed stochastic neighbor embedding. Furthermore, we have built a web-based visual analytics system integrated with TopicLens. We use this system to measure the performance and the visualization quality of our proposed methods. We provide several scenarios showcasing the capability of TopicLens using real-world datasets.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ISSN:1077-2626
1941-0506
1941-0506
DOI:10.1109/TVCG.2016.2598445