Ranking and tagging bursty features in text streams with context language models

Bibliographic Details
Published in	Frontiers of Computer Science Vol. 11; no. 5; pp. 852 - 862
Main Authors	Wayne Xin ZHAO; Chen LIU; Ji-Rong WEN; Xiaoming LI
Format	Journal Article
Language	English
Published	Beijing: Higher Education Press, 01.10.2017; Springer Nature B.V.
Subjects	bursty features; bursty features ranking; bursty feature tagging; context modeling; context model; language model; burstiness; Computer Science; Context; Data mining; Text mining; Language; Ranking; Tagging; Features; Applications; Research Article; Semantics; Streams; Tags; Time series
ISSN	2095-2228 2095-2236
DOI	10.1007/s11704-016-5144-z

Summary:	Detecting and using bursty patterns to analyze text streams has been one of the fundamental approaches in many temporal text mining applications. So far, most existing studies have focused on developing methods to detect bursty features based purely on term frequency changes. Few have taken the semantic contexts of bursty features into consideration, and as a result the detected bursty features may not always be interesting and can be hard to interpret. In this article, we propose to model the contexts of bursty features using a language modeling approach. We propose two methods to estimate the context language models, one based on sentence-level context and one based on document-level context. We then propose a novel topic diversity-based metric that uses the context models to find newsworthy bursty features. We also propose to use the context models to automatically assign meaningful tags to bursty features. Using a large corpus of news articles, we quantitatively show that the proposed context language models can effectively help rank bursty features by newsworthiness and assign meaningful tags to annotate them. We also use two example text mining applications to qualitatively demonstrate the usefulness of bursty feature ranking and tagging.
Bibliography:	11-5731/TP; Document received on: 2015-04-14; Document accepted on: 2015-12-01