Domain-Oriented Topic Discovery Based on Features Extraction and Topic Clustering


Bibliographic Details
Published in: IEEE Access, Vol. 8, pp. 93648-93662
Main Authors: Lu, Xiaofeng; Zhou, Xiao; Wang, Wenting; Lio, Pietro; Hui, Pan
Format: Journal Article
Language: English
Published: Piscataway, IEEE, 2020
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.2994516

Summary: Topic detection technology can automatically discover new topics on the Internet. This paper investigates domain-oriented feature extraction methods and proposes a keyword feature extraction method (ITFIDF-LP), a subject-word feature extraction method (LDA-SLP), and a topic clustering model based on vector product similarity. A novel Domain-oriented Topic Discovery based on Features Extraction and Topic Clustering (DTD-FETC) model is proposed to analyze open-source web content in a domain and identify emerging topics in that domain in real time. This article describes a DTD-FETC system built for the cyber security domain. It filters and aggregates web content on special security threat topics such as vulnerabilities and malware, helping security staff respond quickly and defend against emerging cyber threats as early as possible. On the cyber security dataset, the DTD-FETC method achieves recall, accuracy, and F1 values all above 0.99.