Automatic news-roundup generation using clustering, extraction, and presentation

Bibliographic Details
Published in: Multimedia Systems, Vol. 26, no. 2, pp. 201-221
Main Authors: Utomo, Vincent; Leu, Jenq-Shiou
Format: Journal Article
Language: English
Published: Berlin/Heidelberg, Springer Berlin Heidelberg, 01.04.2020
Springer Nature B.V
ISSN: 0942-4962, 1432-1882
DOI: 10.1007/s00530-019-00638-4

More Information
Summary: Along with the growth of the internet, the amount of published information has increased exponentially. This huge flow of information causes a problem called “information overload”, which makes it harder for internet users to find the key information they need. To solve this, this paper proposes an application that helps users easily find trending news related to their query or interest. Challenges include how to determine the trending subtopic, how to extract only the content of each webpage, and how to present the data to the user. Three core modules are therefore used in this study: clustering, extraction, and presentation. Several methods are tested in this study, including naïve, manual thresholding, and heuristic clustering methods. The results show that hierarchical clustering using tf–idf word weighting, cosine similarity as the distance measure, and heuristic termination based on elbow point analysis achieves the best result, at 50.84% Acc and 61.96% NMI. One challenge commonly faced by extraction algorithms is a tendency to become less effective over time. This paper uses an extraction algorithm that relies on a previously known subject/keyword to help the content extraction process. A second stage of noise removal is also introduced to further remove noise that exists within the content block. The evaluation result shows an improved score of 7.48%. The final application received scores of 4.18 out of 5 for helpfulness and 4.35 out of 5 for effectiveness from respondents, showing that the proposed application can help users find information and help address the information overload problem.
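
The best-performing clustering configuration named in the summary (tf–idf weighting, cosine distance, hierarchical clustering stopped at an elbow point) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The summary does not specify the tokenizer, the idf formula, the linkage criterion, or how the elbow point is located, so the whitespace tokenizer, smoothed idf, average linkage, and "largest jump between consecutive merge distances" rule below are all assumptions, as are the function names.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]  # assumption: whitespace tokens
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Assumption: smoothed idf, log((1 + N) / (1 + df)) + 1.
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def cluster_documents(docs):
    """Agglomerative clustering, cut where the merge distance jumps the most."""
    vectors = tfidf_vectors(docs)
    n = len(vectors)
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]
    history = [[list(c) for c in clusters]]  # history[k]: partition after k merges
    merge_dists = []                         # merge_dists[k]: cost of merge k
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Assumption: average linkage between the two clusters.
                d = sum(dist[a][b] for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merge_dists.append(d)
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        history.append([list(c) for c in clusters])
    if len(merge_dists) < 2:
        return history[0]  # too few merges to locate an elbow
    # Assumption: the elbow is the largest gap between consecutive merge costs;
    # clustering stops just before the merge that follows that gap.
    gaps = [merge_dists[k + 1] - merge_dists[k]
            for k in range(len(merge_dists) - 1)]
    elbow = gaps.index(max(gaps))
    return history[elbow + 1]
```

Calling cluster_documents on a list of headline strings returns a list of clusters, each a list of document indices. The pairwise search is cubic in the number of documents, which is acceptable for a short result list but would need a library implementation for larger collections.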