Automatic news-roundup generation using clustering, extraction, and presentation
Published in | Multimedia systems Vol. 26; no. 2; pp. 201 - 221 |
---|---|
Format | Journal Article |
Language | English |
Published | Berlin/Heidelberg: Springer Berlin Heidelberg, 01.04.2020; Springer Nature B.V |
ISSN | 0942-4962 1432-1882 |
DOI | 10.1007/s00530-019-00638-4 |
Summary: | With the growth of the internet, the amount of published information has increased exponentially. This flood of information causes a problem known as “information overload”, which makes it harder for internet users to find the key information they need. To address this, the paper proposes an application that helps users easily find trending news matching their query or interest. The challenges include how to determine the trending subtopics, how to extract only the content of each webpage, and how to present the data to the user. The study therefore uses three core modules: clustering, extraction, and presentation. Several clustering methods are tested, including naïve, manual-thresholding, and heuristic methods. The results show that hierarchical clustering using tf–idf word weighting, cosine similarity as the distance measure, and heuristic termination by elbow point analysis achieves the best result, with 50.84% accuracy (Acc) and 61.96% normalized mutual information (NMI). One challenge commonly faced by extraction algorithms is that their effectiveness tends to decline over time. This paper uses an extraction algorithm that relies on a previously known subject/keyword to guide content extraction. A second stage of noise removal is also introduced to remove noise that remains within the content block. The evaluation shows an improvement in score of 7.48%. The final application received scores of 4.18 out of 5 for helpfulness and 4.35 out of 5 for effectiveness from respondents, indicating that the proposed application can help users find information and alleviate the information overload problem. |
---|---|
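The clustering step named in the summary (tf–idf weighting, cosine distance, hierarchical merging, heuristic stopping) can be illustrated with a rough sketch. The code below is not the paper's implementation. It assumes whitespace tokenization, a smoothed idf, average linkage, and a simplified "largest jump in merge distance" rule standing in for elbow point analysis. All function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]  # assumed tokenizer
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * \
        math.sqrt(sum(w * w for w in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def agglomerate(vectors):
    """Average-linkage merging; returns [(merge_distance, clusters), ...]."""
    clusters = [[i] for i in range(len(vectors))]
    history = [(0.0, [c[:] for c in clusters])]  # state before any merge
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[x] for j in clusters[y]]
                d = sum(cosine_distance(vectors[i], vectors[j])
                        for i, j in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
        history.append((d, [c[:] for c in clusters]))
    return history


def cut_at_largest_jump(history):
    """Simplified elbow rule: stop just before the largest distance jump."""
    dists = [d for d, _ in history]
    jumps = [dists[k + 1] - dists[k] for k in range(len(dists) - 1)]
    k = max(range(len(jumps)), key=jumps.__getitem__)
    return history[k][1]


def cluster_news(docs):
    """Group documents (by index) into subtopic clusters."""
    if len(docs) < 2:
        return [[i] for i in range(len(docs))]
    return cut_at_largest_jump(agglomerate(tfidf_vectors(docs)))


if __name__ == "__main__":
    sample = [
        "earthquake hits city rescue teams search",
        "rescue teams search city after earthquake",
        "football team wins championship final match",
        "championship final match won by football team",
    ]
    print(cluster_news(sample))
```

The pairwise search is cubic in the number of documents, which is acceptable for a small result list but would need a library implementation for larger collections.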