Cancer-related Keywords in 2023: Insights from Text Mining of a Major Consumer Portal

Bibliographic Details
Published in: Healthcare Informatics Research, Vol. 30, no. 4, pp. 398-408
Main Authors: Jeong, Wonjeong; Song, Eunkyoung; Jeong, Eunzi; Oh, Kyoung Hee; Lee, Hye-Sun; Jun, Jae Kwan
Format: Journal Article
Language: English
Published: Korea (South), Korean Society of Medical Informatics (대한의료정보학회), 01.10.2024
ISSN: 2093-369X, 2093-3681
DOI: 10.4258/hir.2024.30.4.398

Summary:
Objectives: With the growing importance of monitoring cancer patients’ internet usage, there is an increasing need for technology that expands access to relevant information through text mining. This study analyzed internet articles from portal sites in 2023 to identify trends in the information available to cancer patients and to derive meaningful insights.
Methods: This study analyzed 19,578 news articles published on Naver, a major Korean portal site, from January 1, 2023, to December 31, 2023. Natural language processing, text mining, network analysis, and word cloud analysis were employed. The search term “am” (Korean for “cancer”) was used to identify keywords related to cancer.
Results: In 2023, an average of 1,631 cancer-related articles were published monthly, with a peak of 1,946 in September and a low of 1,371 in February. A total of 132,456 keywords were extracted, with “cure” (2,218 occurrences), “lung cancer” (1,652), and “breast cancer” (1,235) being the most frequent. Term frequency-inverse document frequency analysis ranked “struggle” (1064.172) as the most significant keyword, followed by “lung cancer” (839.988) and “breast cancer” (744.840). Network analysis revealed four distinct clusters focusing on treatment, celebrity-related issues, major cancer types, and cancer-causing factors.
Conclusions: The analysis of cancer-related keywords in 2023 indicates that news articles often prioritize gossip over essential information. These findings provide foundational data for future policy directions and strategies to address misinformation. This study underscores the importance of understanding the nature of cancer-related information consumed by the public and offers insights to guide official policies and healthcare practices.
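Illustrative note: the summary reports both raw keyword counts and term frequency-inverse document frequency (TF-IDF) scores, and the two rankings differ. The sketch below is one possible way to compute an aggregate TF-IDF score per keyword. It assumes raw-count term frequency, idf = ln(N / df), and summation over all articles; the summary does not state which variant the authors used. The function name `aggregate_tfidf` and the toy corpus are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def aggregate_tfidf(docs):
    """Return {term: sum over documents of tf * idf}.

    docs: list of token lists, one list per article.
    Assumed weighting (not confirmed by the source):
    tf = raw count in the article, idf = ln(N / df).
    """
    n_docs = len(docs)

    # Document frequency: number of articles that contain each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    # Accumulate tf * idf for every term across all articles.
    scores = Counter()
    for tokens in docs:
        for term, tf in Counter(tokens).items():
            scores[term] += tf * math.log(n_docs / df[term])
    return dict(scores)


# Hypothetical toy corpus of already-extracted keywords (not study data).
docs = [
    ["cure", "lung cancer", "cure"],
    ["cure", "breast cancer"],
    ["struggle", "struggle", "celebrity"],
]

scores = aggregate_tfidf(docs)
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {score:.3f}")
```

In this toy corpus "cure" is the most frequent keyword, yet "struggle" receives the highest score because it is concentrated in a single article. Under these assumptions, that is the kind of effect that can make a frequency ranking and a TF-IDF ranking disagree.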
These authors contributed equally to this work.
https://doi.org/10.4258/hir.2024.30.4.398