Machine learning-based statistical analysis for early stage detection of cervical cancer

Bibliographic Details
Published in Computers in Biology and Medicine Vol. 139; p. 104985
Main Authors Ali, Md Mamun, Ahmed, Kawsar, Bui, Francis M., Paul, Bikash Kumar, Ibrahim, Sobhy M., Quinn, Julian M.W., Moni, Mohammad Ali
Format Journal Article
Language English
Published United States Elsevier Ltd 01.12.2021
ISSN 0010-4825
1879-0534
DOI 10.1016/j.compbiomed.2021.104985

Summary: Cervical cancer (CC) is the most common type of cancer in women and remains a significant cause of mortality, particularly in less developed countries, although it can be effectively treated if detected at an early stage. This study aimed to find efficient machine-learning-based classifying models to detect early stage CC using clinical data. We obtained a Kaggle data repository CC dataset which contained four classes of attributes including biopsy, cytology, Hinselmann, and Schiller. This dataset was split into four categories based on these class attributes. Three feature transformation methods, including log, sine function, and Z-score were applied to these datasets. Several supervised machine learning algorithms were assessed for their performance in classification. A Random Tree (RT) algorithm provided the best classification accuracy for the biopsy (98.33%) and cytology (98.65%) data, whereas Random Forest (RF) and Instance-Based K-nearest neighbor (IBk) provided the best performance for Hinselmann (99.16%), and Schiller (98.58%) respectively. Among the feature transformation methods, logarithmic gave the best performance for biopsy datasets whereas sine function was superior for cytology. Both logarithmic and sine functions performed the best for the Hinselmann dataset, while Z-score was best for the Schiller dataset. Various Feature Selection Techniques (FST) methods were applied to the transformed datasets to identify and prioritize important risk factors. The outcomes of this study indicate that appropriate system design and tuning, machine learning methods and classification are able to detect CC accurately and efficiently in its early stages using clinical data.
Highlights:
• This study aimed to find an efficient machine-learning-based classifier and models to detect early stage Cervical Cancer.
• Three feature transformation methods, such as log, sine function, and Z-score were applied to the datasets.
• Logarithmic was the best performer for biopsy datasets whereas sine function was superior for cytology.
• Logarithmic and sine functions performed the best for the Hinselmann dataset, while Z-score was best for the Schiller dataset.
• Various FST methods were applied to the transformed datasets to identify and prioritize important risk factors.
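
Illustrative note: the three feature transformations named in the summary (log, sine function, Z-score) can be sketched for a single numeric feature as below. This is a minimal sketch and not the authors' implementation. The record does not state the logarithm base or how zero values were handled, so log(1 + x) is an assumption here, as is the use of the population standard deviation for the Z-score. The function names and the sample values are hypothetical.

```python
import math
import statistics


def log_transform(values):
    """Logarithmic transform. ASSUMPTION: log(1 + x), so zero values stay defined."""
    return [math.log1p(v) for v in values]


def sine_transform(values):
    """Sine-function transform: apply sin() to each value (radians)."""
    return [math.sin(v) for v in values]


def z_score_transform(values):
    """Z-score transform: subtract the mean, divide by the standard deviation.

    ASSUMPTION: population standard deviation; a constant feature maps to zeros.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


if __name__ == "__main__":
    # Hypothetical values for one clinical feature (not taken from the dataset).
    feature = [18.0, 25.0, 34.0, 41.0, 52.0]
    print("log:    ", log_transform(feature))
    print("sine:   ", sine_transform(feature))
    print("z-score:", z_score_transform(feature))
```

Each transformed copy of the feature set would then be passed to the classifiers being compared, so that accuracy can be reported per transformation and per target attribute.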