Machine learning-based statistical analysis for early stage detection of cervical cancer

Bibliographic Details
Published in	Computers in biology and medicine Vol. 139; p. 104985
Main Authors	Ali, Md Mamun, Ahmed, Kawsar, Bui, Francis M., Paul, Bikash Kumar, Ibrahim, Sobhy M., Quinn, Julian M.W., Moni, Mohammad Ali
Format	Journal Article
Language	English
Published	United States: Elsevier Ltd, 01.12.2021
Subjects	Accuracy; Algorithms; Biopsy; Cancer; Cellular biology; Cervical cancer; Cervix; Classification; Cluster Analysis; Cytology; Datasets; Decision trees; Developing countries; Early Detection of Cancer; Efficiency; Female; Hinselmann; Human papillomavirus; Humans; Internal Medicine; Iodine; LDCs; Learning algorithms; Machine Learning; Medical screening; Neural networks; Other; Pap smear; Random tree; Risk analysis; Risk factors; Schiller; Statistical analysis; Supervised Machine Learning; Support vector machines; Systems design; Transformations (mathematics); Trigonometric functions; Uterine Cervical Neoplasms - diagnosis; Womens health
ISSN	0010-4825, 1879-0534
DOI	10.1016/j.compbiomed.2021.104985

More Information
Summary:	Cervical cancer (CC) is one of the most common cancers in women and remains a significant cause of mortality, particularly in less developed countries, although it can be treated effectively if detected at an early stage. This study aimed to find efficient machine-learning-based classification models to detect early stage CC using clinical data. We obtained a CC dataset from the Kaggle data repository that contained four class attributes: biopsy, cytology, Hinselmann, and Schiller. The dataset was split into four datasets based on these class attributes. Three feature transformation methods (log, sine function, and Z-score) were applied to these datasets. Several supervised machine learning algorithms were assessed for their classification performance. A Random Tree (RT) algorithm provided the best classification accuracy for the biopsy (98.33%) and cytology (98.65%) data, whereas Random Forest (RF) and Instance-Based K-nearest neighbor (IBk) provided the best performance for the Hinselmann (99.16%) and Schiller (98.58%) data, respectively. Among the feature transformation methods, the logarithmic transformation gave the best performance for the biopsy dataset, whereas the sine function was superior for cytology. Both the logarithmic and sine functions performed best for the Hinselmann dataset, while Z-score was best for the Schiller dataset. Various feature selection techniques (FST) were applied to the transformed datasets to identify and prioritize important risk factors. The outcomes of this study indicate that, with appropriate system design and tuning, machine learning methods and classification are able to detect CC accurately and efficiently in its early stages using clinical data.
• This study aimed to find efficient machine-learning-based classifiers and models to detect early stage cervical cancer.
• Three feature transformation methods (log, sine function, and Z-score) were applied to the datasets.
• The logarithmic transformation was the best performer for the biopsy dataset, whereas the sine function was superior for cytology.
• The logarithmic and sine functions performed best for the Hinselmann dataset, while Z-score was best for the Schiller dataset.
• Various FST methods were applied to the transformed datasets to identify and prioritize important risk factors.
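
The three feature transformations named in the summary can be sketched as follows. This is only an illustration, not the authors' implementation: the record does not say how zero or negative values were handled, so the use of `log1p` (the natural log of 1 + x), sine in radians, and the population standard deviation for the Z-score are all assumptions, and the `ages` column is hypothetical data rather than values from the study's dataset.

```python
import math
from statistics import fmean, pstdev


def log_transform(values):
    """Natural log of (1 + x). Using log1p is an assumption so that zeros stay finite."""
    return [math.log1p(v) for v in values]


def sine_transform(values):
    """Sine of each value, interpreted in radians (an assumption)."""
    return [math.sin(v) for v in values]


def z_score(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # A constant column carries no spread; map it to zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical feature column for illustration only; not taken from the study's data.
ages = [18.0, 25.0, 34.0, 41.0, 52.0]

transformed = {
    "log": log_transform(ages),
    "sine": sine_transform(ages),
    "z_score": z_score(ages),
}
```

Each transformed column would then be passed to a classifier in place of the raw feature, which is how the summary compares the transformations per dataset.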