Detection of ovarian cancer using a methodology with feature extraction and selection with genetic algorithms and machine learning


Bibliographic Details
Published in Network modeling and analysis in health informatics and bioinformatics (Wien) Vol. 14; no. 1; p. 3
Main Authors Acosta-Jiménez, Samara; Mendoza-Mendoza, Miguel M.; Galván-Tejada, Carlos E.; Galván-Tejada, Jorge I.; Celaya-Padilla, José M.; García-Domínguez, Antonio; Gamboa-Rosales, Hamurabi; Solís-Robles, Roberto
Format Journal Article
Language English
Published Vienna Springer Vienna 19.12.2024
Springer Nature B.V
ISSN 2192-6670
2192-6662
DOI 10.1007/s13721-024-00497-8

Summary:
Purpose: Ovarian cancer is one of the most lethal forms of gynecological cancer, mainly due to its diagnosis at advanced stages. This study presents a method to predict ovarian cancer by combining machine learning and feature selection using the genetic algorithm GALGO. The research focuses on creating an optimized predictive model that uses fewer features without data imputation to minimize biases and provide a more accurate representation of clinical data variability and natural characteristics.
Methods: The dataset consists of 309 patients with 47 variables, including demographics, routine blood tests, general chemistry, and tumor markers. 75% of the data are used for feature extraction and training of machine learning models, and 25% are used for blind testing. The GALGO feature selection method is applied to identify the most relevant features, with which three models are built: Support Vector Machine, Random Forest, and Logistic Regression. Each model employed cross-validation with three folds (k-folds=3).
Results: GALGO selected six relevant features. The machine learning models also achieved competitive AUCs: Logistic Regression had the best performance at 0.9055, while Support Vector Machine and Random Forest scored 0.8616 and 0.8854, respectively.
Conclusion: The proposed methodology generated a promising model for early detection of ovarian cancer and demonstrated that it is possible to maintain high diagnostic accuracy using a reduced number of features. This reduction decreases the computational complexity and costs associated with laboratory tests and improves the efficiency and speed of diagnosis, making the model more practical and applicable in clinical settings. This approach offers a transparent and clinically relevant alternative to improve early detection of ovarian cancer, facilitating its integration into daily clinical practice.
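The evaluation protocol named in the Methods (a 75%/25% split of 309 patients, three-fold cross-validation on the training portion, and AUC as the metric) can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' code: the function names, the random seed, and the use of a plain shuffled (non-stratified) split are assumptions made for illustration, and neither GALGO feature selection nor the three classifiers are implemented here.

```python
import random


def train_test_split_indices(n, test_fraction=0.25, seed=0):
    """Shuffle indices 0..n-1 and hold out a blind test portion.

    Assumption: a plain random split; the summary does not say
    whether the authors stratified by diagnosis.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def k_fold_indices(indices, k=3):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, validation


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# 309 patients, as stated in the summary; everything else is hypothetical.
train_idx, test_idx = train_test_split_indices(309)
folds = list(k_fold_indices(train_idx, k=3))
```

In such a setup, feature selection and model fitting would run only on `train_idx` (within each fold for validation), and `roc_auc` would be computed once on `test_idx` for the blind-test figure.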