A Deep Reinforcement Learning-Based Feature Selection Method for Invasive Disease Event Prediction Using Imbalanced Follow-Up Data

Bibliographic Details
Published in	IEEE Journal of Biomedical and Health Informatics, Vol. 29, no. 2, pp. 1472-1483
Main Authors	Du, Yangyi; Zhou, Xiaojun; Gao, Qian; Yang, Chunhua; Huang, Tingwen
Format	Journal Article
Language	English
Published	United States IEEE 01.02.2025
Subjects	Breast cancer; Breast Neoplasms - pathology; Datasets as Topic; Deep reinforcement learning; Diseases; Feature extraction; feature selection; Female; Hospitals; Humans; imbalanced data; invasive disease events; Iterative methods; Neoplasm Invasiveness; Prediction algorithms; Predictive models; Prognosis; Prognostics and health management; Reinforcement learning; Reinforcement Machine Learning; Training
ISSN	2168-2194, 2168-2208
DOI	10.1109/JBHI.2024.3497325
Summary:	Machine learning-based models are a promising paradigm for predicting invasive disease events (iDEs) in breast cancer. Feature selection (FS) is an essential preprocessing technique used to identify the features pertinent to the prediction model. However, conventional FS methods often fail on imbalanced clinical data because they are biased towards the majority class. In this paper, a novel FS framework based on reinforcement learning (RLFS) is developed to identify the optimal feature subset for imbalanced data. RLFS employs an iterative methodology in which a data resampling technique generates a balanced dataset before each iteration. A decision network is trained using a deep RL algorithm to identify the relevant features for the dataset in the current iteration. With this iterative training strategy, the numerous constructed datasets gradually boost the FS capacity of the decision network, resulting in robust performance on imbalanced data. Finally, a weighted model is proposed to determine the most suitable FS solution. RLFS is employed to predict breast cancer iDEs using real follow-up data. The comparison results demonstrate that RLFS effectively reduces the number of features while outperforming several state-of-the-art FS algorithms.
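
The iterative loop outlined in the summary (resample to a balanced dataset, then update a feature-selection policy on that dataset) can be illustrated with a rough sketch. This is not the authors' RLFS implementation: the record does not specify the resampling technique, the decision network, the reward, or the final weighted model. As stand-in assumptions, the sketch uses random undersampling, a per-feature Bernoulli policy updated with a plain REINFORCE rule in place of a deep decision network, and the hold-out accuracy of a nearest-centroid classifier minus a small per-feature penalty as the reward. All function names and parameters (`balanced_resample`, `centroid_accuracy`, `select_features`, `lr`, `penalty`) are hypothetical, and the weighted model that picks the final solution is omitted.

```python
import math
import random


def balanced_resample(rows, labels, rng):
    """Randomly undersample the larger class so both classes are the same size."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(pos), len(neg))
    keep = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


def centroid_accuracy(rows, labels, mask):
    """Hold-out accuracy of a nearest-centroid classifier on the masked features."""
    cols = [j for j, m in enumerate(mask) if m]
    half = len(rows) // 2
    train, test = range(half), range(half, len(rows))
    cents = {}
    for c in (0, 1):
        members = [rows[i] for i in train if labels[i] == c]
        if not cols or not members:
            return 0.0  # empty mask or a class missing from the training half
        cents[c] = [sum(r[j] for r in members) / len(members) for j in cols]

    def dist(r, c):
        return sum((r[j] - v) ** 2 for j, v in zip(cols, cents[c]))

    hits = sum((dist(rows[i], 1) < dist(rows[i], 0)) == (labels[i] == 1)
               for i in test)
    return hits / max(len(test), 1)


def select_features(rows, labels, n_iters=300, lr=0.5, penalty=0.01, seed=0):
    """Return per-feature selection probabilities after iterative training."""
    rng = random.Random(seed)
    logits = [0.0] * len(rows[0])
    baseline = 0.0  # running average of rewards, used to reduce variance
    for _ in range(n_iters):
        # Assumed step 1: build a fresh balanced dataset for this iteration.
        xb, yb = balanced_resample(rows, labels, rng)
        # Assumed step 2: sample a feature mask from the current policy.
        probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
        mask = [rng.random() < p for p in probs]
        # Assumed reward: accuracy minus a penalty per selected feature.
        reward = centroid_accuracy(xb, yb, mask) - penalty * sum(mask)
        # REINFORCE update: d log pi / d logit = (mask - prob) for a Bernoulli.
        advantage = reward - baseline
        logits = [z + lr * advantage * (float(m) - p)
                  for z, m, p in zip(logits, mask, probs)]
        baseline = 0.9 * baseline + 0.1 * reward
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]
```

In this sketch, features whose final probability exceeds some threshold (0.5, say) would be kept. Because each iteration sees a different balanced sample, the policy is never fitted to a single majority-dominated dataset, which is the intuition the summary gives for the iterative strategy.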