A Deep Reinforcement Learning-Based Feature Selection Method for Invasive Disease Event Prediction Using Imbalanced Follow-Up Data

Bibliographic Details
Published in IEEE Journal of Biomedical and Health Informatics, Vol. 29, no. 2, pp. 1472-1483
Main Authors Du, Yangyi; Zhou, Xiaojun; Gao, Qian; Yang, Chunhua; Huang, Tingwen
Format Journal Article
Language English
Published United States IEEE 01.02.2025
ISSN 2168-2194, 2168-2208
DOI 10.1109/JBHI.2024.3497325

More Information
Summary: Machine learning-based models are a promising paradigm for predicting invasive disease events (iDEs) in breast cancer. Feature selection (FS) is an essential preprocessing technique used to identify the pertinent features for the prediction model. However, conventional FS methods often fail on imbalanced clinical data because of their bias towards the majority class. In this paper, a novel FS framework based on reinforcement learning (RLFS) is developed to identify the optimal feature subset for imbalanced data. RLFS employs an iterative methodology in which a data resampling technique generates a balanced dataset before each iteration. A decision network is trained using a deep RL algorithm to identify the relevant features for the dataset in the current iteration. With this iterative training strategy, the numerous constructed datasets gradually boost the FS capacity of the decision network, resulting in robust performance on imbalanced data. Finally, a weighted model is proposed to determine the most suitable FS solution. RLFS is applied to predict breast cancer iDEs using real follow-up data. The comparison results demonstrate that RLFS effectively reduces the number of features while outperforming several state-of-the-art FS algorithms.
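
The iterative idea in the summary (rebalance the data, pick features on the balanced sample, repeat, then combine the results) can be illustrated with a minimal sketch. This is not the authors' RLFS: the deep RL decision network is replaced here by a simple class-mean gap score, the weighted model is replaced by plain vote counting, and the toy data and all function names (`balanced_undersample`, `mean_gap_scores`, `iterative_select`, `make_toy_data`) are assumptions made for illustration only.

```python
import random
from statistics import mean


def balanced_undersample(X, y, rng):
    """Randomly undersample the larger class so both classes have equal size."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    n = min(len(pos), len(neg))
    idx = rng.sample(pos, n) + rng.sample(neg, n)
    return [X[i] for i in idx], [y[i] for i in idx]


def mean_gap_scores(X, y):
    """Stand-in scorer (NOT the paper's decision network):
    absolute difference between the class means of each feature."""
    scores = []
    for j in range(len(X[0])):
        pos_vals = [row[j] for row, label in zip(X, y) if label == 1]
        neg_vals = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(mean(pos_vals) - mean(neg_vals)))
    return scores


def iterative_select(X, y, k, n_iter=50, seed=0):
    """Resample, score, and vote for the top-k features in each iteration;
    return the k features with the most votes (simple vote counting,
    an assumption standing in for the paper's weighted model)."""
    rng = random.Random(seed)
    votes = [0] * len(X[0])
    for _ in range(n_iter):
        Xb, yb = balanced_undersample(X, y, rng)
        scores = mean_gap_scores(Xb, yb)
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        for j in ranked[:k]:
            votes[j] += 1
    best = sorted(range(len(votes)), key=lambda j: votes[j], reverse=True)[:k]
    return sorted(best)


def make_toy_data(n_major=200, n_minor=20, seed=1):
    """Hypothetical imbalanced data: feature 0 is informative, features 1-3 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label, n in ((0, n_major), (1, n_minor)):
        for _ in range(n):
            informative = rng.gauss(3.0 * label, 1.0)
            X.append([informative, rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)])
            y.append(label)
    return X, y


X, y = make_toy_data()
selected = iterative_select(X, y, k=1)
print("selected feature indices:", selected)
```

On this toy data the informative feature (index 0) is the one that should be selected. The sketch only shows the control flow; a faithful implementation would follow the decision network and weighted model described in the article itself.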