Issues in the Mining of Heart Failure Datasets

Bibliographic Details
Published in	International Journal of Automation and Computing Vol. 11; no. 2; pp. 162-179
Main Authors	Poolsawad, Nongnuch; Moore, Lisa; Kambhampati, Chandrasekhar; Cleland, John G. F.
Format	Journal Article
Language	English
Published	Berlin/Heidelberg: Springer-Verlag, 01.04.2014; Springer Nature B.V.
Subjects	Algorithms; Classification; Classifiers; Computer Applications; Computer-Aided Engineering (CAD, CAE) and Design; Control; Data mining; Datasets; Decision trees; Diagnostic systems; Engineering; Exploration; Feature selection; Health care; Heart; Heart failure; Human error; Learning; Machine learning; Mechatronics; Medical prognosis; Missing data; Patients; Radial basis function; Representations; Robotics; Supervised learning; Variables
Keywords	Heart failure; clinical dataset; clustering; classification; feature selection; missing values
ISSN	1476-8186 2153-182X 1751-8520 2153-1838
DOI	10.1007/s11633-014-0778-5

More Information
Summary:	This paper investigates a clinical dataset using a combination of feature selection and classification methods, both to handle missing values and to understand the underlying statistical characteristics of a typical clinical dataset. A large clinical dataset typically presents challenges such as missing values, high dimensionality, and unbalanced classes, which pose inherent problems when implementing feature selection and classification algorithms. With most clinical datasets, an initial exploration is carried out, and attributes with more than a certain percentage of missing values are eliminated. Prognostic and diagnostic models are then developed with the help of missing value imputation, feature selection, and classification algorithms. This paper has two main conclusions: 1) Despite the nature of clinical datasets and their large size, the choice of missing value imputation method does not affect the final performance. What is crucial is that the dataset is an accurate representation of the clinical problem; the method used to impute missing values is not critical for developing classifiers and prognostic/diagnostic models. 2) Supervised learning proved more suitable for mining clinical data than unsupervised methods. It is also shown that non-parametric classifiers such as decision trees give better results than parametric classifiers such as radial basis function networks (RBFNs).
Bibliography:	Nongnuch Poolsawad; Lisa Moore; Chandrasekhar Kambhampati; John G. F. Cleland; Intelligent Systems Research Group (IS), Department of Computer Science, University of Hull; Hull York Medical School, Department of Cardiology, University of Hull. 11-5350/TP
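The summary describes a common preprocessing order for clinical data: attributes with more than a certain percentage of missing values are eliminated, and the remaining gaps are imputed before feature selection and classification. The sketch illustrates only those first two steps in plain Python. It is not the authors' code: the record layout (dicts with None marking a missing value), the attribute names, the 0.5 threshold, and the choice of mean imputation are all assumptions for illustration; this record does not state the paper's actual cut-off or imputation methods.

```python
from statistics import mean
from typing import Dict, List, Optional

# Hypothetical record layout: attribute name -> numeric value, with None marking a missing value.
Record = Dict[str, Optional[float]]


def missing_fraction(records: List[Record]) -> Dict[str, float]:
    """Return, for each attribute, the fraction of records in which it is missing."""
    attributes = sorted({name for r in records for name in r})
    n = len(records)
    return {name: sum(1 for r in records if r.get(name) is None) / n for name in attributes}


def drop_sparse_attributes(records: List[Record], threshold: float) -> List[Record]:
    """Eliminate attributes whose missing fraction exceeds `threshold` (e.g. 0.5 = 50%)."""
    fractions = missing_fraction(records)
    keep = [name for name, frac in fractions.items() if frac <= threshold]
    return [{name: r.get(name) for name in keep} for r in records]


def mean_impute(records: List[Record]) -> List[Record]:
    """Fill each remaining gap with the mean of that attribute's observed values."""
    attributes = sorted({name for r in records for name in r})
    means: Dict[str, Optional[float]] = {}
    for name in attributes:
        observed = [r[name] for r in records if r.get(name) is not None]
        # An attribute with no observed values cannot be mean-imputed; it stays missing.
        means[name] = mean(observed) if observed else None
    return [
        {name: r[name] if r.get(name) is not None else means[name] for name in attributes}
        for r in records
    ]


if __name__ == "__main__":
    # Hypothetical patient records; these are not data from the paper.
    patients = [
        {"age": 70.0, "creatinine": 1.1, "bnp": None},
        {"age": 65.0, "creatinine": None, "bnp": None},
        {"age": None, "creatinine": 1.5, "bnp": 400.0},
    ]
    print(missing_fraction(patients))
    # "bnp" is missing in 2 of 3 records, so a 0.5 threshold removes it.
    cleaned = drop_sparse_attributes(patients, threshold=0.5)
    print(mean_impute(cleaned))
```

Because imputation is isolated in one function, `mean_impute` could be swapped for another method (median, regression, or k-nearest-neighbour imputation) without touching the elimination step, which makes it straightforward to compare imputation methods as the summary's first conclusion does.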