Classification Framework Based on C4.5 Algorithm For Medicinal Data

Bibliographic Details
Published in International journal of computer science and information security Vol. 13; no. 4; p. 63
Main Author Ganesan, Karthik
Format Journal Article
Language English
Published Pittsburgh L J S Publishing 01.04.2015
ISSN 1947-5500

Summary: This study proposes a framework that preprocesses medicinal data with missing-value replacement, discretization, and Principal Component Analysis (PCA) to extract the key features, and then applies the C4.5 classifier to improve classification. Missing values in the input data are first imputed using one of the standard methods: mean, mode, constant, or manual input. The dataset is then discretized into a reasonable set of discrete bins. PCA is applied next to identify the principal components of the dataset, which contribute to the main data inference. The C4.5 algorithm is used to construct a decision tree based on the information gain of the training set. The work uses the Cleveland heart disease dataset, obtained from the UCI Machine Learning Repository. The dataset contains details of about 303 patients and helps to predict the presence or absence of cardiovascular disorder based on 75 attributes. Applied to this dataset, the proposed framework achieved an accuracy of about 77.73%.
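
As a rough illustration of three of the steps named in the summary (mean imputation, discretization into bins, and information gain as a split criterion), the following is a minimal Python sketch using only the standard library. It is not the article's implementation: the function names, the choice of equal-width binning, the number of bins, and the toy values are all assumptions made for illustration, and PCA and the tree construction itself are omitted. Note also that C4.5 as commonly described normalizes information gain into a gain ratio, which is not shown here.

```python
import math
from collections import Counter

def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def equal_width_bins(column, n_bins):
    """Map each numeric value to a bin index in 0..n_bins-1 (assumed scheme)."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant column
    return [min(int((v - lo) / width), n_bins - 1) for v in column]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(feature, labels):
    """Entropy of the labels minus the weighted entropy after splitting on feature."""
    total = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Toy values invented for illustration only (not taken from the Cleveland dataset).
ages = [29, 41, None, 58, 63, 70]
labels = [0, 0, 0, 1, 1, 1]

filled = impute_mean(ages)                   # fill the missing entry with the mean
binned = equal_width_bins(filled, n_bins=2)  # discretize into two bins
gain = information_gain(binned, labels)      # score the discretized attribute
```

In a decision-tree learner of this kind, a score such as `gain` would be computed for every candidate attribute and the highest-scoring one chosen for the split; how the article combines this with the PCA step is not stated in the summary.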