Multilevel Weighted Support Vector Machine for Classification on Healthcare Data with Missing Values

Bibliographic Details
Published in: PLoS ONE Vol. 11; no. 5; p. e0155119
Main Authors: Razzaghi, Talayeh; Roderick, Oleg; Safro, Ilya; Marko, Nicholas
Format: Journal Article
Language: English
Published: Public Library of Science (PLoS), United States, 19.05.2016
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0155119

Summary: This work is motivated by the needs of predictive analytics on healthcare data as represented by Electronic Medical Records. Such data is invariably problematic: it is noisy, has missing entries, and shows imbalance in the classes of interest, all of which lead to serious bias in predictive modeling. Since standard data mining methods often perform poorly on such data, we argue for the development of specialized techniques for data preprocessing and classification. In this paper, we propose a new method that simultaneously classifies large datasets and reduces the effects of missing values. It is based on a multilevel framework of the cost-sensitive SVM combined with the expectation-maximization (EM) imputation method for missing values, which relies on iterated regression analyses. We compare the classification results of multilevel SVM-based algorithms on public benchmark datasets with imbalanced classes and missing values, as well as on real data from health applications, and show that our multilevel SVM-based method produces fast, more accurate, and more robust classification results.
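The Summary names a cost-sensitive SVM as the base classifier for imbalanced classes. As a rough illustration only (this is not the authors' multilevel algorithm and omits the EM imputation step), the sketch below trains a linear SVM whose hinge-loss penalty differs by class, so that errors on the minority class cost more. The weighting rule n / (k * n_c) (the same formula scikit-learn uses for `class_weight="balanced"`), the full-batch subgradient optimizer, the hypothetical helper names `class_weights`, `train_weighted_svm`, and `predict`, and the toy data are all assumptions chosen for illustration.

```python
"""Minimal cost-sensitive (class-weighted) linear SVM sketch.

Assumptions (not taken from the paper): class weights are inversely
proportional to class frequency, and training uses full-batch subgradient
descent on the weighted hinge loss with an unregularized bias term.
"""
from typing import Dict, List, Sequence, Tuple


def class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Weight each class c by n / (k * n_c), so the minority class costs more."""
    n = len(labels)
    classes = sorted(set(labels))
    return {c: n / (len(classes) * labels.count(c)) for c in classes}


def train_weighted_svm(
    X: List[List[float]],
    y: List[int],
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Minimize lam/2*||w||^2 + (1/n)*sum_i c[y_i]*max(0, 1 - y_i*(w.x_i + b))."""
    n, d = len(X), len(X[0])
    c = class_weights(y)
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        # Gradient of the L2 regularizer (the bias is not regularized).
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge is active
                for j in range(d):
                    grad_w[j] -= c[yi] * yi * xi[j] / n
                grad_b -= c[yi] * yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: List[float], b: float, x: List[float]) -> int:
    """Return +1 or -1 according to the side of the separating hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


if __name__ == "__main__":
    # Toy imbalanced data: 2 positives vs 6 negatives (illustrative only).
    X = [[2, 2], [3, 3], [-2, -2], [-3, -3], [-2, -3], [-3, -2], [-1, -2], [-2, -1]]
    y = [1, 1, -1, -1, -1, -1, -1, -1]
    print(class_weights(y))
    w, b = train_weighted_svm(X, y)
    print([predict(w, b, x) for x in X])
```

With these weights the two positive examples jointly carry the same total penalty as the six negatives, which is the basic idea behind using a cost-sensitive SVM on imbalanced data; the paper's actual weights and solver may differ.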
Conceived and designed the experiments: TR, IS, OR, NM. Performed the experiments: TR, OR, IS. Analyzed the data: TR, OR, IS. Contributed reagents/materials/analysis tools: TR, IS, OR, NM. Wrote the paper: TR, IS, OR, NM.
Competing Interests: The authors have declared that no competing interests exist.