Performance comparison of multi-label learning algorithms on clinical data for chronic diseases

Bibliographic Details
Published in	Computers in biology and medicine Vol. 65; pp. 34 - 43
Main Authors	Zufferey, Damien, Hofer, Thomas, Hennebert, Jean, Schumacher, Michael, Ingold, Rolf, Bromuri, Stefano
Format	Journal Article
Language	English
Published	United States: Elsevier Ltd, 01.10.2015
Subjects	Algorithms; Chronic Disease; Chronic illnesses; Chronic obstructive pulmonary disease; Classification; Clinical data; Coma; Complex patient; Data mining; Data processing; Databases, Factual; Diabetes; Diagnosis, Computer-Assisted - methods; Disease; Female; Humans; Internal Medicine; Machine Learning; Male; Multi-label learning; Other; Pain; Patients; Physicians; Plasma Proteins; Summary statistics; Urine
ISSN	0010-4825 1879-0534
DOI	10.1016/j.compbiomed.2015.07.017

Summary:	We are motivated by the problem of classifying the diseases of chronically ill patients to assist physicians in their everyday work. Our goal is to provide a performance comparison of state-of-the-art multi-label learning algorithms for the analysis of multivariate sequential clinical data from the medical records of patients affected by chronic diseases. The multi-label learning approach appears to be a good candidate for modeling the overlapping medical conditions typical of chronically ill patients. The availability of such a comparison study should make it easier to evaluate new algorithms. Regarding the method, we choose a summary statistics approach for processing the sequential clinical data, so that the extracted features maintain an interpretable link to their corresponding medical records. The publicly available MIMIC-II dataset, which contains more than 19,000 patients with chronic diseases, is used in this study. For the comparison we selected the following multi-label algorithms: ML-kNN, AdaBoostMH, binary relevance, classifier chains, HOMER and RAkEL. Regarding the results, binary relevance approaches, despite their elementary design and their assumption that the chronic illnesses are independent, perform optimally in most scenarios, in particular for the detection of relevant diseases. In addition, binary relevance approaches scale up to large datasets and are easy to learn. However, the RAkEL algorithm, despite its scalability problems when confronted with large datasets, performs well in the scenario that consists of ranking the labels according to the dominant disease of the patient. Highlights: • We evaluate multi-label learning algorithms for the analysis of clinical data. • We focus on patients affected by multiple chronic diseases. • We use a summary statistics approach to extract features from medical time series.
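
The two ideas named in the summary, summary-statistics features and binary relevance, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the choice of statistics (min, max, mean, standard deviation), the nearest-centroid base learner, the measurement names, and all values are assumptions made up for illustration and do not come from MIMIC-II.

```python
from statistics import mean, pstdev


def summarize(series):
    """Collapse one variable-length time series into four summary statistics."""
    return [min(series), max(series), mean(series), pstdev(series)]


def extract_features(record):
    """record maps a measurement name to its list of values over time.

    Sorting the names fixes the feature order, so each feature can be
    traced back to the measurement it was computed from.
    """
    features = []
    for name in sorted(record):
        features.extend(summarize(record[name]))
    return features


class NearestCentroid:
    """Toy binary base learner (an assumption, chosen only for brevity)."""

    def fit(self, X, y):
        self.centroids = {}
        for label in (0, 1):
            rows = [x for x, target in zip(X, y) if target == label]
            if rows:  # skip a class that has no training examples
                self.centroids[label] = [mean(col) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        def squared_distance(centroid):
            return sum((a - b) ** 2 for a, b in zip(x, centroid))

        return min(self.centroids,
                   key=lambda label: squared_distance(self.centroids[label]))


class BinaryRelevance:
    """Trains one independent binary classifier per label (disease)."""

    def __init__(self, n_labels, base=NearestCentroid):
        self.models = [base() for _ in range(n_labels)]

    def fit(self, X, Y):
        for j, model in enumerate(self.models):
            model.fit(X, [row[j] for row in Y])  # column j = label j
        return self

    def predict(self, X):
        return [[model.predict_one(x) for model in self.models] for x in X]


# Hypothetical toy records; label columns are [diabetes, COPD].
patients = [
    {"glucose": [5.0, 5.2, 5.1], "resp_rate": [14, 15, 14]},
    {"glucose": [9.5, 10.1, 9.8], "resp_rate": [15, 14, 16]},
    {"glucose": [5.1, 4.9, 5.0], "resp_rate": [24, 26, 25]},
    {"glucose": [10.2, 9.9, 10.5], "resp_rate": [25, 27, 26]},
]
Y = [[0, 0], [1, 0], [0, 1], [1, 1]]

X = [extract_features(p) for p in patients]
model = BinaryRelevance(n_labels=2).fit(X, Y)

# Series of different lengths still yield a fixed-length feature vector.
new_patient = {"glucose": [9.9, 10.0], "resp_rate": [25, 24, 26, 25]}
prediction = model.predict([extract_features(new_patient)])
print(prediction)
```

Because each label gets its own classifier, the sketch carries the same independence assumption between diseases that the summary attributes to binary relevance; methods such as classifier chains or RAkEL differ precisely in modeling label dependencies.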