Polar labeling: silver standard algorithm for training disease classifiers
| Published in | Bioinformatics Vol. 36; no. 10; pp. 3200 - 3206 |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | England: Oxford University Press, 01.05.2020 |
| ISSN | 1367-4803, 1367-4811, 1460-2059 |
| DOI | 10.1093/bioinformatics/btaa088 |
| Summary: | Abstract
Motivation
Expert-labeled data are essential to train phenotyping algorithms for cohort identification. However, expert labeling is time- and labor-intensive, and its cost remains prohibitive for scaling phenotyping to wider use cases.
Results
We present an approach, referred to as polar labeling (PL), for creating a silver standard to train machine learning (ML) models for disease classification. We test the hypothesis that ML models trained on the silver standard, created by applying PL to unlabeled patient records, are comparable in performance to ML models trained on the gold standard, created by clinical experts through manual review of patient records. We perform experimental validation using the health records of 38 023 patients spanning six diseases. Our results demonstrate the superior performance of the proposed approach.
Availability and implementation
We provide a Python implementation of the algorithm, along with the Python code developed for this study, on GitHub.
Supplementary information
Supplementary data are available at Bioinformatics online. |
|---|---|