Semi-supervised Learning for Phenotyping Tasks

Bibliographic Details
Published in	AMIA ... Annual Symposium proceedings Vol. 2015; pp. 502 - 511
Main Authors	Dligach, Dmitriy; Miller, Timothy; Savova, Guergana K
Format	Journal Article
Language	English
Published	United States American Medical Informatics Association 2015
Subjects	Algorithms; Datasets as Topic; Disease - classification; Electronic Health Records; Humans; Information Storage and Retrieval - methods; Supervised Machine Learning
ISSN	1942-597X 1559-4076

Summary:	Supervised learning is the dominant approach to automatic electronic health records-based phenotyping, but it is expensive due to the cost of manual chart review. Semi-supervised learning takes advantage of both scarce labeled and plentiful unlabeled data. In this work, we study a family of semi-supervised learning algorithms based on Expectation Maximization (EM) in the context of several phenotyping tasks. We first experiment with the basic EM algorithm. When the modeling assumptions are violated, basic EM leads to inaccurate parameter estimation. Augmented EM attenuates this shortcoming by introducing a weighting factor that downweights the unlabeled data. Cross-validation does not always lead to the best setting of the weighting factor, and other heuristic methods may be preferred. We show that accurate phenotyping models can be trained with only a few hundred labeled (and a large number of unlabeled) examples, potentially providing substantial savings in the amount of manual chart review required.
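The augmented EM idea described in the summary can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a multinomial Naive Bayes model over bag-of-words count features (the summary does not name the base model), and the function names `fit_nb`, `posterior`, and `augmented_em` and the parameter `lam` (the weighting factor applied to unlabeled documents) are hypothetical. Setting `lam=1.0` corresponds to basic EM, where unlabeled documents count as much as labeled ones; `lam=0.0` ignores the unlabeled data entirely.

```python
import numpy as np


def fit_nb(X, resp, weights, alpha=1.0):
    """M-step: weighted multinomial Naive Bayes with Laplace smoothing (assumed model).

    X       -- (n_docs, n_features) term-count matrix
    resp    -- (n_docs, n_classes) class probabilities (one-hot for labeled docs)
    weights -- (n_docs,) per-document weight (1 for labeled, lam for unlabeled)
    """
    w = resp * weights[:, None]                 # weighted class memberships
    class_mass = w.sum(axis=0)                  # (n_classes,)
    n_classes, n_features = resp.shape[1], X.shape[1]
    log_prior = np.log(class_mass + alpha) - np.log(class_mass.sum() + alpha * n_classes)
    word_counts = w.T @ X                       # (n_classes, n_features)
    log_cond = np.log(word_counts + alpha) - np.log(
        word_counts.sum(axis=1, keepdims=True) + alpha * n_features)
    return log_prior, log_cond


def posterior(X, log_prior, log_cond):
    """E-step: P(class | doc) under the current model."""
    log_joint = X @ log_cond.T + log_prior      # (n_docs, n_classes)
    log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(log_joint)
    return p / p.sum(axis=1, keepdims=True)


def augmented_em(X_lab, y_lab, X_unl, n_classes, lam=0.1, n_iter=10):
    """Hypothetical semi-supervised EM that downweights unlabeled docs by lam."""
    resp_lab = np.eye(n_classes)[y_lab]         # hard labels as one-hot rows
    # Initialize the model from labeled data only.
    log_prior, log_cond = fit_nb(X_lab, resp_lab, np.ones(len(X_lab)))
    X_all = np.vstack([X_lab, X_unl])
    weights = np.concatenate([np.ones(len(X_lab)), np.full(len(X_unl), lam)])
    for _ in range(n_iter):
        resp_unl = posterior(X_unl, log_prior, log_cond)        # E-step: soft labels
        resp_all = np.vstack([resp_lab, resp_unl])
        log_prior, log_cond = fit_nb(X_all, resp_all, weights)  # M-step
    return log_prior, log_cond


# Toy usage: features 0-1 suggest class 0, features 2-3 suggest class 1.
X_lab = np.array([[5, 3, 0, 0], [0, 0, 4, 6]])
y_lab = np.array([0, 1])
X_unl = np.array([[4, 4, 0, 1], [1, 0, 5, 5], [6, 2, 0, 0], [0, 1, 3, 7]])
log_prior, log_cond = augmented_em(X_lab, y_lab, X_unl, n_classes=2, lam=0.1)
print(posterior(X_unl, log_prior, log_cond).argmax(axis=1))
```

Because the summary notes that cross-validation does not always find the best weighting factor, `lam` is left here as an explicit argument rather than tuned automatically, so any selection heuristic can be plugged in around `augmented_em`.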