Expert-augmented machine learning

Bibliographic Details
Published in: Proceedings of the National Academy of Sciences - PNAS, Vol. 117, No. 9, pp. 4571-4577
Main Authors: Gennatas, Efstathios D., Friedman, Jerome H., Ungar, Lyle H., Pirracchio, Romain, Eaton, Eric, Reichmann, Lara G., Interian, Yannet, Luna, José Marcio, Simone, Charles B., Auerbach, Andrew, Delgado, Elier, van der Laan, Mark J., Solberg, Timothy D., Valdes, Gilmer
Format: Journal Article
Language: English
Published: National Academy of Sciences, United States, 03.03.2020
ISSN: 0027-8424, 1091-6490
DOI: 10.1073/pnas.1906831117


More Information
Summary: Machine learning is proving invaluable across disciplines. However, its success is often limited by the quality and quantity of available data, while its adoption is limited by the level of trust afforded by given models. Human vs. machine performance is commonly compared empirically to decide whether a certain task should be performed by a computer or an expert. In reality, the optimal learning strategy may involve combining the complementary strengths of humans and machines. Here, we present expert-augmented machine learning (EAML), an automated method that guides the extraction of expert knowledge and its integration into machine-learned models. We used a large dataset of intensive-care patient data to derive 126 decision rules that predict hospital mortality. Using an online platform, we asked 15 clinicians to assess the relative risk of the subpopulation defined by each rule compared to the total sample. We compared the clinician-assessed risk to the empirical risk and found that, while clinicians agreed with the data in most cases, there were notable exceptions where they overestimated or underestimated the true risk. Studying the rules with greatest disagreement, we identified problems with the training data, including one miscoded variable and one hidden confounder. Filtering the rules based on the extent of disagreement between clinician-assessed risk and empirical risk, we improved performance on out-of-sample data and were able to train with less data. EAML provides a platform for automated creation of problem-specific priors, which help build robust and dependable machine-learning models in critical applications.
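The summary describes one computational step concretely: filtering the learned decision rules by how far the clinician-assessed risk departs from the empirical risk. The short Python sketch below illustrates only that filtering idea; the Rule fields, the threshold, and the example values are assumptions for illustration, not the authors' implementation or data.

from dataclasses import dataclass
from typing import List

@dataclass
class Rule:
    description: str        # human-readable decision rule defining a patient subgroup
    empirical_risk: float   # mortality risk observed in the training data for that subgroup
    expert_risk: float      # risk assessed by clinicians for the same subgroup (assumed comparable scale)

def filter_rules(rules: List[Rule], max_disagreement: float) -> List[Rule]:
    # Keep only rules where expert and empirical risk roughly agree; large gaps
    # may flag data problems such as miscoded variables or hidden confounders.
    return [r for r in rules if abs(r.expert_risk - r.empirical_risk) <= max_disagreement]

# Usage with made-up numbers:
rules = [
    Rule("plausible high-risk subgroup", empirical_risk=0.42, expert_risk=0.45),
    Rule("suspicious low empirical risk", empirical_risk=0.05, expert_risk=0.60),
]
kept = filter_rules(rules, max_disagreement=0.10)
print([r.description for r in kept])   # -> ['plausible high-risk subgroup']

In the paper's workflow the disagreement signal serves both to filter rules and to surface data-quality issues; the threshold above is purely a placeholder.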
Bibliography:
Edited by Peter J. Bickel, University of California, Berkeley, CA, and approved January 22, 2020 (received for review April 29, 2019)
Author contributions: E.D.G., J.H.F., and G.V. designed research; E.D.G. and G.V. performed research; E.D.G., R.P., and L.G.R. contributed new reagents/analytic tools; E.D.G. analyzed data; and E.D.G., J.H.F., L.H.U., R.P., E.E., L.G.R., Y.I., J.M.L., C.B.S., A.A., E.D., M.J.v.d.L., T.D.S., and G.V. wrote the paper.
Present address: Department of Radiation Oncology, Stanford University, Stanford, CA 94305.