Adapting electronic health records-derived phenotypes to claims data: Lessons learned in using limited clinical data for phenotyping

Bibliographic Details
Published in Journal of biomedical informatics Vol. 102; p. 103363
Main Authors Ostropolets, Anna; Reich, Christian; Ryan, Patrick; Shang, Ning; Hripcsak, George; Weng, Chunhua
Format Journal Article
Language English
Published United States Elsevier Inc 01.02.2020
ISSN 1532-0464, 1532-0480
DOI 10.1016/j.jbi.2019.103363

Summary:
•Coarse code granularity, erroneous data entry, and poor generalizability may influence the performance of phenotyping algorithms.
•Vocabulary-driven methods for concept set creation show advantages in improving phenotyping accuracy.
•The Observational Health Data Sciences and Informatics (OHDSI) OMOP Common Data Model facilitates phenotype generalizability and consistency.
•More data is not necessarily better: the performance of a diagnosis-based chronic kidney failure algorithm is not improved by adding other codes indirectly related to chronic kidney disorder.

Algorithms for identifying patients of interest from observational data must address missing and inaccurate data, and should ideally achieve comparable performance on both administrative claims and electronic health record data. However, administrative claims data do not contain the information needed to develop accurate algorithms for disorders that require laboratory results, and this omission can result in insensitive diagnostic code-based algorithms. In this paper, we tested our assertion that the performance of a diagnosis code-based algorithm for chronic kidney disorder (CKD) can be improved by adding other codes indirectly related to CKD (e.g., codes for dialysis, kidney transplant, suspicious kidney disorders). Following the best practices from Observational Health Data Sciences and Informatics (OHDSI), we adapted an electronic health record-based gold standard algorithm for CKD and then created algorithms that can be executed on administrative claims data and account for related data quality issues. We externally validated our algorithms on four electronic health record datasets in the OHDSI network. Compared with the algorithm that uses only CKD diagnostic codes, the algorithms that use additional codes had a slightly higher positive predictive value (47.9–48.5% vs. 47.4%).
The algorithms adapted from the gold standard algorithm can be used to infer chronic kidney disorder based on administrative claims data. We succeeded in improving the generalizability and consistency of the CKD phenotypes by using data and vocabulary standardized across the OHDSI network, although performance variability across datasets remains. We showed that identifying and addressing coding and data heterogeneity can improve the performance of the algorithms.
Author Contributions
Anna Ostropolets: Conceptualization, Methodology, Algorithm, Data Collection and Analysis, Writing - Original draft preparation and revision
Christian Reich and Patrick Ryan: Conceptualization, Results Validation, Discussion, Methodology, Writing - Reviewing and Editing
George Hripcsak and Chunhua Weng: Co-Supervision, Conceptualization, Discussion, Supervision, Writing - Reviewing and Editing
Ning Shang: Phenotyping Knowledge Resource Provision, Writing - Reviewing