Adapting electronic health records-derived phenotypes to claims data: Lessons learned in using limited clinical data for phenotyping
| Published in | Journal of biomedical informatics Vol. 102; p. 103363 |
|---|---|
| Main Authors | Anna Ostropolets, Christian Reich, Patrick Ryan, Ning Shang, George Hripcsak, Chunhua Weng |
| Format | Journal Article |
| Language | English |
| Published | United States, Elsevier Inc, 01.02.2020 |
| ISSN | 1532-0464, 1532-0480 |
| DOI | 10.1016/j.jbi.2019.103363 |
| Summary: | Highlights: • Coarse code granularity, erroneous data entry, and poor generalizability may influence the performance of phenotyping algorithms. • Vocabulary-driven methods for concept set creation show advantages in improving phenotyping accuracy. • The Observational Health Data Sciences and Informatics (OHDSI) OMOP Common Data Model facilitates phenotype generalizability and consistency. • More data is not necessarily better: the performance of a diagnosis-based chronic kidney failure algorithm is not improved by adding other codes indirectly related to chronic kidney disorder. Abstract: Algorithms for identifying patients of interest from observational data must address missing and inaccurate data and should achieve comparable performance on both administrative claims and electronic health records data. However, administrative claims data do not contain the information needed to develop accurate algorithms for disorders that require laboratory results, and this omission can result in insensitive diagnostic code-based algorithms. In this paper, we tested our assertion that the performance of a diagnosis code-based algorithm for chronic kidney disorder (CKD) can be improved by adding other codes indirectly related to CKD (e.g., codes for dialysis, kidney transplant, suspicious kidney disorders). Following the best practices from Observational Health Data Sciences and Informatics (OHDSI), we adapted an electronic health record-based gold standard algorithm for CKD and then created algorithms that can be executed on administrative claims data and account for related data quality issues. We externally validated our algorithms on four electronic health record datasets in the OHDSI network. Compared to the algorithm that uses CKD diagnostic codes only, the positive predictive value of the algorithms that use additional codes was slightly higher (47.4% vs. 47.9–48.5%, respectively). The algorithms adapted from the gold standard algorithm can be used to infer chronic kidney disorder based on administrative claims data. We succeeded in improving the generalizability and consistency of the CKD phenotypes by using data and vocabulary standardized across the OHDSI network, although performance variability across datasets remains. We showed that identifying and addressing coding and data heterogeneity can improve the performance of the algorithms. |
|---|---|
| Bibliography: | Author Contributions: Anna Ostropolets: Conceptualization, Methodology, Algorithm, Data Collection and Analysis, Writing - Original draft preparation and revision. Christian Reich and Patrick Ryan: Conceptualization, Results Validation, Discussion, Methodology, Writing - Reviewing and Editing. George Hripcsak, Chunhua Weng: Co-Supervision, Conceptualization, Discussion, Supervision, Writing - Reviewing and Editing. Ning Shang: Phenotyping Knowledge Resource Provision, Writing - Reviewing. |