Validation of algorithms in studies based on routinely collected health data: general principles

Bibliographic Details
Published in	American journal of epidemiology Vol. 193, no. 11, pp. 1612-1624
Main Authors	Ehrenstein, Vera; Hellfritzsch, Maja; Kahlert, Johnny; Langan, Sinéad M; Urushihara, Hisashi; Marinac-Dabic, Danica; Lund, Jennifer L; Sørensen, Henrik Toft; Benchimol, Eric I
Format	Journal Article
Language	English
Published	United States: Oxford University Press, 04.11.2024; Oxford Publishing Limited (England)
Subjects	Algorithms; Data Collection - methods; Data Collection - standards; Data sources; Databases, Factual - standards; Epidemiology; Humans; Reproducibility of Results; Terminology; Validation Studies as Topic; Validity; algorithms; data quality; measurement error; routinely collected health data; real-world data; validity; information bias; misclassification
ISSN	0002-9262, 1476-6256
DOI	10.1093/aje/kwae071

More Information
Summary:	Clinicians, researchers, regulators, and other decision-makers increasingly rely on evidence from real-world data (RWD), including data routinely accumulating in health and administrative databases. RWD studies often rely on algorithms to operationalize variable definitions. An algorithm is a combination of codes or concepts used to identify persons with a specific health condition or characteristic. Establishing the validity of algorithms is a prerequisite for generating valid study findings that can ultimately inform evidence-based health care. In this paper, we aim to systematize terminology, methods, and practical considerations relevant to the conduct of validation studies of RWD-based algorithms. We discuss measures of algorithm accuracy, gold/reference standards, study size, prioritization of accuracy measures, algorithm portability, and implications for interpretation. Information bias is common in epidemiologic studies, underscoring the importance of transparency in decisions about choosing and prioritizing measures of algorithm validity. The validity of an algorithm should be judged in the context of a data source, and one size does not fit all. Which validity measures to prioritize within a given data source depends on the role of the variable in the analysis (eligibility criterion, exposure, outcome, or covariate). Validation work should be part of routine maintenance of RWD sources. This article is part of a Special Collection on Pharmacoepidemiology.
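
The summary refers to measures of algorithm accuracy without naming them. As a rough illustration only, the sketch below computes four measures commonly reported when an algorithm is compared with a reference standard (sensitivity, specificity, positive predictive value, negative predictive value). The choice of these four measures, the names `TwoByTwo` and `accuracy_measures`, and the counts are assumptions made for this example and are not taken from the article.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cross-classification of an algorithm against a reference standard."""
    tp: int  # algorithm positive, reference positive
    fp: int  # algorithm positive, reference negative
    fn: int  # algorithm negative, reference positive
    tn: int  # algorithm negative, reference negative


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def accuracy_measures(t: TwoByTwo) -> dict[str, float]:
    """Compute four common accuracy measures from a 2x2 table."""
    return {
        "sensitivity": _ratio(t.tp, t.tp + t.fn),
        "specificity": _ratio(t.tn, t.tn + t.fp),
        "ppv": _ratio(t.tp, t.tp + t.fp),
        "npv": _ratio(t.tn, t.tn + t.fn),
    }


# Hypothetical counts, invented for illustration (not data from the article).
counts = TwoByTwo(tp=80, fp=20, fn=10, tn=890)
measures = accuracy_measures(counts)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

In this framing, which measure matters most would depend on how the variable is used. For example, a high positive predictive value may be the priority when the algorithm defines study eligibility, in line with the summary's point that prioritization depends on the variable's role in the analysis.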