Validation of algorithms in studies based on routinely collected health data: general principles

Bibliographic Details
Published in: American Journal of Epidemiology, Vol. 193, no. 11, pp. 1612-1624
Main Authors: Ehrenstein, Vera; Hellfritzsch, Maja; Kahlert, Johnny; Langan, Sinéad M; Urushihara, Hisashi; Marinac-Dabic, Danica; Lund, Jennifer L; Sørensen, Henrik Toft; Benchimol, Eric I
Format: Journal Article
Language: English
Published: United States, Oxford University Press, 04.11.2024
Oxford Publishing Limited (England)
ISSN: 0002-9262; 1476-6256
DOI: 10.1093/aje/kwae071

More Information
Summary: Clinicians, researchers, regulators, and other decision-makers increasingly rely on evidence from real-world data (RWD), including data routinely accumulating in health and administrative databases. RWD studies often rely on algorithms to operationalize variable definitions. An algorithm is a combination of codes or concepts used to identify persons with a specific health condition or characteristic. Establishing the validity of algorithms is a prerequisite for generating valid study findings that can ultimately inform evidence-based health care. In this paper, we aim to systematize terminology, methods, and practical considerations relevant to the conduct of validation studies of RWD-based algorithms. We discuss measures of algorithm accuracy, gold/reference standards, study size, prioritization of accuracy measures, algorithm portability, and implications for interpretation. Information bias is common in epidemiologic studies, underscoring the importance of transparency in decisions regarding the choice and prioritization of measures of algorithm validity. The validity of an algorithm should be judged in the context of a data source, and one size does not fit all. Prioritizing validity measures within a given data source depends on the role of a given variable in the analysis (eligibility criterion, exposure, outcome, or covariate). Validation work should be part of routine maintenance of RWD sources. This article is part of a Special Collection on Pharmacoepidemiology.
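The summary names measures of algorithm accuracy and states that validity should be judged in the context of a data source, but does not define the measures. As a minimal sketch, assuming the conventional 2x2 cross-classification of algorithm result against a gold/reference standard, the Python below computes sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). The names `ValidationTable` and `ppv_at_prevalence` and all counts are hypothetical and illustrative only; they are not taken from the paper. The second function applies Bayes' theorem to show one reason an algorithm may not be portable: with sensitivity and specificity held fixed, PPV still changes with the prevalence of the condition in the data source.

```python
from dataclasses import dataclass


@dataclass
class ValidationTable:
    """Hypothetical 2x2 counts: algorithm result vs. reference standard."""
    tp: int  # algorithm positive, reference standard positive
    fp: int  # algorithm positive, reference standard negative
    fn: int  # algorithm negative, reference standard positive
    tn: int  # algorithm negative, reference standard negative

    def sensitivity(self) -> float:
        # Share of reference-positive persons the algorithm identifies
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of reference-negative persons the algorithm classifies as negative
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Share of algorithm-positive persons who are reference-positive
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Share of algorithm-negative persons who are reference-negative
        return self.tn / (self.tn + self.fn)


def ppv_at_prevalence(sens: float, spec: float, prev: float) -> float:
    """PPV implied by fixed sensitivity/specificity at a given prevalence (Bayes' theorem)."""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Illustrative counts only; not data from the paper.
    t = ValidationTable(tp=90, fp=10, fn=30, tn=870)
    print(f"sensitivity={t.sensitivity():.3f} specificity={t.specificity():.3f}")
    print(f"PPV={t.ppv():.3f} NPV={t.npv():.3f}")
    # Same hypothetical algorithm applied to data sources with different prevalence
    for prev in (0.01, 0.10, 0.30):
        print(f"prevalence={prev:.2f} -> PPV={ppv_at_prevalence(0.90, 0.95, prev):.3f}")
```

With these illustrative counts, sensitivity is 90/120 = 0.75 and PPV is 90/100 = 0.90. At an assumed sensitivity of 0.90 and specificity of 0.95, PPV falls from about 0.89 at 30% prevalence to about 0.15 at 1% prevalence, which is one concrete sense in which an accuracy estimate from one data source may not transfer to another.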