The validity of causal claims with repeated measures designs: A within-study comparison evaluation of differences-in-differences and the comparative interrupted time series

Bibliographic Details
Published in: Evaluation Review, Vol. 47, No. 5, pp. 895-931
Main Authors: Anglin, Kylie L.; Wong, Vivian C.; Wing, Coady; Miller-Bains, Kate; McConeghy, Kevin
Format: Journal Article
Language: English
Published: Los Angeles, CA: SAGE Publications, 01.10.2023
ISSN: 0193-841X, 1552-3926
DOI: 10.1177/0193841X231167672

Summary: Modern policies are commonly evaluated not with randomized experiments but with repeated measures designs like difference-in-differences (DID) and the comparative interrupted time series (CITS). The key benefit of these designs is that they control for unobserved confounders that are fixed over time. However, DID and CITS designs yield unbiased impact estimates only when the model assumptions are consistent with the data at hand. In this paper, we empirically test whether the assumptions of repeated measures designs are met in field settings. Using a within-study comparison design, we compare experimental estimates of the impact of patient-directed care on medical expenditures to non-experimental DID and CITS estimates for the same target population and outcome. Our data come from a multi-site experiment that includes participants receiving Medicaid in Arkansas, Florida, and New Jersey. We present summary measures of repeated measures bias across three states, four comparison groups, two model specifications, and two outcomes. We find that, on average, bias from repeated measures designs is very close to zero (less than 0.01 standard deviations; SDs). Further, we find that comparison groups with pre-treatment trends that are visibly parallel to the treatment group's produce less bias than those with visibly divergent trends. However, CITS models that control for baseline trends produced slightly more bias and were less precise than DID models that control only for baseline means. Overall, we offer optimistic evidence in favor of repeated measures designs when randomization is not feasible.
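
As a rough, hypothetical illustration of the contrast the summary draws between the two specifications, the sketch below fits a DID model (group, period, and their interaction, i.e., controlling for baseline means only) and a CITS model (which additionally allows a group-specific baseline trend and a post-policy trend break) on simulated panel data using Python and statsmodels. The data-generating process, variable names, and effect sizes are illustrative assumptions, not the authors' data or code.

# Illustrative sketch only (not the authors' implementation): contrasting a DID
# specification, which adjusts for baseline means, with a CITS specification,
# which also adjusts for baseline trends. All quantities here are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulated two-group panel: 8 periods, policy begins at period 4 for the treated group.
n_units, n_periods, policy_start = 200, 8, 4
df = pd.DataFrame([(i, t) for i in range(n_units) for t in range(n_periods)],
                  columns=["unit", "time"])
df["treated"] = (df["unit"] < n_units // 2).astype(int)   # treatment-group indicator
df["post"] = (df["time"] >= policy_start).astype(int)     # post-policy indicator
df["y"] = (1.0 * df["treated"] + 0.2 * df["time"]         # group effect and common trend
           + 0.5 * df["treated"] * df["post"]             # true policy impact
           + rng.normal(0, 1, len(df)))                   # noise

# DID: outcome on group, period, and their interaction; controls for baseline means only.
did = smf.ols("y ~ treated * post", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["unit"]})

# CITS: additionally allows a group-specific baseline trend and a post-policy trend break.
df["time_since"] = np.maximum(df["time"] - policy_start, 0)
cits = smf.ols("y ~ treated * (time + post + time_since)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["unit"]})

print("DID impact estimate: ", did.params["treated:post"])
print("CITS impact estimate:", cits.params["treated:post"])

In this sketch both interaction coefficients recover the simulated impact because the comparison group's trend is parallel by construction; the paper's empirical question is whether that assumption holds, and at what cost in bias and precision, in field data.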