Post‐selection inference for changepoint detection algorithms with application to copy number variation data

Bibliographic Details
Published in: Biometrics Vol. 77; no. 3; pp. 1037-1049
Main Authors: Hyun, Sangwon; Lin, Kevin Z.; G'Sell, Max; Tibshirani, Ryan J.
Format: Journal Article
Language: English
Published: United States, Blackwell Publishing Ltd, 01.09.2021
ISSN: 0006-341X, 1541-0420
DOI: 10.1111/biom.13422

Summary: Changepoint detection methods are used in many areas of science and engineering, for example, in the analysis of copy number variation data to detect abnormalities in copy numbers along the genome. Despite the broad array of available tools, methodology for quantifying our uncertainty in the strength (or the presence) of given changepoints post‐selection is lacking. Post‐selection inference offers a framework to fill this gap, but the most straightforward application of these methods results in low‐powered hypothesis tests and leaves open several important questions about practical usability. In this work, we carefully tailor post‐selection inference methods toward changepoint detection, focusing on copy number variation data. To accomplish this, we study commonly used changepoint algorithms: binary segmentation and two of its most popular variants (wild and circular), as well as the fused lasso. We implement some of the latest developments in post‐selection inference theory, mainly auxiliary randomization. This improves the power of the tests, but carrying them out requires implementations of Markov chain Monte Carlo algorithms (importance sampling and hit‐and‐run sampling). We also provide recommendations for improving practical usability, detailed simulations, and example analyses on array comparative genomic hybridization as well as sequencing data.