Privacy-preserving and robust watermarking on sequential genome data using belief propagation and local differential privacy

Bibliographic Details
Published in	Bioinformatics (Oxford, England), Vol. 37, no. 17, pp. 2668-2674
Main Authors	Öksüz, Abdullah Çağlar; Ayday, Erman; Güdükbay, Uğur
Format	Journal Article
Language	English
Published	England Oxford University Press 09.09.2021 Oxford Publishing Limited (England)
Subjects	Availability; Bioinformatics; Collusion; Embedding; Gene frequency; Gene sequencing; Genomes; Human Genome Project; Liability; Linkage disequilibrium; Original Papers; Phenotypes; Privacy; Robustness; Watermarking
ISSN	1367-4803 1367-4811
DOI	10.1093/bioinformatics/btab128

Summary:	Abstract. Motivation: Genome data has been a subject of study in both biology and computer science since the start of the Human Genome Project in 1990. Since then, genome sequencing for medical and social purposes has become increasingly available and affordable. Genome data can be shared on public websites or with service providers (SPs). However, this sharing compromises the privacy of donors, even when only part of the data is shared. We focus mainly on the liability issues arising from the unauthorized sharing of such genome data. One technique for addressing liability issues in data sharing is watermarking. Results: To detect malicious correspondents and SPs, whose aim is to share genome data undetected and without individuals' consent, we propose a novel watermarking method for sequential genome data that uses the belief propagation algorithm. Our method must satisfy two criteria: (i) embedding robust watermarks, so that malicious adversaries cannot tamper with the watermark by modification and are identified with high probability; and (ii) achieving ϵ-local differential privacy in all data sharing with SPs. To preserve system robustness against single-SP and collusion attacks, we consider publicly available genomic information such as minor allele frequency, linkage disequilibrium, phenotype information and familial information. Our proposed scheme achieves a 100% detection rate against single-SP attacks with a watermark length of only 3%. In the worst-case collusion scenario (50% of SPs are malicious), 80% detection is achieved with a 5% watermark length and 90% detection with a 10% watermark length. In all cases, the impact of ϵ on precision remained negligible and high privacy is ensured. Availability and implementation: https://github.com/acoksuz/PPRW_SGD_BPLDP Supplementary information: Supplementary data are available at Bioinformatics online.
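
Illustration (not taken from the article): the ϵ-local differential privacy criterion named in the summary can be pictured with k-ary randomized response, a standard textbook mechanism. The sketch below is an assumption-laden example rather than the authors' scheme, which additionally relies on belief propagation and watermark embedding. The encoding of each SNP as 0, 1 or 2 and all function names are hypothetical choices made for this example.

```python
import math
import random

# Assumed encoding for this sketch: each SNP is 0, 1 or 2 (count of minor alleles).
STATES = (0, 1, 2)


def keep_probability(epsilon, k=len(STATES)):
    """Probability of reporting the true value under k-ary randomized response."""
    return math.exp(epsilon) / (math.exp(epsilon) + k - 1)


def randomized_response(value, epsilon, rng=random):
    """Report `value` with the keep probability, else a uniformly chosen other state."""
    if value not in STATES:
        raise ValueError(f"unexpected SNP state: {value!r}")
    if rng.random() < keep_probability(epsilon):
        return value
    return rng.choice([s for s in STATES if s != value])


def perturb_sequence(snps, epsilon, seed=None):
    """Apply randomized response independently to every SNP in the sequence."""
    rng = random.Random(seed)
    return [randomized_response(v, epsilon, rng) for v in snps]


if __name__ == "__main__":
    original = [0, 1, 2, 0, 0, 1, 2, 2, 1, 0]
    print(perturb_sequence(original, epsilon=1.0, seed=7))
```

For any two true values, the probabilities of producing the same reported value differ by a factor of at most e^ϵ, which is the ϵ-local differential privacy condition for a single SNP. A smaller ϵ flips more values and so gives stronger privacy at a higher cost in data accuracy.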