Local synthesis for disclosure limitation that satisfies probabilistic k-anonymity criterion

Bibliographic Details
Published in Transactions on data privacy, Vol. 10, no. 1, pp. 61-81
Main Authors Oganian, Anna; Domingo-Ferrer, Josep
Format Journal Article
Language English
Published Spain 01.04.2017
ISSN 1888-5063
EISSN 2013-1631

Abstract Before releasing databases which contain sensitive information about individuals, data publishers must apply Statistical Disclosure Limitation (SDL) methods to them, in order to avoid disclosure of sensitive information on any identifiable data subject. SDL methods often consist of masking or synthesizing the original data records in such a way as to minimize the risk of disclosure of the sensitive information while providing data users with accurate information about the population of interest. In this paper we propose a new scheme for disclosure limitation, based on the idea of local synthesis of data. Our approach is predicated on model-based clustering. The proposed method satisfies the requirements of k-anonymity; in particular we use a variant of the k-anonymity privacy model, namely probabilistic k-anonymity, by incorporating constraints on cluster cardinality. Regarding data utility, for continuous attributes, we exactly preserve means and covariances of the original data, while approximately preserving higher-order moments and analyses on subdomains (defined by clusters and cluster combinations). For both continuous and categorical data, our experiments with medical data sets show that, from the point of view of data utility, local synthesis compares very favorably with other methods of disclosure limitation including the sequential regression approach for synthetic data generation.
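Illustrative sketch (not taken from the paper): the abstract states that means and covariances of continuous attributes are preserved exactly and that cluster cardinality is constrained. One standard way to get exact moment preservation, assumed here purely for illustration, is to whiten random Gaussian draws and recolor them with each cluster's own covariance. The function names `synthesize_cluster` and `local_synthesis`, the `labels` input (cluster assignments, which the paper obtains by model-based clustering but which are simply passed in here), and the rule that every cluster must hold at least k records are all assumptions of this sketch. It also assumes each cluster has more records than attributes and a full-rank covariance matrix.

```python
import numpy as np

def synthesize_cluster(X, rng):
    """Draw synthetic records whose sample mean and covariance equal those of X."""
    n, p = X.shape
    if n <= p:
        raise ValueError("cluster needs more records than attributes")
    mean = X.mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    # Random draws, centered so their sample mean is exactly zero.
    Z = rng.standard_normal((n, p))
    Z -= Z.mean(axis=0)
    # Whiten: after this step the sample covariance of W is the identity.
    Lz = np.linalg.cholesky(np.atleast_2d(np.cov(Z, rowvar=False)))
    W = np.linalg.solve(Lz, Z.T).T
    # Recolor with the Cholesky factor of the original covariance.
    Lx = np.linalg.cholesky(cov)
    return mean + W @ Lx.T

def local_synthesis(X, labels, k, rng):
    """Synthesize each cluster separately; reject clusters smaller than k (assumed rule)."""
    Y = np.empty_like(X, dtype=float)
    for c in np.unique(labels):
        idx = labels == c
        if idx.sum() < k:
            raise ValueError(f"cluster {c} has fewer than k={k} records")
        Y[idx] = synthesize_cluster(X[idx], rng)
    return Y
```

Because every cluster keeps its size, mean, and covariance, the pooled mean and covariance of the synthetic data also match the original up to floating-point error. Higher-order moments are not matched by this sketch.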
Author Domingo-Ferrer, Josep
Oganian, Anna
Author_xml – sequence: 1
  givenname: Anna
  surname: Oganian
  fullname: Oganian, Anna
  organization: National Center for Health Statistics, 3311 Toledo Rd, Hyattsville, MD 20782, USA and Georgia Southern University, P.O. Box 8093, Statesboro, GA 30460, USA
– sequence: 2
  givenname: Josep
  surname: Domingo-Ferrer
  fullname: Domingo-Ferrer, Josep
  organization: UNESCO Chair in Data Privacy, Universitat Rovira i Virgili, Department of Computer Engineering and Maths, Av. Països Catalans 26, E-43007 Tarragona, Catalonia
ContentType Journal Article
Discipline Computer Science
EISSN 2013-1631
EndPage 81
ExternalDocumentID PMC6760907
31555393
Genre Journal Article
GrantInformation_xml – fundername: Intramural CDC HHS
  grantid: CC999999
ISSN 1888-5063
IsPeerReviewed false
IsScholarly true
Issue 1
Keywords probabilistic k-anonymity
Expectation-Maximization (EM) algorithm
mixture model
synthetic data
Statistical Disclosure Limitation (SDL)
Language English
PMID 31555393
PageCount 21
PublicationDate 2017-Apr
20170401
PublicationDateYYYYMMDD 2017-04-01
PublicationDate_xml – month: 04
  year: 2017
  text: 2017-Apr
PublicationDecade 2010
PublicationPlace Spain
PublicationPlace_xml – name: Spain
PublicationTitle Transactions on data privacy
PublicationTitleAlternate Trans Data Priv
PublicationYear 2017
StartPage 61
Title Local synthesis for disclosure limitation that satisfies probabilistic k-anonymity criterion
URI https://www.ncbi.nlm.nih.gov/pubmed/31555393
https://pubmed.ncbi.nlm.nih.gov/PMC6760907
Volume 10