Local synthesis for disclosure limitation that satisfies probabilistic k-anonymity criterion
| Published in | Transactions on data privacy Vol. 10; no. 1; pp. 61 - 81 |
|---|---|
| Main Authors | Oganian, Anna; Domingo-Ferrer, Josep |
| Format | Journal Article |
| Language | English |
| Published | Spain, 01.04.2017 |
| ISSN | 1888-5063 2013-1631 |
| Abstract | Before releasing databases which contain sensitive information about individuals, data publishers must apply Statistical Disclosure Limitation (SDL) methods to them, in order to avoid disclosure of sensitive information on any identifiable data subject. SDL methods often consist of masking or synthesizing the original data records in such a way as to minimize the risk of disclosure of the sensitive information while providing data users with accurate information about the population of interest. In this paper we propose a new scheme for disclosure limitation, based on the idea of local synthesis of data. Our approach is predicated on model-based clustering. The proposed method satisfies the requirements of k-anonymity; in particular we use a variant of the k-anonymity privacy model, namely probabilistic k-anonymity, by incorporating constraints on cluster cardinality. Regarding data utility, for continuous attributes, we exactly preserve means and covariances of the original data, while approximately preserving higher-order moments and analyses on subdomains (defined by clusters and cluster combinations). For both continuous and categorical data, our experiments with medical data sets show that, from the point of view of data utility, local synthesis compares very favorably with other methods of disclosure limitation including the sequential regression approach for synthetic data generation. |
| Author | Domingo-Ferrer, Josep; Oganian, Anna |
| Author_xml | – sequence: 1 givenname: Anna surname: Oganian fullname: Oganian, Anna organization: National Center for Health Statistics, 3311 Toledo Rd, Hyattsville, MD 20782, USA and Georgia Southern University, P.O. Box 8093, Statesboro, GA 30460, USA – sequence: 2 givenname: Josep surname: Domingo-Ferrer fullname: Domingo-Ferrer, Josep organization: UNESCO Chair in Data Privacy, Universitat Rovira i Virgili, Department of Computer Engineering and Maths, Av. Països Catalans 26, E-43007 Tarragona, Catalonia |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/31555393$$D View this record in MEDLINE/PubMed |
| ContentType | Journal Article |
| Discipline | Computer Science |
| EISSN | 2013-1631 |
| EndPage | 81 |
| ExternalDocumentID | PMC6760907 31555393 |
| Genre | Journal Article |
| GrantInformation_xml | – fundername: Intramural CDC HHS grantid: CC999999 |
| ISSN | 1888-5063 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Issue | 1 |
| Keywords | probabilistic k-anonymity; Expectation-Maximization (EM) algorithm; mixture model; synthetic data; Statistical Disclosure Limitation (SDL) |
| Language | English |
| PMID | 31555393 |
| PageCount | 21 |
| PublicationCentury | 2000 |
| PublicationDate | 2017-Apr 20170401 |
| PublicationDateYYYYMMDD | 2017-04-01 |
| PublicationDate_xml | – month: 04 year: 2017 text: 2017-Apr |
| PublicationDecade | 2010 |
| PublicationPlace | Spain |
| PublicationPlace_xml | – name: Spain |
| PublicationTitle | Transactions on data privacy |
| PublicationTitleAlternate | Trans Data Priv |
| PublicationYear | 2017 |
| StartPage | 61 |
| Title | Local synthesis for disclosure limitation that satisfies probabilistic k-anonymity criterion |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/31555393 https://pubmed.ncbi.nlm.nih.gov/PMC6760907 |
| Volume | 10 |
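
The abstract states that the method synthesizes data within clusters of constrained cardinality and exactly preserves the means and covariances of continuous attributes. The sketch below illustrates that idea only; it is not the authors' algorithm. It assumes that cluster labels are already available (the paper obtains clusters by model-based clustering, which is not reproduced here), that every cluster has more records than attributes, and that a generic whiten-and-recolor transform is an acceptable stand-in for the paper's synthesis step. The names `match_moments` and `local_synthesis` are hypothetical.

```python
import numpy as np


def match_moments(original, rng):
    """Return synthetic rows whose sample mean and covariance equal those of
    `original` (illustrative whiten-and-recolor step, not the paper's method)."""
    n, d = original.shape
    mean = original.mean(axis=0)
    cov = np.atleast_2d(np.cov(original, rowvar=False))

    # Draw Gaussian noise and center it so its sample mean is exactly zero.
    z = rng.standard_normal((n, d))
    z -= z.mean(axis=0)

    # Whiten: afterwards the sample covariance of `white` is the identity.
    l_z = np.linalg.cholesky(np.atleast_2d(np.cov(z, rowvar=False)))
    white = np.linalg.solve(l_z, z.T).T

    # Recolor with the Cholesky factor of the original covariance.
    l_x = np.linalg.cholesky(cov)
    return mean + white @ l_x.T


def local_synthesis(data, labels, k, seed=0):
    """Replace each cluster by moment-matched synthetic records.

    Assumption: `labels` comes from some prior clustering step; clusters
    smaller than k are rejected to mimic a cardinality constraint.
    """
    rng = np.random.default_rng(seed)
    synthetic = np.empty_like(data, dtype=float)
    for label in np.unique(labels):
        idx = labels == label
        if idx.sum() < k:
            raise ValueError(f"cluster {label} has fewer than k={k} records")
        synthetic[idx] = match_moments(data[idx], rng)
    return synthetic


# Toy data: two clusters of 30 and 40 records with two continuous attributes.
demo_rng = np.random.default_rng(1)
data = np.vstack([demo_rng.normal(0, 1, (30, 2)), demo_rng.normal(5, 2, (40, 2))])
labels = np.repeat([0, 1], [30, 40])
synthetic = local_synthesis(data, labels, k=10)
```

Because each cluster keeps its size, mean, and covariance, the overall mean and covariance of the synthetic file also match the original up to floating-point error, which is the sense in which local synthesis can preserve global first and second moments.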