Towards an Automatic Detection of Sensitive Information in a Database

Bibliographic Details
Published in: 2010 Second International Conference on Advances in Databases, Knowledge, and Data Applications, pp. 247-252
Main Authors: du Mouza, Cédric; Métais, Elisabeth; Lammari, Nadira; Akoka, Jacky; Aubonnet, Tatiana; Comyn-Wattiau, Isabelle; Fadili, Hammou; Cherfi, Samira Si-Saïd
Format: Conference Proceeding
Language: English
Published: IEEE, 01.04.2010
ISBN: 9781424460816, 1424460816
DOI: 10.1109/DBKDA.2010.17

Summary: Tests to validate user requirements are often conducted on real data. However, development and testing are increasingly outsourced, leading companies to provide external staff with real confidential data. A solution to this problem is known as Data Scrambling. Many algorithms aim at replacing true data with false but realistic values. However, nothing has been developed to automate the crucial task of detecting the data to be scrambled. In this paper we propose an innovative approach, and its implementation as an expert system, to automatically detect the candidate attributes for scrambling. Our approach is mainly based on semantic rules that determine which concepts have to be scrambled, and on a linguistic component that retrieves the attributes that semantically correspond to these concepts. Since attributes cannot be considered independently of each other, we also address the challenging problem of propagating the scrambling throughout the whole database. An important contribution of our approach is a semantic model of sensitive data. This knowledge is made available through production rules, which operationalize sensitive data detection.
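
The pipeline described in the summary (concept rules, attribute matching, propagation) can be illustrated with a minimal Python sketch. It is not the authors' expert system: the concept lexicon, the token-matching stand-in for the linguistic component, and the names `SENSITIVE_CONCEPTS`, `Attribute`, `detect`, and `propagate` are all assumptions made for illustration only.

```python
from dataclasses import dataclass

# Hypothetical lexicon: each sensitive concept maps to terms that a
# linguistic component might associate with it (illustrative, not from the paper).
SENSITIVE_CONCEPTS = {
    "person_name": {"name", "surname", "lastname", "firstname"},
    "salary": {"salary", "wage", "income"},
    "identifier": {"ssn", "passport"},
}


@dataclass(frozen=True)
class Attribute:
    table: str
    column: str


def tokens(column):
    """Split a column name such as 'monthly_salary' into lowercase tokens."""
    return set(column.lower().replace("-", "_").split("_"))


def detect(attributes):
    """Return {attribute: concept} for columns whose tokens match a concept."""
    found = {}
    for attr in attributes:
        for concept, terms in SENSITIVE_CONCEPTS.items():
            if tokens(attr.column) & terms:
                found[attr] = concept
                break
    return found


def propagate(candidates, links):
    """Extend the candidate set along links (for example foreign keys)
    until no linked pair is only half covered."""
    result = set(candidates)
    changed = True
    while changed:
        changed = False
        for a, b in links:
            if (a in result) != (b in result):
                result |= {a, b}
                changed = True
    return result
```

Calling `detect` on a schema's attributes and passing the resulting keys to `propagate`, together with pairs of linked attributes, yields the full set of scrambling candidates under these simplified assumptions. A real system would replace the token lookup with proper linguistic matching and encode the concepts as production rules.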