Efficient and Privacy-Preserving k-Means Clustering for Big Data Mining

Bibliographic Details
Published in	2016 IEEE Trustcom/BigDataSE/ISPA, pp. 791-798
Main Authors	Gheid, Zakaria; Challal, Yacine
Format	Conference Proceeding
Language	English
Published	IEEE 01.08.2016
Subjects	Big data; Clustering algorithms; Data privacy; Distributed databases; efficiency; horizontally partitioned data; k-means clustering; privacy; Protocols; Security
ISSN	2324-9013
DOI	10.1109/TrustCom.2016.0140

More Information
Summary:	Recent advances in sensing and storage technologies have led to the big data age, in which huge amounts of data are distributed across sites to be stored and analysed. Cluster analysis is a data mining task that aims to discover patterns and knowledge through algorithmic techniques such as k-means. However, running k-means over distributed big data stores raises serious privacy issues. Many proposed works have attempted to address this concern using cryptographic protocols, but these cryptographic solutions degrade the performance of analysis tasks and so do not meet the requirements of big data. In this work, we propose a novel privacy-preserving k-means algorithm based on a simple yet secure and efficient multiparty additive scheme that is cryptography-free. We designed our solution for horizontally partitioned data. Moreover, we demonstrate that our scheme is resistant to adversaries in the passive model.
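Illustration:	The summary does not spell out the protocol itself, so the following is only a generic sketch of how an additive, cryptography-free secure sum could drive one k-means centroid update over horizontally partitioned data. It is not the authors' scheme. The names `MODULUS`, `SCALE`, `split_into_shares`, `secure_sum`, `decode` and `global_centroid` are hypothetical, as are the assumptions that every site holds a local coordinate sum and point count for a cluster, and that parties follow the protocol honestly (passive model).

```python
import random

# Assumed public parameters (not taken from the paper).
MODULUS = 2**61 - 1   # all share arithmetic is done modulo this value
SCALE = 10**6         # fixed-point scale for real-valued coordinates


def split_into_shares(value, n_parties, rng):
    """Split an integer into n additive shares that sum to value mod MODULUS."""
    shares = [rng.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values, rng):
    """Simulate an additive secure sum among len(private_values) parties.

    Party i sends share j of its value to party j; each party publishes only
    the sum of the shares it received, so no single private value is revealed.
    """
    n = len(private_values)
    sent = [split_into_shares(v % MODULUS, n, rng) for v in private_values]
    partials = [sum(sent[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return sum(partials) % MODULUS


def decode(value):
    """Map a residue back to a signed integer."""
    return value - MODULUS if value > MODULUS // 2 else value


def global_centroid(local_sums, local_counts, rng):
    """Combine per-site coordinate sums and point counts into one centroid."""
    total_count = secure_sum(local_counts, rng)
    centroid = []
    for d in range(len(local_sums[0])):
        encoded = [round(site[d] * SCALE) for site in local_sums]
        total = decode(secure_sum(encoded, rng))
        centroid.append(total / SCALE / total_count)
    return centroid


if __name__ == "__main__":
    rng = random.Random(0)
    # Three sites, one cluster, two dimensions (made-up illustrative values).
    sums = [[2.0, 4.0], [4.0, 2.0], [-3.0, 3.0]]
    counts = [2, 1, 3]
    print(global_centroid(sums, counts, rng))
```

In this sketch each site only ever sees uniformly random shares of the other sites' values, while the recombined totals equal the plain sums, so the centroid matches the one a centralised k-means step would compute on the pooled data.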