Policy-Based Access Control System for Delta Lake

Bibliographic Details
Published in 2022 Tenth International Conference on Advanced Cloud and Big Data (CBD), pp. 60-65
Main Authors Chen, Zhe; Shao, Hangyu; Li, Yuping; Lu, Hongru; Jin, Jiahui
Format Conference Proceeding
Language English
Published IEEE 01.11.2022
DOI 10.1109/CBD58033.2022.00020

Summary: Delta Lake is a new generation of data storage solution. It stores both the transaction log and data files in one directory, provides ACID transactions and scalable metadata handling, and unifies streaming and batch data processing on top of existing data lakes such as S3, ADLS, GCS, and HDFS. Unlike data warehouses, delta lakes allow data to be stored in its original format, retain complete data information, and provide efficient, low-cost storage for data computing and analysis workloads. However, because Delta Lake metadata is scattered across different resource files, the lack of a unified metadata view makes data governance more difficult. In addition, Delta Lake uses an open source storage system as its underlying storage, and its basic access control does not isolate different users, which may lead to data leakage. At present, most common storage systems apply access control to the row and column fields of data tables, while Delta Lake treats the file group as an object. In this paper, to address the difficulty of data governance, we design a data lake metadata management method that achieves unified and efficient management of metadata information across heterogeneous data. We then design a policy-based data lake access control mechanism which, combined with an open source permission framework, handles access requests from different users and roles in Delta Lake.