Summarizing Large-Scale Database Schema Using Community Detection

Bibliographic Details
Published in	Journal of Computer Science and Technology, Vol. 27, no. 3, pp. 515-526
Main Author	Xue Wang (王雪), Xuan Zhou (周烜), Shan Wang (王珊)
Format	Journal Article
Language	English
Published	Boston: Springer US, 01.01.2012; Springer Nature B.V.; Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China, Beijing 100872, China; School of Information, Renmin University of China, Beijing 100872, China
Subjects	Algorithms; Amusement rides; Artificial Intelligence; Clusters; Communities; Computer Science; Data Structures and Information Theory; Football; Information Systems Applications (incl. Internet); Navigation; R&D; Regular Paper; Research & development; Science; Social networks; Software Engineering; Summaries; Tables; Tables (data); Theory of Computation; Usability; large databases; large proportion; hierarchical structure; schema; detection techniques; users; database; network community; high degree tables; schema; large scale; summarization; community detection
ISSN	1000-9000 1860-4749
DOI	10.1007/s11390-012-1240-1

More Information
Summary:	Schema summarization on large-scale databases is challenging. In a typical large database schema, a large proportion of the tables are closely connected through a few high-degree tables, which makes it difficult to separate them into clusters that represent different topics. Moreover, because a schema can be very large, the schema summary needs to be structured into multiple levels to remain usable. In this paper, we introduce a new schema summarization approach that uses community detection techniques from social networks. Our approach contains three steps. First, we use a community detection algorithm to divide a database schema into subject groups, each representing a specific subject. Second, we cluster the subject groups into abstract domains to form a multi-level navigation structure. Third, we discover representative tables in each cluster to label the schema summary. We evaluate our approach on Freebase, a real-world large-scale database. The results show that our approach can identify subject groups precisely, and that the generated abstract schema layers are very helpful for users exploring the database.
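Illustration:	The first and third steps of the summary can be pictured with a minimal sketch. It is not the authors' method: the record does not name the community detection algorithm or the rule for choosing representative tables, so the sketch assumes a simple deterministic label propagation and picks the table with the most links as the representative. The toy schema and the names `FOREIGN_KEYS`, `build_graph`, `label_propagation`, and `summarize` are hypothetical, and the second step (clustering subject groups into abstract domains) is omitted.

```python
from collections import Counter, defaultdict

# Hypothetical toy schema: each pair is a foreign-key link between two tables.
FOREIGN_KEYS = [
    ("film", "actor"), ("film", "director"), ("actor", "director"),
    ("team", "player"), ("team", "coach"), ("player", "coach"),
    ("film", "team"),  # a single weak link between the two topics
]


def build_graph(edges):
    """Build an undirected adjacency map from table-to-table links."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def label_propagation(graph, max_rounds=20):
    """Assumed stand-in for community detection (not the paper's algorithm).

    Every table starts with its own label and repeatedly adopts the label
    most common among its neighbours. Tables are visited in alphabetical
    order and ties are broken deterministically so the result is repeatable.
    """
    labels = {table: table for table in graph}
    for _ in range(max_rounds):
        changed = False
        for table in sorted(graph):
            counts = Counter(labels[n] for n in graph[table])
            best = max(counts.values())
            candidates = sorted(l for l, c in counts.items() if c == best)
            # Keep the current label if it is among the most frequent ones.
            new = labels[table] if labels[table] in candidates else candidates[0]
            if new != labels[table]:
                labels[table] = new
                changed = True
        if not changed:
            break
    return labels


def summarize(edges):
    """Group tables into subject groups and label each by a representative."""
    graph = build_graph(edges)
    labels = label_propagation(graph)
    groups = defaultdict(list)
    for table, label in labels.items():
        groups[label].append(table)
    summary = {}
    for members in groups.values():
        # Assumption: the representative is the table with the most links;
        # ties go to the alphabetically first table.
        representative = max(sorted(members), key=lambda t: len(graph[t]))
        summary[representative] = sorted(members)
    return summary


if __name__ == "__main__":
    for representative, members in summarize(FOREIGN_KEYS).items():
        print(representative, "->", members)
```

On this toy input the sketch separates the film-related tables from the sports-related tables even though one link joins them. A real schema with high-degree hub tables, as the summary notes, is harder and would likely need a more careful algorithm.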
Bibliography:	11-2296/TP. Xue Wang, Xuan Zhou, and Shan Wang, Senior Member, CCF, Member, ACM (1. School of Information, Renmin University of China, Beijing 100872, China; 2. Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China, Beijing 100872, China). Keywords: schema, summarization, large scale, community detection.