Large-scale factorization of type-constrained multi-relational data

Bibliographic Details
Published in DSAA : 2014 International Conference on Data Science and Advanced Analytics : October 30, 2014-November 1, 2014, pp. 18-24
Main Authors Krompass, Denis; Nickel, Maximilian; Tresp, Volker
Format Conference Proceeding
Language English
Published IEEE 01.10.2014
DOI 10.1109/DSAA.2014.7058046

Summary: The statistical modeling of large multi-relational datasets has gained increasing attention in recent years. Typical applications involve large knowledge bases like DBpedia, Freebase, YAGO and the recently introduced Google Knowledge Graph, which contain millions of entities, hundreds to thousands of relations, and billions of relational tuples. Collective factorization methods have been shown to scale up to these large multi-relational datasets, in particular in the form of tensor approaches that can exploit highly scalable alternating least squares (ALS) algorithms for calculating the factors. In this paper we extend the recently proposed state-of-the-art RESCAL tensor factorization to consider relational type-constraints. Relational type-constraints explicitly define the logic of relations by excluding entities from the subject or object role. In addition, we show that in the absence of prior knowledge about type-constraints, local closed-world assumptions can be approximated for each relation by ignoring entities that are never observed as subject or object in that relation. In our experiments on representative large datasets (Cora, DBpedia) that contain up to millions of entities and hundreds of type-constrained relations, we show that the proposed approach is scalable. It further significantly outperforms RESCAL without type-constraints in both runtime and prediction quality.
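The idea of approximating local closed-world assumptions can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy triples, and the random factors are all hypothetical. It assumes the usual RESCAL model, in which each relation slice is approximated as A R_k A^T, and it assumes the type-constrained variant simply restricts the rows of A to the entities observed as subjects and as objects of that relation.

```python
import numpy as np

def lcwa_constraints(triples, relation):
    """Approximate the domain and range of one relation by the entities
    actually observed as its subjects and objects (hypothetical helper)."""
    subjects = sorted({s for s, p, o in triples if p == relation})
    objects = sorted({o for s, p, o in triples if p == relation})
    return subjects, objects

def constrained_slice(triples, relation, subjects, objects):
    """Build the reduced adjacency slice over (subjects x objects) only."""
    row = {e: i for i, e in enumerate(subjects)}
    col = {e: j for j, e in enumerate(objects)}
    X = np.zeros((len(subjects), len(objects)))
    for s, p, o in triples:
        if p == relation:
            X[row[s], col[o]] = 1.0
    return X

def constrained_scores(A, R_k, subjects, objects):
    """Score only the admissible subject/object pairs of one relation."""
    return A[subjects] @ R_k @ A[objects].T

# Toy data (invented for illustration): 6 entities, 2 relations.
triples = [(0, "bornIn", 3), (1, "bornIn", 4),
           (2, "worksFor", 5), (0, "worksFor", 5)]

subjects, objects = lcwa_constraints(triples, "bornIn")
X_born = constrained_slice(triples, "bornIn", subjects, objects)

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 2))    # latent entity factors, rank 2 (random placeholder)
R_k = rng.normal(size=(2, 2))  # latent core matrix for "bornIn" (random placeholder)
scores = constrained_scores(A, R_k, subjects, objects)

print(subjects, objects)          # entities kept as subjects and as objects
print(X_born.shape, scores.shape) # reduced slice instead of a full 6 x 6 slice
```

In this toy case the "bornIn" slice shrinks from 6 x 6 to 2 x 2, which hints at why restricting each relation to its admissible entities could reduce both the work per ALS update and the number of implausible pairs that are treated as negative evidence.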