PresQ: Discovery of Multidimensional Equally-Distributed Dependencies Via Quasi-Cliques on Hypergraphs

Bibliographic Details
Published in IEEE transactions on emerging topics in computing Vol. 12; no. 1; pp. 1-16
Main Authors Alvarez-Ayllon, Alejandro; Palomo-Duarte, Manuel; Dodero, Juan-Manuel
Format Journal Article
Language English
Published New York IEEE 01.01.2024
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN 2168-6750
DOI 10.1109/TETC.2022.3198252

Summary: Cross-matching data stored in separate files is an everyday activity in the scientific domain. However, the relation between attributes is not always obvious. The discovery of foreign keys in relational databases is a similar problem, so techniques devised for it can be adapted. Nonetheless, when the data are numeric and subject to uncertainty, this adaptation is not trivial. This paper first introduces the concept of Equally-Distributed Dependencies, which is similar to Inclusion Dependencies from the relational domain. We describe a correspondence in order to bridge existing ideas. We then propose PresQ: a new algorithm based on the search for maximal quasi-cliques on hypergraphs, which makes it more robust to the nature of uncertain numerical data. This algorithm has been tested on seven public datasets, showing promising results both in its capacity to find multidimensional equally-distributed sets of attributes and in run time.