A cost model and index architecture for the similarity join

Bibliographic Details
Published in	Proceedings 17th International Conference on Data Engineering, pp. 411-420
Main Authors	Bohm, C., Kriegel, H.-P.
Format	Conference Proceeding
Language	English
Published	IEEE 2001
Subjects	Algorithm design and analysis; Biomedical imaging; Clustering algorithms; Costs; Data mining; Image analysis; Multidimensional systems; Performance analysis; Spatial databases; Time series analysis
ISBN	0769510019 9780769510019
ISSN	1063-6382
DOI	10.1109/ICDE.2001.914854

More Information
Summary:	The similarity join is an important database primitive which has been successfully applied to speed up data mining algorithms. In the similarity join, two point sets of a multidimensional vector space are combined such that the result contains all point pairs where the distance does not exceed a parameter ε. Due to its high practical relevance, many similarity join algorithms have been devised. The authors propose an analytical cost model for the similarity join operation based on indexes. Our problem analysis reveals a serious optimization conflict between CPU time and I/O time: fine-grained index structures are beneficial for CPU efficiency, but deteriorate the I/O performance. As a consequence of this observation, we propose a new index architecture and join algorithm which allows a separate optimization of CPU time and I/O time. Our solution utilizes large pages which are optimized for I/O processing. The pages accommodate a search structure which minimizes the computational effort. In the experimental evaluation, a substantial improvement over competitive techniques is shown.
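
To make the join definition in the summary concrete, the following is a minimal sketch of a similarity join computed by a naive nested loop. It is not the index-based algorithm the paper proposes; it only illustrates which point pairs belong to the result. The function name `similarity_join`, the sample points, and the choice of Euclidean distance are assumptions made for illustration, since the summary does not name a specific metric.

```python
import math
from itertools import product


def similarity_join(r, s, eps):
    """Return all pairs (p, q), p from r and q from s, whose distance is <= eps.

    Assumption: Euclidean distance. This brute-force version compares every
    pair, so it costs len(r) * len(s) distance computations.
    """
    result = []
    for p, q in product(r, s):
        if math.dist(p, q) <= eps:
            result.append((p, q))
    return result


# Hypothetical sample data: two small 2-D point sets.
r_points = [(0.0, 0.0), (1.0, 1.0)]
s_points = [(0.0, 0.5), (3.0, 3.0)]

pairs = similarity_join(r_points, s_points, eps=1.0)
print(pairs)  # [((0.0, 0.0), (0.0, 0.5))]
```

Only the pair at distance 0.5 qualifies; the other three pairs are farther apart than ε = 1.0. An index-based join such as the one described in the summary aims to avoid comparing most of these pairs in the first place.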