A cost model and index architecture for the similarity join

Bibliographic Details
Published in: Proceedings 17th International Conference on Data Engineering, pp. 411-420
Main Authors: Bohm, C.; Kriegel, H.-P.
Format: Conference Proceeding
Language: English
Published: IEEE, 2001
ISBN: 0769510019, 9780769510019
ISSN: 1063-6382
DOI: 10.1109/ICDE.2001.914854

Abstract: The similarity join is an important database primitive which has been successfully applied to speed up data mining algorithms. In the similarity join, two point sets of a multidimensional vector space are combined such that the result contains all point pairs whose distance does not exceed a parameter ε. Due to its high practical relevance, many similarity join algorithms have been devised. We propose an analytical cost model for the similarity join operation based on indexes. Our problem analysis reveals a serious optimization conflict between CPU time and I/O time: fine-grained index structures are beneficial for CPU efficiency but degrade I/O performance. As a consequence of this observation, we propose a new index architecture and join algorithm which allow CPU time and I/O time to be optimized separately. Our solution uses large pages which are optimized for I/O processing. The pages accommodate a search structure which minimizes the computational effort. The experimental evaluation shows a substantial improvement over competing techniques.
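The join condition stated in the abstract can be illustrated with a minimal brute-force sketch. This is not the index-based algorithm the paper proposes; it only shows what result set a similarity join is expected to return. The function name `similarity_join`, the sample points, and the choice of Euclidean distance are assumptions made for illustration (the abstract speaks only of "the distance").

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def similarity_join(r, s, eps):
    """Return every pair (p, q), p from r and q from s, with distance <= eps.

    Hypothetical nested-loop baseline: it compares all |r| * |s| pairs and
    uses no index or paging, unlike the architecture described in the abstract.
    """
    return [(p, q) for p in r for q in s if dist(p, q) <= eps]


# Two small 2-dimensional point sets (made-up sample data).
r = [(0.0, 0.0), (1.0, 1.0)]
s = [(0.0, 0.5), (3.0, 3.0)]

pairs = similarity_join(r, s, eps=1.0)
print(pairs)
```

Every point of one set is compared with every point of the other, so the CPU cost grows with the product of the set sizes. Reducing that cost without hurting I/O performance is the trade-off the abstract describes.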
Author affiliation: Bohm, C. (Munich Univ., Germany)
Subjects: Algorithm design and analysis; Biomedical imaging; Clustering algorithms; Costs; Data mining; Image analysis; Multidimensional systems; Performance analysis; Spatial databases; Time series analysis
URI: https://ieeexplore.ieee.org/document/914854