A cost model and index architecture for the similarity join
| Published in | Proceedings 17th International Conference on Data Engineering pp. 411 - 420 |
|---|---|
| Main Authors | Bohm, C.; Kriegel, H.-P. |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 2001 |
| ISBN | 0769510019 9780769510019 |
| ISSN | 1063-6382 |
| DOI | 10.1109/ICDE.2001.914854 |
| Abstract | The similarity join is an important database primitive which has been successfully applied to speed up data mining algorithms. In the similarity join, two point sets of a multidimensional vector space are combined such that the result contains all point pairs where the distance does not exceed a parameter ε. Due to its high practical relevance, many similarity join algorithms have been devised. The authors propose an analytical cost model for the similarity join operation based on indexes. Our problem analysis reveals a serious optimization conflict between CPU time and I/O time: fine-grained index structures are beneficial for CPU efficiency, but deteriorate the I/O performance. As a consequence of this observation, we propose a new index architecture and join algorithm which allow a separate optimization of CPU time and I/O time. Our solution utilizes large pages which are optimized for I/O processing. The pages accommodate a search structure which minimizes the computational effort. In the experimental evaluation, a substantial improvement over competitive techniques is shown. |
| Author affiliations | Bohm, C. (Munich Univ., Germany); Kriegel, H.-P. |
| Discipline | Computer Science |
| Subjects | Algorithm design and analysis; Biomedical imaging; Clustering algorithms; Costs; Data mining; Image analysis; Multidimensional systems; Performance analysis; Spatial databases; Time series analysis |
| URI | https://ieeexplore.ieee.org/document/914854 |
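
As an illustration of the operation the abstract defines, the following is a minimal sketch of a similarity join computed by brute force. It is not the index-based algorithm of the paper. The function name `similarity_join`, the sample points, and the choice of Euclidean distance are assumptions made for this example; the abstract does not name a specific metric.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def similarity_join(r, s, eps):
    """Return every pair (p, q), p from r and q from s, with distance <= eps.

    Brute-force nested loop over both point sets; assumes Euclidean distance.
    """
    return [(p, q) for p in r for q in s if dist(p, q) <= eps]


# Hypothetical two-dimensional point sets, chosen only for illustration.
r = [(0.0, 0.0), (1.0, 1.0)]
s = [(0.0, 0.5), (3.0, 3.0)]

pairs = similarity_join(r, s, eps=1.0)
print(pairs)
```

The nested loop compares every point of one set with every point of the other, so its cost grows with the product of the set sizes. Index-based approaches such as the one described in the abstract aim to avoid most of these comparisons.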