Selectivity estimators for multidimensional range queries over real attributes
Estimating the selectivity of multidimensional range queries over real valued attributes has significant applications in data exploration and database query optimization. In this paper, we consider the following problem: given a table of d attributes whose domain is the real numbers and a query that...
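The problem stated above can be made concrete with a small sketch. It is not the histogram or kernel technique proposed in the article. It only contrasts the exact selectivity of a range query with the simplest estimator the abstract mentions, which assumes the attributes are independent. The function names, the synthetic data, and the query ranges are all assumptions chosen for illustration.

```python
import random


def exact_selectivity(rows, query):
    """Fraction of rows whose every attribute lies inside its query range."""
    hits = sum(
        all(lo <= value <= hi for value, (lo, hi) in zip(row, query))
        for row in rows
    )
    return hits / len(rows)


def independence_selectivity(rows, query):
    """Estimate under the attribute-independence assumption:
    multiply the one-dimensional selectivities of each attribute."""
    n = len(rows)
    estimate = 1.0
    for j, (lo, hi) in enumerate(query):
        estimate *= sum(lo <= row[j] <= hi for row in rows) / n
    return estimate


# Hypothetical data: two strongly correlated real-valued attributes.
rng = random.Random(0)
rows = []
for _ in range(10_000):
    x = rng.random()
    rows.append((x, x + rng.gauss(0.0, 0.01)))

# A query that asks for small x together with large y.
query = [(0.0, 0.1), (0.9, 1.0)]
print("exact:       ", exact_selectivity(rows, query))
print("independence:", independence_selectivity(rows, query))
```

On this correlated data no record satisfies the query, yet the independence estimate is roughly the product of the two marginal selectivities. That gap is the kind of error that estimators of the joint distribution aim to reduce.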
| Published in | The VLDB Journal, Vol. 14, no. 2, pp. 137-154 |
|---|---|
| Main Authors | Gunopulos, Dimitrios; Kollios, George; Tsotras, Vassilis J.; Domeniconi, Carlotta |
| Format | Journal Article | 
| Language | English | 
| Published | Heidelberg: Springer, 01.04.2005 |
| Subjects | |
| ISSN | 1066-8888 (EISSN 0949-877X) |
| DOI | 10.1007/s00778-003-0090-4 | 
| Abstract | Estimating the selectivity of multidimensional range queries over real valued attributes has significant applications in data exploration and database query optimization. In this paper, we consider the following problem: given a table of d attributes whose domain is the real numbers and a query that specifies a range in each dimension, find a good approximation of the number of records in the table that satisfy the query. The simplest approach to tackle this problem is to assume that the attributes are independent. More accurate estimators try to capture the joint data distribution of the attributes. In databases, such estimators include the construction of multidimensional histograms, random sampling, or the wavelet transform. In statistics, kernel estimation techniques are being used. Many traditional approaches assume that attribute values come from discrete, finite domains, where different values have high frequencies. However, for many novel applications (as in temporal, spatial, and multimedia databases) attribute values come from the infinite domain of real numbers. Consequently, each value appears very infrequently, a characteristic that affects the behavior and effectiveness of the estimator. Moreover, real-life data exhibit attribute correlations that also affect the estimator. We present a new histogram technique that is designed to approximate the density of multidimensional datasets with real attributes. Our technique defines buckets of variable size and allows the buckets to overlap. The size of the cells is based on the local density of the data. The use of overlapping buckets allows a more compact approximation of the data distribution. We also show how to generalize kernel density estimators and how to apply them to the multidimensional query approximation problem. Finally, we compare the accuracy of the proposed techniques with existing techniques using real and synthetic datasets. The experimental results show that the proposed techniques behave more accurately in high dimensionalities than previous approaches. |
    
    
| Author | Kollios, George; Domeniconi, Carlotta; Tsotras, Vassilis J.; Gunopulos, Dimitrios |
    
    
| Copyright | 2005 INIST-CNRS | 
    
    
| Discipline | Computer Science; Applied Sciences |
    
    
| Keywords | Computational geometry; Database query; Query; Wavelet transformation; Multimedia databases; Temporal databases; Database; Data distribution; Information retrieval; Selectivity; Spatial database |
    
    
| License | http://www.springer.com/tdm CC BY 4.0  | 
    
    
    
    
    
| SubjectTerms | Applied sciences / Computer science; control theory; systems / Exact sciences and technology / Information systems. Data bases / Memory organisation. Data processing / Software |
    