Selectivity estimators for multidimensional range queries over real attributes

Bibliographic Details
Published in The VLDB Journal, Vol. 14, no. 2, pp. 137-154
Main Authors Gunopulos, Dimitrios; Kollios, George; Tsotras, Vassilis J.; Domeniconi, Carlotta
Format Journal Article
Language English
Published Heidelberg: Springer, 01.04.2005
ISSN 1066-8888
EISSN 0949-877X
DOI 10.1007/s00778-003-0090-4


Abstract Estimating the selectivity of multidimensional range queries over real-valued attributes has significant applications in data exploration and database query optimization. In this paper, we consider the following problem: given a table of d attributes whose domain is the real numbers and a query that specifies a range in each dimension, find a good approximation of the number of records in the table that satisfy the query. The simplest approach to this problem is to assume that the attributes are independent. More accurate estimators try to capture the joint data distribution of the attributes. In databases, such estimators are based on multidimensional histograms, random sampling, or the wavelet transform. In statistics, kernel estimation techniques are used. Many traditional approaches assume that attribute values come from discrete, finite domains, where individual values have high frequencies. However, for many novel applications (as in temporal, spatial, and multimedia databases) attribute values come from the infinite domain of real numbers. Consequently, each value appears very infrequently, a characteristic that affects the behavior and effectiveness of the estimator. Moreover, real-life data exhibit attribute correlations that also affect the estimator. We present a new histogram technique that is designed to approximate the density of multidimensional datasets with real attributes. Our technique defines buckets of variable size and allows the buckets to overlap. The size of the cells is based on the local density of the data. The use of overlapping buckets allows a more compact approximation of the data distribution. We also show how to generalize kernel density estimators and how to apply them to the multidimensional query approximation problem. Finally, we compare the accuracy of the proposed techniques with existing techniques using real and synthetic datasets. The experimental results show that the proposed techniques are more accurate in high dimensions than previous approaches.
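To make the estimation problem in the abstract concrete, the following Python sketch contrasts an exact scan with two of the estimator families the abstract mentions: the attribute independence assumption and a kernel density estimate built from a random sample. It is an illustration under stated assumptions, not the paper's method. All function names are hypothetical, the kernel is a plain product Gaussian, and the bandwidths follow Scott's rule of thumb, a choice made here for simplicity. The paper's overlapping-bucket histogram and its generalized kernel estimator are not reproduced.

```python
import math
import random


def true_selectivity(rows, lo, hi):
    """Exact fraction of rows inside the axis-aligned box [lo, hi] (full scan)."""
    inside = sum(
        all(l <= x <= h for x, l, h in zip(row, lo, hi)) for row in rows
    )
    return inside / len(rows)


def independence_estimate(rows, lo, hi):
    """Multiply the per-attribute selectivities (attribute independence assumption)."""
    estimate = 1.0
    for j, (l, h) in enumerate(zip(lo, hi)):
        estimate *= sum(l <= row[j] <= h for row in rows) / len(rows)
    return estimate


def scott_bandwidths(sample):
    """Per-dimension bandwidths from Scott's rule: std_j * n ** (-1 / (d + 4)).

    Assumption of this sketch; the source record does not state a bandwidth rule.
    """
    n, d = len(sample), len(sample[0])
    factor = n ** (-1.0 / (d + 4))
    bandwidths = []
    for j in range(d):
        column = [point[j] for point in sample]
        mean = sum(column) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
        bandwidths.append(std * factor)
    return bandwidths


def _normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kernel_estimate(sample, lo, hi, bandwidths):
    """Selectivity estimate from a product Gaussian kernel density estimate.

    Each sample point contributes the probability mass its kernel places inside
    the query box; for a product kernel that mass factorizes over dimensions.
    """
    total = 0.0
    for point in sample:
        mass = 1.0
        for x, l, h, b in zip(point, lo, hi, bandwidths):
            mass *= _normal_cdf((h - x) / b) - _normal_cdf((l - x) / b)
        total += mass
    return total / len(sample)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic, strongly correlated table: the second attribute tracks the first.
    rows = []
    for _ in range(5000):
        x = rng.random()
        rows.append((x, x + rng.gauss(0.0, 0.05)))

    # Query box far from the diagonal, where almost no records fall.
    lo, hi = (0.1, 0.6), (0.3, 0.8)
    sample = rng.sample(rows, 500)

    print("exact       :", true_selectivity(rows, lo, hi))
    print("independence:", independence_estimate(rows, lo, hi))
    print("kernel      :", kernel_estimate(sample, lo, hi, scott_bandwidths(sample)))
```

On correlated data such as this synthetic table, the independence estimate multiplies two marginal fractions and so ignores the fact that the query box lies away from where the records concentrate. The kernel estimate works from the joint sample and is therefore sensitive to the correlation. Multiplying any of the returned fractions by the table size gives the approximate record count that the abstract asks for.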
Copyright 2005 INIST-CNRS
Discipline Computer Science
Applied Sciences
IsPeerReviewed false
IsScholarly true
Issue 2
Keywords Computational geometry
Database query
Query
Wavelet transformation
Multimedia databases
Temporal databases
Database
Data distribution
Information retrieval
Selectivity
Spatial database
Language English
License http://www.springer.com/tdm
CC BY 4.0
SubjectTerms Applied sciences
Computer science; control theory; systems
Exact sciences and technology
Information systems. Data bases
Memory organisation. Data processing
Software
URI https://www.proquest.com/docview/29065224
https://www.proquest.com/docview/29467792