Sampling the National Deep Web

Bibliographic Details
Published in Database and Expert Systems Applications, pp. 331-340
Main Author Shestakov, Denis
Format Book Chapter
Language English
Published Berlin, Heidelberg: Springer Berlin Heidelberg, 2011
Series Lecture Notes in Computer Science
ISBN 9783642230875
3642230873
ISSN 0302-9743
1611-3349
DOI 10.1007/978-3-642-23088-2_24

Abstract A huge portion of today’s Web consists of web pages filled with information from a myriad of online databases. This part of the Web, known as the deep Web, remains relatively unexplored, and even major characteristics, such as the number of searchable databases on the Web or the databases’ subject distribution, are disputed. In this paper, we revisit the problem of deep Web characterization: how can the total number of online databases on the Web be estimated? We propose the Host-IP clustering sampling method to address the drawbacks of existing approaches to deep Web characterization and report our findings from a survey of the Russian Web. The obtained estimates, together with the proposed sampling technique, could be useful for further studies that handle data in the deep Web.
Author Shestakov, Denis
Author_xml – sequence: 1
  givenname: Denis
  surname: Shestakov
  fullname: Shestakov, Denis
  email: denis.shestakov@aalto.fi
  organization: Department of Media Technology, Aalto University, Espoo, Finland
ContentType Book Chapter
Copyright Springer-Verlag Berlin Heidelberg 2011
Copyright_xml – notice: Springer-Verlag Berlin Heidelberg 2011
DOI 10.1007/978-3-642-23088-2_24
Discipline Computer Science
EISBN 9783642230882
3642230881
EISSN 1611-3349
Editor Schewe, Klaus-Dieter
Liddle, Stephen W.
Hameurlain, Abdelkader
Zhou, Xiaofang
Editor_xml – sequence: 1
  givenname: Abdelkader
  surname: Hameurlain
  fullname: Hameurlain, Abdelkader
  email: hameur@irit.fr
– sequence: 2
  givenname: Stephen W.
  surname: Liddle
  fullname: Liddle, Stephen W.
  email: liddle@byu.edu
– sequence: 3
  givenname: Klaus-Dieter
  surname: Schewe
  fullname: Schewe, Klaus-Dieter
  email: kd.schewe@scch.at
– sequence: 4
  givenname: Xiaofang
  surname: Zhou
  fullname: Zhou, Xiaofang
  email: zxf@uq.edu.au
EndPage 340
ISBN 9783642230875
3642230873
ISSN 0302-9743
IsPeerReviewed false
IsScholarly false
Language English
PageCount 10
PublicationCentury 2000
PublicationDate 2011
PublicationDateYYYYMMDD 2011-01-01
PublicationDate_xml – year: 2011
  text: 2011
PublicationDecade 2010
PublicationPlace Berlin, Heidelberg
PublicationPlace_xml – name: Berlin, Heidelberg
PublicationSeriesTitle Lecture Notes in Computer Science
PublicationSubtitle 22nd International Conference, DEXA 2011, Toulouse, France, August 29 - September 2, 2011. Proceedings, Part I
PublicationTitle Database and Expert Systems Applications
PublicationYear 2011
Publisher Springer Berlin Heidelberg
Publisher_xml – name: Springer Berlin Heidelberg
RelatedPersons Kleinberg, Jon M.
Mattern, Friedemann
Nierstrasz, Oscar
Steffen, Bernhard
Kittler, Josef
Vardi, Moshe Y.
Weikum, Gerhard
Sudan, Madhu
Naor, Moni
Mitchell, John C.
Terzopoulos, Demetri
Pandu Rangan, C.
Kanade, Takeo
Hutchison, David
Tygar, Doug
RelatedPersons_xml – sequence: 1
  givenname: David
  surname: Hutchison
  fullname: Hutchison, David
  organization: Lancaster University, Lancaster, UK
– sequence: 2
  givenname: Takeo
  surname: Kanade
  fullname: Kanade, Takeo
  organization: Carnegie Mellon University, Pittsburgh, USA
– sequence: 3
  givenname: Josef
  surname: Kittler
  fullname: Kittler, Josef
  organization: University of Surrey, Guildford, UK
– sequence: 4
  givenname: Jon M.
  surname: Kleinberg
  fullname: Kleinberg, Jon M.
  organization: Cornell University, Ithaca, USA
– sequence: 5
  givenname: Friedemann
  surname: Mattern
  fullname: Mattern, Friedemann
  organization: ETH Zurich, Zurich, Switzerland
– sequence: 6
  givenname: John C.
  surname: Mitchell
  fullname: Mitchell, John C.
  organization: Stanford University, Stanford, USA
– sequence: 7
  givenname: Moni
  surname: Naor
  fullname: Naor, Moni
  organization: Weizmann Institute of Science, Rehovot, Israel
– sequence: 8
  givenname: Oscar
  surname: Nierstrasz
  fullname: Nierstrasz, Oscar
  organization: University of Bern, Bern, Switzerland
– sequence: 9
  givenname: C.
  surname: Pandu Rangan
  fullname: Pandu Rangan, C.
  organization: Indian Institute of Technology, Madras, India
– sequence: 10
  givenname: Bernhard
  surname: Steffen
  fullname: Steffen, Bernhard
  organization: University of Dortmund, Dortmund, Germany
– sequence: 11
  givenname: Madhu
  surname: Sudan
  fullname: Sudan, Madhu
  organization: Massachusetts Institute of Technology, USA
– sequence: 12
  givenname: Demetri
  surname: Terzopoulos
  fullname: Terzopoulos, Demetri
  organization: University of California, Los Angeles, USA
– sequence: 13
  givenname: Doug
  surname: Tygar
  fullname: Tygar, Doug
  organization: University of California, Berkeley, USA
– sequence: 14
  givenname: Moshe Y.
  surname: Vardi
  fullname: Vardi, Moshe Y.
  organization: Rice University, Houston, USA
– sequence: 15
  givenname: Gerhard
  surname: Weikum
  fullname: Weikum, Gerhard
  organization: Max-Planck Institute of Computer Science, Saarbrücken, Germany
SourceID springer
SourceType Publisher
StartPage 331
SubjectTerms deep Web
DNS load balancing
Host-IP clustering
national web domain
random sampling
virtual hosting
web characterization
web databases
Title Sampling the National Deep Web
URI http://link.springer.com/10.1007/978-3-642-23088-2_24