Sampling the National Deep Web

Bibliographic Details
Published in	Database and Expert Systems Applications pp. 331 - 340
Main Author	Shestakov, Denis
Format	Book Chapter
Language	English
Published	Berlin, Heidelberg: Springer Berlin Heidelberg, 2011
Series	Lecture Notes in Computer Science
Subjects	deep Web; DNS load balancing; Host-IP clustering; national web domain; random sampling; virtual hosting; web characterization; web databases
ISBN	9783642230875; 3642230873
ISSN	0302-9743; 1611-3349
DOI	10.1007/978-3-642-23088-2_24

Abstract	A large portion of today’s Web consists of pages populated with information from a myriad of online databases. This part of the Web, known as the deep Web, remains relatively unexplored, and even major characteristics, such as the number of searchable databases on the Web or the subject distribution of those databases, are still somewhat disputed. In this paper, we revisit a problem of deep Web characterization: how to estimate the total number of online databases on the Web. We propose the Host-IP clustering sampling method to address the drawbacks of existing approaches to deep Web characterization and report our findings from a survey of the Russian Web. The obtained estimates, together with the proposed sampling technique, could be useful for further studies that handle data in the deep Web.
Author	Shestakov, Denis
Author_xml	– sequence: 1 givenname: Denis surname: Shestakov fullname: Shestakov, Denis email: denis.shestakov@aalto.fi organization: Department of Media Technology, Aalto University, Espoo, Finland
ContentType	Book Chapter
Copyright	Springer-Verlag Berlin Heidelberg 2011
Copyright_xml	– notice: Springer-Verlag Berlin Heidelberg 2011
DOI	10.1007/978-3-642-23088-2_24
Discipline	Computer Science
EISBN	9783642230882 3642230881
EISSN	1611-3349
Editor	Schewe, Klaus-Dieter; Liddle, Stephen W.; Hameurlain, Abdelkader; Zhou, Xiaofang
Editor_xml	– sequence: 1 givenname: Abdelkader surname: Hameurlain fullname: Hameurlain, Abdelkader email: hameur@irit.fr – sequence: 2 givenname: Stephen W. surname: Liddle fullname: Liddle, Stephen W. email: liddle@byu.edu – sequence: 3 givenname: Klaus-Dieter surname: Schewe fullname: Schewe, Klaus-Dieter email: kd.schewe@scch.at – sequence: 4 givenname: Xiaofang surname: Zhou fullname: Zhou, Xiaofang email: zxf@uq.edu.au
EndPage	340
ISBN	9783642230875 3642230873
ISSN	0302-9743
IsPeerReviewed	false
IsScholarly	false
Language	English
PageCount	10
PublicationCentury	2000
PublicationDate	2011
PublicationDateYYYYMMDD	2011-01-01
PublicationDate_xml	– year: 2011 text: 2011
PublicationDecade	2010
PublicationPlace	Berlin, Heidelberg
PublicationPlace_xml	– name: Berlin, Heidelberg
PublicationSeriesTitle	Lecture Notes in Computer Science
PublicationSubtitle	22nd International Conference, DEXA 2011, Toulouse, France, August 29 - September 2, 2011. Proceedings, Part I
PublicationTitle	Database and Expert Systems Applications
PublicationYear	2011
Publisher	Springer Berlin Heidelberg
Publisher_xml	– name: Springer Berlin Heidelberg
RelatedPersons	Kleinberg, Jon M.; Mattern, Friedemann; Nierstrasz, Oscar; Steffen, Bernhard; Kittler, Josef; Vardi, Moshe Y.; Weikum, Gerhard; Sudan, Madhu; Naor, Moni; Mitchell, John C.; Terzopoulos, Demetri; Pandu Rangan, C.; Kanade, Takeo; Hutchison, David; Tygar, Doug
RelatedPersons_xml	– sequence: 1 givenname: David surname: Hutchison fullname: Hutchison, David organization: Lancaster University, Lancaster, UK – sequence: 2 givenname: Takeo surname: Kanade fullname: Kanade, Takeo organization: Carnegie Mellon University, Pittsburgh, USA – sequence: 3 givenname: Josef surname: Kittler fullname: Kittler, Josef organization: University of Surrey, Guildford, UK – sequence: 4 givenname: Jon M. surname: Kleinberg fullname: Kleinberg, Jon M. organization: Cornell University, Ithaca, USA – sequence: 5 givenname: Friedemann surname: Mattern fullname: Mattern, Friedemann organization: ETH Zurich, Zurich, Switzerland – sequence: 6 givenname: John C. surname: Mitchell fullname: Mitchell, John C. organization: Stanford University, Stanford, USA – sequence: 7 givenname: Moni surname: Naor fullname: Naor, Moni organization: Weizmann Institute of Science, Rehovot, Israel – sequence: 8 givenname: Oscar surname: Nierstrasz fullname: Nierstrasz, Oscar organization: University of Bern, Bern, Switzerland – sequence: 9 givenname: C. surname: Pandu Rangan fullname: Pandu Rangan, C. organization: Indian Institute of Technology, Madras, India – sequence: 10 givenname: Bernhard surname: Steffen fullname: Steffen, Bernhard organization: University of Dortmund, Dortmund, Germany – sequence: 11 givenname: Madhu surname: Sudan fullname: Sudan, Madhu organization: Massachusetts Institute of Technology, USA – sequence: 12 givenname: Demetri surname: Terzopoulos fullname: Terzopoulos, Demetri organization: University of California, Los Angeles, USA – sequence: 13 givenname: Doug surname: Tygar fullname: Tygar, Doug organization: University of California, Berkeley, USA – sequence: 14 givenname: Moshe Y. surname: Vardi fullname: Vardi, Moshe Y. organization: Rice University, Houston, USA – sequence: 15 givenname: Gerhard surname: Weikum fullname: Weikum, Gerhard organization: Max-Planck Institute of Computer Science, Saarbrücken, Germany
StartPage	331
Title	Sampling the National Deep Web
URI	http://link.springer.com/10.1007/978-3-642-23088-2_24