Sampling the National Deep Web
Published in | Database and Expert Systems Applications, pp. 331-340 |
---|---|
Format | Book Chapter |
Language | English |
Published | Berlin, Heidelberg: Springer Berlin Heidelberg, 2011 |
Series | Lecture Notes in Computer Science |
ISBN | 9783642230875; 3642230873 |
ISSN | 0302-9743 (print); 1611-3349 (electronic) |
DOI | 10.1007/978-3-642-23088-2_24 |
Abstract | A huge portion of today’s Web consists of web pages filled with information from myriads of online databases. This part of the Web, known as the deep Web, is to date relatively unexplored, and even major characteristics such as the number of searchable databases on the Web or the databases’ subject distribution are somewhat disputed. In this paper, we revisit a problem of deep Web characterization: how can the total number of online databases on the Web be estimated? We propose the Host-IP clustering sampling method to address the drawbacks of existing approaches to deep Web characterization and report our findings based on a survey of the Russian Web. The obtained estimates, together with the proposed sampling technique, could be useful for further studies that handle data in the deep Web. |
Author | Shestakov, Denis |
Author Affiliation | Department of Media Technology, Aalto University, Espoo, Finland (denis.shestakov@aalto.fi) |
Copyright | Springer-Verlag Berlin Heidelberg 2011 |
Discipline | Computer Science |
EISBN | 9783642230882; 3642230881 |
EISSN | 1611-3349 |
Editors | Hameurlain, Abdelkader; Liddle, Stephen W.; Schewe, Klaus-Dieter; Zhou, Xiaofang |
EndPage | 340 |
PageCount | 10 |
PublicationDate | 2011 |
PublicationSubtitle | 22nd International Conference, DEXA 2011, Toulouse, France, August 29 - September 2, 2011. Proceedings, Part I |
Publisher | Springer Berlin Heidelberg |
Related Persons | Hutchison, David; Kanade, Takeo; Kittler, Josef; Kleinberg, Jon M.; Mattern, Friedemann; Mitchell, John C.; Naor, Moni; Nierstrasz, Oscar; Pandu Rangan, C.; Steffen, Bernhard; Sudan, Madhu; Terzopoulos, Demetri; Tygar, Doug; Vardi, Moshe Y.; Weikum, Gerhard |
StartPage | 331 |
Subject Terms | deep Web; DNS load balancing; Host-IP clustering; national web domain; random sampling; virtual hosting; web characterization; web databases |
URI | http://link.springer.com/10.1007/978-3-642-23088-2_24 |