A Semantic Searching Scheme in Heterogeneous Unstructured P2P Networks
| Published in | Journal of computer science and technology Vol. 26; no. 6; pp. 925-941 |
|---|---|
| Main Author | Jun-Cheng Huang |
| Format | Journal Article | 
| Language | English | 
| Published | Boston; Springer US; 01.11.2011; Springer Nature B.V; Shanghai Hewlett-Packard Co., Ltd., No. 889 Yishan Road Xuhui, Shanghai 201206, China; Department of Computer Science and Mathematics, University of North Carolina at Pembroke, Pembroke, U.S.A.; Department of Computer and Information Sciences, Temple University, Philadelphia, U.S.A. |
| Subjects | |
| ISSN | 1000-9000 1860-4749  | 
| DOI | 10.1007/s11390-011-1190-z | 
| Summary: | Semantic-based searching in peer-to-peer (P2P) networks has drawn significant attention recently. A number of semantic searching schemes, such as GES proposed by Zhu Y et al., employ search models in Information Retrieval (IR). All these IR-based schemes use one vector to summarize semantic contents of all documents on a single node. For example, GES derives a node vector based on the IR model: VSM (Vector Space Model). A topology adaptation algorithm and a search protocol are then designed according to the similarity between node vectors of different nodes. Although the single semantic vector is suitable when the distribution of documents in each node is uniform, it may not be efficient when the distribution is diverse. When there are many categories of documents at each node, the node vector representation may be inaccurate. We extend the idea of GES and present a new class-based semantic searching scheme (CSS) specifically designed for unstructured P2P networks with heterogeneous single-node document collection. It makes use of a state-of-the-art data clustering algorithm, online spherical k-means clustering (OSKM), to cluster all documents on a node into several classes. Each class can be viewed as a virtual node. Virtual nodes are connected through virtual links. As a result, the class vector replaces the node vector and plays an important role in the class-based topology adaptation and search process. This makes CSS very efficient. Our simulation using the IR benchmark TREC collection demonstrates that CSS outperforms GES in terms of higher recall, higher precision, and lower search cost. | 
|---|---|
| Bibliography: | 11-2296/TP. Authors: Jun-Cheng Huang, Xiu-Qi Li, Jie Wu. Affiliations: 1. Shanghai Hewlett-Packard Co., Ltd., No. 889 Yishan Road Xuhui, Shanghai 201206, China; 2. Department of Computer Science and Mathematics, University of North Carolina at Pembroke, Pembroke, U.S.A.; 3. Department of Computer and Information Sciences, Temple University, Philadelphia, U.S.A. Keywords: class-based search, GES, semantic clustering, topology adaptation, P2P networks |
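
The summary's central claim, that a single node vector may misrepresent a node holding several categories of documents while per-class vectors do not, can be illustrated with a small sketch. This is not the authors' implementation: the toy term-count vectors, the fixed learning rate `eta`, the farthest-first initialization, and all function names are assumptions made for illustration, and the clustering routine is only a simplified online spherical k-means in the spirit of OSKM.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity, the usual VSM similarity measure."""
    return dot(normalize(a), normalize(b))

def node_vector(docs):
    """One vector per node: the normalized sum of all unit document vectors."""
    total = [0.0] * len(docs[0])
    for d in docs:
        for i, x in enumerate(normalize(d)):
            total[i] += x
    return normalize(total)

def online_spherical_kmeans(docs, k, eta=0.1, epochs=5):
    """Simplified online spherical k-means (illustrative assumption).

    Each document is assigned to the centroid with the highest cosine
    similarity, and that centroid is moved toward the document and
    re-normalized onto the unit sphere.
    """
    unit_docs = [normalize(d) for d in docs]
    # Farthest-first initialization keeps the example deterministic.
    centroids = [list(unit_docs[0])]
    while len(centroids) < k:
        far = min(unit_docs, key=lambda d: max(dot(d, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(epochs):
        for x in unit_docs:
            j = max(range(k), key=lambda i: dot(x, centroids[i]))
            moved = [c + eta * xi for c, xi in zip(centroids[j], x)]
            centroids[j] = normalize(moved)
    return centroids

# Hypothetical term axes: [p2p, routing, network, recipe, flour, oven]
docs = [
    [3, 2, 1, 0, 0, 0], [2, 3, 2, 0, 0, 0], [1, 2, 3, 0, 0, 0],  # networking
    [0, 0, 0, 3, 2, 1], [0, 0, 0, 2, 3, 2], [0, 0, 0, 1, 2, 3],  # cooking
]
query = [1, 1, 1, 0, 0, 0]  # a networking query

class_vectors = online_spherical_kmeans(docs, k=2)
node_similarity = cosine(query, node_vector(docs))
best_class_similarity = max(cosine(query, c) for c in class_vectors)
print("node vector similarity:", round(node_similarity, 3))
print("best class vector similarity:", round(best_class_similarity, 3))
```

In this toy setting the node vector averages two unrelated topics, so its similarity to the networking query is diluted, whereas the class vector for the networking cluster matches the query closely. A class-based scheme would route the query using the class vectors rather than the blended node vector.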