A MapReduce-Based Approach for Prefix-Based Labeling of Large XML Data
| Published in | Semantic Technology, Vol. 10055, pp. 83-98 |
|---|---|
| Main Authors | , , | 
| Format | Book Chapter | 
| Language | English | 
| Published | Switzerland: Springer International Publishing AG, 2016 |
| Series | Lecture Notes in Computer Science | 
| Subjects | |
| ISBN | 9783319501116 3319501119  | 
| ISSN | 0302-9743 1611-3349  | 
| DOI | 10.1007/978-3-319-50112-3_7 | 
| Summary: | A massive amount of XML (Extensible Markup Language) data, which can be viewed as tree data, is available on the web. One of the fundamental building blocks of information retrieval from tree data is answering structural queries. Various labeling schemes have been suggested for rapid structural query processing. We focus on the prefix-based labeling scheme, which labels each node with a concatenation of its parent’s label and its child order. This scheme has been adapted in RDF (Resource Description Framework) data management systems that index RDF data as trees by grouping subjects. Recently, a MapReduce-based algorithm for the prefix-based labeling scheme was suggested. We observe that this algorithm fails to keep label size minimized, which makes the prefix-based labeling scheme difficult to apply to massive real-world XML datasets. To address this issue, we propose a MapReduce-based algorithm for prefix-based labeling of XML data that reduces label size by adjusting the order of label assignments based on the structural information of the XML data. Experiments with real-world XML datasets show that the proposed approach is more effective than previous work. |
|---|---|
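
The summary describes the prefix-based labeling scheme as giving each node a label formed from its parent's label plus its child order. A minimal single-machine sketch of that idea in Python follows. It is not the MapReduce algorithm proposed in the chapter and does not attempt its label-size reduction. The function names `prefix_labels` and `is_ancestor`, the tuple encoding of labels, and the choice of `(1,)` as the root label are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET


def prefix_labels(root):
    """Label every element with (parent label) + (child order).

    Labels are tuples of integers; the root label (1,) is an assumed
    convention, not something stated in the summary.
    """
    labels = []
    stack = [(root, (1,))]
    while stack:
        node, label = stack.pop()
        labels.append((label, node.tag))
        # Child order is the 1-based position among the node's children.
        for order, child in enumerate(node, start=1):
            stack.append((child, label + (order,)))
    # Sorting the tuples lexicographically yields document order.
    return sorted(labels)


def is_ancestor(a, b):
    """A node is an ancestor of another if its label is a proper prefix."""
    return len(a) < len(b) and b[:len(a)] == a


doc = ET.fromstring("<a><b><d/></b><c/></a>")
labels = prefix_labels(doc)
for label, tag in labels:
    print(".".join(map(str, label)), tag)
```

With labels of this kind, a structural query such as an ancestor-descendant test reduces to a prefix comparison: `is_ancestor((1, 1), (1, 1, 1))` holds, while `is_ancestor((1, 2), (1, 1, 1))` does not. Because each label carries the full path from the root, label length grows with depth and fan-out, which is why label size matters for large documents.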