A cooperative crowdsourcing framework for knowledge extraction in digital humanities – cases on Tang poetry
| Published in | Aslib Journal of Information Management, Vol. 72, no. 2, pp. 243-261 |
|---|---|
| Main Authors | , , , | 
| Format | Journal Article | 
| Language | English | 
| Published | Bradford: Emerald Publishing Limited, 20.04.2020; Emerald Group Publishing Limited |
| ISSN | 2050-3806 1758-3748  | 
| DOI | 10.1108/AJIM-07-2019-0192 | 
| Summary | Purpose: This paper proposes a knowledge extraction framework for extracting knowledge, including entities and the relationships between them, from unstructured texts in digital humanities (DH). Design/methodology/approach: The proposed cooperative crowdsourcing framework (CCF) uses both human–computer cooperation and crowdsourcing to achieve high-quality and scalable knowledge extraction. CCF integrates active learning with a novel category-based crowdsourcing mechanism to help domain experts label and verify extracted knowledge. Findings: The case study shows that CCF can effectively and efficiently extract knowledge from multi-sourced heterogeneous data in the field of Tang poetry. Specifically, CCF achieves higher knowledge extraction accuracy than the state-of-the-art methods, the contribution of feedback to the training model can be maximized by the active learning mechanism, and the proposed category-based crowdsourcing mechanism can scale up effective human–computer collaboration by considering the specialization of workers in different categories of tasks. Research limitations/implications: This research proposes CCF to enable high-quality and scalable knowledge extraction in the field of Tang poetry. CCF can be generalized to other fields of DH by introducing domain knowledge and experts. Practical implications: The extracted knowledge is machine-understandable and can support research on Tang poetry and knowledge-driven intelligent applications in DH. Originality/value: CCF is the first human-in-the-loop knowledge extraction framework that integrates active learning and crowdsourcing mechanisms; the human–computer cooperation method uses the feedback of domain experts through the active learning mechanism; the category-based crowdsourcing mechanism considers the matching of categories of DH data and the specializations of domain experts. |