An Ontological Framework for Information Extraction From Diverse Scientific Sources

Bibliographic Details
Published in IEEE Access, Vol. 9, pp. 42111-42124
Main Authors Zaman, Gohar; Mahdin, Hairulnizam; Hussain, Khalid; Atta-Ur-Rahman; Abawajy, Jemal; Mostafa, Salama A.
Format Journal Article
Language English
Published Piscataway IEEE 2021
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN 2169-3536
DOI 10.1109/ACCESS.2021.3063181

Summary: Automatic information extraction from scientific documents published online is useful in applications such as tagging, web indexing, and search engine optimization. As a result, it has become one of the most active areas of research in text mining. Although various information extraction techniques have been proposed in the literature, they work efficiently only on domain-specific documents with a static, well-defined format, and their accuracy degrades with even a slight change in that format. To overcome these issues, a novel ontological framework for information extraction (OFIE) using a fuzzy rule base (FRB) and word sense disambiguation (WSD) is proposed. The proposed approach is validated on a significantly wider range of document domains, sourced from well-known publishers such as IEEE, ACM, Elsevier, and Springer. The approach is also compared against state-of-the-art techniques. The experimental results show that it is less sensitive to changes in document format and achieves a significantly better average accuracy of 89.14% and an F-score of 89%.