Using standoff properties for marking-up historical documents in the humanities

Bibliographic Details
Published in: Information technology (Munich, Germany), Vol. 58, no. 2, pp. 63-69
Main Author: Schmidt, Desmond Allan
Format: Journal Article
Language: English
Published: De Gruyter Oldenbourg, 01.03.2016
ISSN: 1611-2776, 2196-7032
DOI: 10.1515/itit-2015-0030

Summary: Markup in the form of tags is often embedded into documents to describe formatting structures and other features, as in HTML on the Web. But in the humanities, the use of embedded markup for the transcription of historical documents leads to problems in the representation of overlapping features, and subjective variation in the use of different markup tags for the same features compromises interoperability of the transcriptions. “Standoff” techniques, in which the markup and the text it describes are stored separately, can help alleviate these problems. “Standoff properties” is a technique for recording textual properties that do not conform to a context-free grammar, and can freely overlap. This allows a divide-and-conquer approach to markup, whereby sets of markup properties can record different aspects of a text, which can then be recombined as needed. Despite these advantages, standoff techniques are usually considered impractical when both the underlying text and its markup are subject to change. To circumvent this problem, this paper describes a practical algorithm for updating a set of standoff markup properties separately from the text.
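
The idea of standoff properties can be illustrated with a minimal sketch. It is not the algorithm described in the paper, which this record does not reproduce; it only assumes that each property is a name plus a character offset and length into a separately stored plain text. The names `Property`, `spans`, and `shift_after_insert` are hypothetical, and the update rule shown (shift or grow a range when characters are inserted) is one simple possibility among many.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Property:
    """A hypothetical standoff property: a named range over plain text."""
    name: str
    offset: int  # start index into the text
    length: int  # number of characters covered


def spans(text, props):
    """Return (name, covered substring) for each property."""
    return [(p.name, text[p.offset:p.offset + p.length]) for p in props]


def shift_after_insert(props, at, count):
    """Adjust properties after `count` characters are inserted at index `at`.

    Assumed rule: an insertion at or before a property's start moves it
    right; an insertion strictly inside a property makes it longer; an
    insertion at or after its end leaves it unchanged.
    """
    updated = []
    for p in props:
        end = p.offset + p.length
        if at <= p.offset:
            updated.append(Property(p.name, p.offset + count, p.length))
        elif at < end:
            updated.append(Property(p.name, p.offset, p.length + count))
        else:
            updated.append(p)
    return updated


# The text carries no tags; the two properties overlap on "brown"
# without either one being nested inside the other.
text = "The quick brown fox"
props = [Property("italic", 4, 11), Property("deleted", 10, 9)]
before = spans(text, props)

# Edit the text, then update the properties separately from it.
insertion = "dark "
new_text = text[:10] + insertion + text[10:]
new_props = shift_after_insert(props, 10, len(insertion))
after = spans(new_text, new_props)
```

Because the ranges are stored apart from the text, the overlap between `italic` and `deleted` needs no special handling, which embedded tags in a strictly nested hierarchy could not express directly. The cost is that every text edit has to be mirrored in the offsets, which is the practical difficulty the summary refers to.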