Using standoff properties for marking-up historical documents in the humanities
Published in | Information technology (Munich, Germany) Vol. 58; no. 2; pp. 63 - 69 |
---|---|
Main Author | |
Format | Journal Article |
Language | English |
Published | De Gruyter Oldenbourg, 01.03.2016 |
ISSN | 1611-2776, 2196-7032 |
DOI | 10.1515/itit-2015-0030 |
Summary: | Markup in the form of tags is often embedded into documents to describe formatting structures and other features, as in HTML on the Web. But in the humanities, the use of embedded markup for the transcription of historical documents leads to problems in the representation of overlapping features, and subjective variation in the use of different markup tags for the same features compromises interoperability of the transcriptions. “Standoff” techniques, in which the markup and the text it describes are stored separately, can help alleviate these problems. “Standoff properties” is a technique for recording textual properties that do not conform to a context-free grammar, and can freely overlap. This allows a divide-and-conquer approach to markup, whereby sets of markup properties can record different aspects of a text, which can then be recombined as needed. Despite these advantages, standoff techniques are usually considered impractical when both the underlying text and its markup are subject to change. To circumvent this problem, this paper describes a practical algorithm for updating a set of standoff markup properties separately from the text. |
---|---|
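
The summary's central idea, markup stored apart from the text as properties that may overlap, can be illustrated with a small Python sketch. This is not the algorithm described in the paper, whose details are not given in this record. It is a naive illustration under stated assumptions: each property is a name plus a pair of character offsets, and a single text edit is handled by shifting or clamping those offsets. The names `Prop`, `spans`, and `apply_edit` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prop:
    """A hypothetical standoff property: a name plus a character range."""
    name: str
    start: int  # offset of the first character covered
    end: int    # offset one past the last character covered


def spans(text, props):
    """Return the substring of the text that each property covers."""
    return {p.name: text[p.start:p.end] for p in props}


def apply_edit(text, props, pos, deleted, inserted):
    """Replace text[pos:pos+deleted] with `inserted` and adjust the offsets.

    Naive rule (an assumption, not the paper's method): offsets at or before
    the edit stay put, offsets at or after the deleted range shift by the
    length change, and offsets inside the deleted range are clamped to the
    end of the inserted text.
    """
    new_text = text[:pos] + inserted + text[pos + deleted:]
    delta = len(inserted) - deleted
    cut_end = pos + deleted

    def move(offset):
        if offset <= pos:
            return offset
        if offset >= cut_end:
            return offset + delta
        return pos + len(inserted)

    new_props = [Prop(p.name, move(p.start), move(p.end)) for p in props]
    return new_text, new_props


# The plain text carries no tags; the markup lives in a separate list.
text = "The quick brown fox"
props = [
    Prop("italic", 4, 15),  # covers "quick brown"
    Prop("name", 10, 19),   # covers "brown fox" and overlaps "italic"
]

# Replace "quick" with "slow"; both properties still cover the right words.
new_text, new_props = apply_edit(text, props, pos=4, deleted=5, inserted="slow")
print(new_text)
print(spans(new_text, new_props))
```

The two ranges overlap without nesting, which embedded tags in a strictly hierarchical format cannot express directly. Because the properties are held in a list separate from the text, several such lists could describe different aspects of the same text and be combined when needed.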