A cross-lingual approach to automatic ICD-10 coding of death certificates by exploring machine translation

Bibliographic Details
Published in Journal of Biomedical Informatics, Vol. 94; p. 103207
Main Authors Almagro, Mario; Martínez, Raquel; Montalvo, Soto; Fresno, Víctor
Format Journal Article
Language English
Published United States Elsevier Inc 01.06.2019
ISSN 1532-0464
1532-0480
DOI 10.1016/j.jbi.2019.103207

Summary:
• Clinical diagnoses are usually sparse and biased, implying reduced interoperability.
• This paper proposes a cross-lingual approach based on exploiting foreign language data.
• Translations provide lexical diversity, but are subject to mistakes and loss of nuance.
• Large amounts of translated data can be used to improve Machine Learning methods.

Automatic ICD-10 coding remains an unresolved Machine Learning challenge. Although hospitals generate an enormous number of clinical documents, the data are considerably sparse and follow a highly skewed, unbalanced code distribution, which entails reduced interoperability. In addition, in some languages the availability of coded documents is very limited. This paper proposes a cross-lingual approach based on Machine Translation methods to code death certificates with ICD-10 using supervised learning. The aim of this approach is to increase the availability of coded documents by combining collections in different languages, which may also help reduce possible bias in their ICD distribution, i.e. avoid the promotion of a subset of codes due to service or environmental factors. A significant improvement in system performance is achieved for labels with few occurrences.