A cross-lingual approach to automatic ICD-10 coding of death certificates by exploring machine translation
| Published in | Journal of biomedical informatics Vol. 94; p. 103207 |
|---|---|
| Main Authors | , , , |
| Format | Journal Article |
| Language | English |
| Published | United States: Elsevier Inc, 01.06.2019 |
| ISSN | 1532-0464, 1532-0480 |
| DOI | 10.1016/j.jbi.2019.103207 |
| Summary: | Highlights: • Clinical diagnoses are usually sparse and biased, implying reduced interoperability. • This paper proposes a cross-lingual approach based on exploiting foreign-language data. • Translations provide lexical diversity but are subject to mistakes and loss of nuance. • Large amounts of translated data can be used to improve Machine Learning methods. Abstract: Automatic ICD-10 coding remains an unresolved Machine Learning task. Although hospitals generate an enormous amount of clinical documents, the data are considerably sparse and follow a very skewed, unbalanced code distribution, which entails reduced interoperability. In addition, in some languages the availability of coded documents is very limited. This paper proposes a cross-lingual approach based on Machine Translation methods to code death certificates with ICD-10 using supervised learning. The aim of this approach is to increase the availability of coded documents by combining collections in different languages, which may also help reduce possible bias in the ICD distribution, i.e. avoid the promotion of a subset of codes due to service or environmental factors. A significant improvement in system performance is achieved for labels with few occurrences. |
|---|---|