Automated data capture from free-text radiology reports to enhance accuracy of hospital inpatient stroke codes
| Published in | Pharmacoepidemiology and Drug Safety Vol. 19; no. 8; pp. 843–847 |
|---|---|
| Main Authors | , , , , |
| Format | Journal Article |
| Language | English |
| Published | Chichester, UK: John Wiley & Sons, Ltd, 01.08.2010 |
| Subjects | |
| ISSN | 1053-8569, 1099-1557 |
| DOI | 10.1002/pds.1981 |
Summary
Purpose
Much potentially useful clinical information for pharmacoepidemiological research is contained in unstructured free‐text documents and is not readily available for analysis. Routine health data such as Scottish Morbidity Records (SMR01) frequently use generic ‘stroke’ codes. Free‐text Computerised Radiology Information System (CRIS) reports have the potential to provide the missing stroke‐type detail. We aimed to increase the number of stroke‐type‐specific diagnoses by augmenting SMR01 with data derived from CRIS reports, and to assess the accuracy of this method.
Methods
SMR01 codes describing first‐ever stroke admissions in Tayside, Scotland, from 1994 to 2005 were linked to CRIS CT brain scan reports occurring within 14 days of admission. Software was developed to parse the report text and elicit details of stroke type using keyword matching. An algorithm to differentiate intracerebral haemorrhage (ICH) from ischaemic stroke (IS) was developed iteratively against a training set of reports whose SMR01 codes were already pathophysiologically precise. This algorithm was then applied to CRIS reports associated with generic SMR01 codes. To establish the accuracy of the algorithm, a sample of 150 ICH and 150 IS reports was independently classified by a stroke physician.
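The linkage and keyword‐matching steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the published algorithm: the keyword stems, the naive negation rule (a "no", "not" or "without" earlier in the same sentence), the reading of "within 14 days" as 0 to 14 days after admission, and the names `Admission`, `Report`, `link_reports`, `classify` and `refine_stroke_type` are all hypothetical. A report mentioning both or neither stroke type is left unclassified, and a generic code is replaced only when all informative linked reports agree.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical keyword stems; the published algorithm's actual terms are not given here.
ICH_TERMS = ("haemorrhage", "hemorrhage", "haematoma", "hematoma", "bleed")
IS_TERMS = ("infarct", "ischaemi", "ischemi")
# Naive, assumed negation cue: "no", "not" or "without" earlier in the same sentence.
NEGATION = re.compile(r"\b(no|not|without)\b")


@dataclass
class Admission:
    patient_id: str
    admitted: date


@dataclass
class Report:
    patient_id: str
    scanned: date
    text: str


def link_reports(adm: Admission, reports: list[Report], window_days: int = 14) -> list[Report]:
    """Same-patient CT reports dated 0..window_days after admission (assumed window)."""
    return [
        r for r in reports
        if r.patient_id == adm.patient_id
        and timedelta(0) <= r.scanned - adm.admitted <= timedelta(days=window_days)
    ]


def mentions(text: str, terms: tuple[str, ...]) -> bool:
    """True if some sentence contains a term not preceded by a negation cue."""
    for sentence in re.split(r"[.;]\s*", text.lower()):
        for term in terms:
            idx = sentence.find(term)  # first occurrence only, for simplicity
            if idx >= 0 and not NEGATION.search(sentence[:idx]):
                return True
    return False


def classify(text: str) -> str:
    """Label one report 'ICH' or 'IS'; 'unclassified' if both or neither are found."""
    ich, isch = mentions(text, ICH_TERMS), mentions(text, IS_TERMS)
    if ich != isch:
        return "ICH" if ich else "IS"
    return "unclassified"


def refine_stroke_type(adm: Admission, reports: list[Report]) -> str:
    """Replace a generic stroke code only when all informative linked reports agree."""
    labels = {classify(r.text) for r in link_reports(adm, reports)} - {"unclassified"}
    return labels.pop() if len(labels) == 1 else "generic"


if __name__ == "__main__":
    adm = Admission("p1", date(2001, 3, 1))
    reports = [Report("p1", date(2001, 3, 2), "Acute infarct in the left MCA territory. No haemorrhage.")]
    print(refine_stroke_type(adm, reports))
```

Real radiology text needs far richer negation and uncertainty handling (for example "possible", "old", "haemorrhagic transformation"), which is why the abstract describes the algorithm as developed iteratively against a coded training set.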
Results
There were 8419 first‐ever strokes coded in SMR01. The proportion of patients with a pathophysiologically clear diagnosis doubled from 32.6% (2745) to 66.7% (5614). The positive predictive value (PPV) was 94.7% (95% CI 89.8–97.3) for IS and 76.7% (95% CI 69.3–82.7) for haemorrhagic stroke.
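The reported percentages can be checked with a short script. Two assumptions apply: the physician‐confirmed counts (142 of 150 IS reports and 115 of 150 ICH reports) are back‐calculated from the published PPVs rather than stated in the abstract, and the abstract does not name its confidence‐interval method. A Wilson score interval is used here because, with those counts, it reproduces the published bounds; the function name `wilson_ci` is illustrative.

```python
import math


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (95% when z = 1.96)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Share of the 8419 first-ever strokes with a type-specific code, before and after augmentation.
TOTAL = 8419
for label, count in (("before", 2745), ("after", 5614)):
    print(f"{label}: {count}/{TOTAL} = {count / TOTAL:.1%}")

# Back-calculated (assumed) numbers of physician-confirmed reports in each sample of 150.
for label, confirmed in (("IS", 142), ("ICH", 115)):
    lo, hi = wilson_ci(confirmed, 150)
    print(f"{label}: PPV {confirmed / 150:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

Running it prints 32.6% and 66.7% for the coded proportions, then a PPV of 94.7% (89.8% to 97.3%) for IS and 76.7% (69.3% to 82.7%) for ICH, matching the abstract. The Wilson interval is a reasonable default for proportions near 0 or 1 with modest samples, where the simple normal approximation can give bounds outside [0, 1].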
Conclusions
A free‐text processing approach was acceptably accurate at identifying IS, but not ICH. This approach could be adapted to other studies where radiology reports may be informative. Copyright © 2010 John Wiley & Sons, Ltd.
| Funding | Scottish Executive Health Department Chief Scientist Office |
|---|---|
| Conflict of interest | The authors declare that they have no conflict of interest. |