Automated data capture from free-text radiology reports to enhance accuracy of hospital inpatient stroke codes

Bibliographic Details
Published in Pharmacoepidemiology and Drug Safety, Vol. 19, no. 8, pp. 843-847
Main Authors Flynn, Robert W. V., Macdonald, Thomas M., Schembri, Nicola, Murray, Gordon D., Doney, Alexander S. F.
Format Journal Article
Language English
Published Chichester, UK: John Wiley & Sons, Ltd, 01.08.2010
ISSN 1053-8569, 1099-1557
DOI 10.1002/pds.1981

Summary:
Purpose: Much potentially useful clinical information for pharmacoepidemiological research is contained in unstructured free‐text documents and is not readily available for analysis. Routine health data such as Scottish Morbidity Records (SMR01) frequently use generic ‘stroke’ codes. Free‐text Computerised Radiology Information System (CRIS) reports have the potential to provide this missing detail. We aimed to increase the number of stroke‐type‐specific diagnoses by augmenting SMR01 with data derived from CRIS reports and to assess the accuracy of this methodology.
Methods: SMR01 codes describing first‐ever‐stroke admissions in Tayside, Scotland from 1994 to 2005 were linked to CRIS CT‐brain scan reports occurring within 14 days of admission. Software was developed to parse the text and elicit details of stroke type using keyword matching. An algorithm was iteratively developed to differentiate intracerebral haemorrhage (ICH) from ischaemic stroke (IS) against a training set of reports with pathophysiologically precise SMR01 codes. This algorithm was then applied to CRIS reports associated with generic SMR01 codes. To establish the accuracy of the algorithm, a sample of 150 ICH and 150 IS reports was independently classified by a stroke physician.
Results: There were 8419 SMR01‐coded first‐ever strokes. The proportion of patients with pathophysiologically clear diagnoses doubled from 2745 (32.6%) to 5614 (66.7%). The positive predictive value was 94.7% (95% CI 89.8–97.3) for IS and 76.7% (95% CI 69.3–82.7) for haemorrhagic stroke.
Conclusions: A free‐text processing approach was acceptably accurate at identifying IS, but not ICH. This approach could be adapted to other studies where radiology reports may be informative.
Copyright © 2010 John Wiley & Sons, Ltd.
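The positive predictive values reported in the summary can be reproduced with a short calculation. The sketch below is illustrative only: the summary does not state the underlying counts or the interval method, so the counts (142 of 150 IS reports and 115 of 150 haemorrhage reports confirmed by the physician) are inferred from the reported percentages, and the Wilson score interval is an assumption that happens to match the published bounds. The function names `wilson_interval` and `ppv_summary` are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


def ppv_summary(confirmed, classified):
    """Return (PPV, lower, upper) as percentages rounded to one decimal place."""
    lower, upper = wilson_interval(confirmed, classified)
    ppv = confirmed / classified
    return round(100 * ppv, 1), round(100 * lower, 1), round(100 * upper, 1)


# Counts below are inferred from the reported percentages, not taken from the paper.
print(ppv_summary(142, 150))  # IS:          (94.7, 89.8, 97.3)
print(ppv_summary(115, 150))  # haemorrhage: (76.7, 69.3, 82.7)
```

With 150 reports per group, each misclassified report shifts the PPV by about 0.67 percentage points, which is why the interval for haemorrhagic stroke is both lower and wider than that for IS.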
Bibliography: Scottish Executive Health Department
Chief Scientist Office
The authors declare that they have no conflict of interest.
ArticleID:PDS1981
istex:D8B835BE72790AD03D18C495FFABC67F95B236F3
ark:/67375/WNG-BBXBGTNR-4