Development and evaluation of a Naïve Bayesian model for coding causation of workers’ compensation claims

Bibliographic Details
Published in Journal of Safety Research Vol. 43; no. 5-6; pp. 327-332
Main Authors Bertke, S.J., Meyers, A.R., Wurzelbacher, S.J., Bell, J., Lampl, M.L., Robins, D.
Format Journal Article
Language English
Published United States Elsevier Ltd 01.12.2012
Elsevier Science Ltd
ISSN 0022-4375, 1879-1247
DOI 10.1016/j.jsr.2012.10.012

Summary: Tracking and trending rates of injuries and illnesses classified as musculoskeletal disorders caused by ergonomic risk factors such as overexertion and repetitive motion (MSDs) and as slips, trips, or falls (STFs) in different industry sectors is of high interest to many researchers. Unfortunately, identifying the cause of injuries and illnesses in large datasets such as workers’ compensation systems often requires reading and coding the free-form accident text narrative for potentially millions of records. To alleviate the need for manual coding, this paper describes and evaluates a computer auto-coding algorithm that demonstrated the ability to code millions of claims quickly and accurately by learning from a set of previously manually coded claims. The auto-coding program was able to code claims as an MSD, an STF, or other with approximately 90% accuracy. The program developed and discussed in this paper provides an accurate and efficient method for identifying the causation of workers’ compensation claims as an STF or MSD in a large database, based on the unstructured text narrative and the resulting injury diagnoses. The program coded thousands of claims in minutes. The method described in this paper can be used by researchers and practitioners to relieve the manual burden of reading claims and identifying their causation as an STF or MSD. Furthermore, the method can be easily generalized to code or classify other unstructured text narratives.

► Coding causation for millions of workers’ compensation claims was infeasible.
► A machine learning program was developed to aid in coding the causation of claims.
► The program demonstrated the ability to code with about 90% accuracy.
► This program can be generalized to categorize any set of unstructured text data.
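
The summary describes a Naïve Bayesian classifier that learns from manually coded claims and then assigns each new claim to MSD, STF, or other. As a rough illustration of that general idea only, and not the authors' program, the following minimal Python sketch trains a multinomial Naive Bayes model on words from accident narratives. The function names, the add-one (Laplace) smoothing, the whitespace tokenizer, and the six toy claims are all assumptions made for this example; the paper's model also draws on injury diagnoses, which this sketch omits.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: simple lowercase whitespace split; a real system
    # would likely normalize spelling and punctuation as well.
    return text.lower().split()


def train(examples):
    """Count class and word frequencies from (narrative, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior for a narrative."""
    class_counts, word_counts, vocab = model
    total_claims = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_claims in class_counts.items():
        score = math.log(n_claims / total_claims)  # log prior
        # Add-one smoothing: every vocabulary word gets a pseudo-count of 1.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical, hand-written toy narratives (not data from the paper).
TOY_CLAIMS = [
    ("slipped on wet floor and fell", "STF"),
    ("tripped over cord and fell on knee", "STF"),
    ("strained back lifting heavy box", "MSD"),
    ("wrist pain from repetitive typing", "MSD"),
    ("cut finger with knife", "OTHER"),
    ("burned hand on hot stove", "OTHER"),
]

model = train(TOY_CLAIMS)
print(predict(model, "fell on icy floor"))  # prints STF
```

With only six training narratives the output shows the mechanics, not realistic accuracy; the roughly 90% figure in the summary refers to the authors' program trained on a large set of manually coded claims.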