O-114 Natural language processing as a tool for developing and updating job exposure matrices for chemical exposures in the general population
| Published in | Abstracts Vol. 80; no. Suppl 1; p. A8 |
|---|---|
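| Main Authors | Basinas, Ioannis; Thompson, Paul; Xie, Qianqian; Annaniadou, Sophia; Ge, Calvin; Kuijpers, Eelco; Tinnerberg, Hakan; Stockholm, Zara Ann; Kirkeleit, Jorunn; Galea, Karen S; Brinchmann, Bendik; Cramer, Christine; Taher, Evana Amir; Schlunssen, Vivi; Tonger, Martie van |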
| Format | Journal Article | 
| Language | English | 
| Published | London: BMJ Publishing Group Ltd, 01.03.2023 |
| Subjects | |
| ISSN | 1351-0711 1470-7926  | 
| DOI | 10.1136/OEM-2023-EPICOH.19 | 
| Abstract | Workplaces are dynamic environments in which conditions and exposures frequently change over time. Such changes are rarely captured by existing Job Exposure Matrices (JEMs), which are typically developed using information available at a certain point in time. As such, they are unable to take into account potential future changes, which could negatively impact the reliability of JEMs when used outside their development period. Moreover, developing JEMs for emerging or new exposure factors is laborious and time-consuming. Within the Exposome Project for Health and Occupational Research (EPHOR; https://www.ephor-project.eu/), we have been exploring the use of Natural Language Processing (NLP) as a vehicle for streamlining the update of existing JEMs and the development of new JEMs. Specifically, we will develop named entity recognition (NER) tools to automatically detect mentions of exposure-related concepts in literature, thus increasing the efficiency of locating relevant information for JEM update and development. Accordingly, we have developed a novel annotated corpus of 50 literature articles concerning workplace exposure to diesel exhaust, in which exposure assessment experts followed guidelines to annotate all mentions of six named entity categories (substance, occupation, industry/workplace, job task/activity, measurement device and sample type) occurring in the abstract, methods and results sections. The corpus will be used to train machine learning NER algorithms. Each article was annotated independently by two experts, and Inter-Annotator Agreement (IAA) scores were calculated to assess annotation quality. Exact matching scores (requiring agreement of semantic category and exact annotation span) ranged from 0.38 to 0.79 F1 for individual categories (average: 0.56). Relaxed matching scores (requiring agreement of category and partially overlapping spans) ranged from 0.63 to 0.87 F1 (average: 0.72). These results suggest that annotation quality is sufficient for machine learning. We will present the annotation scheme, guidelines and preliminary analysis of the results. |
    
|---|---|
    
| Author | Cramer, Christine; Kuijpers, Eelco; Thompson, Paul; Brinchmann, Bendik; Ge, Calvin; Schlunssen, Vivi; Tinnerberg, Hakan; Annaniadou, Sophia; Stockholm, Zara Ann; Taher, Evana Amir; Galea, Karen S; Xie, Qianqian; Tonger, Martie van; Basinas, Ioannis; Kirkeleit, Jorunn |
    
| Author_xml | (1) Basinas, Ioannis: Centre for Occupational and Environmental Health, University of Manchester, Oxford Road, Manchester, M13 9PL, UK; (2) Thompson, Paul: National Centre for Text Mining, Department of Computer Science, Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK; (3) Xie, Qianqian: National Centre for Text Mining, Department of Computer Science, Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK; (4) Annaniadou, Sophia: National Centre for Text Mining, Department of Computer Science, Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK; (5) Ge, Calvin: TNO, Utrecht, the Netherlands; (6) Kuijpers, Eelco: TNO, Utrecht, the Netherlands; (7) Tinnerberg, Hakan: University of Gothenburg, Institute of Medicine, Sahlgrenska Academy, School of Public Health and Community Medicine, Gothenburg, Sweden; (8) Stockholm, Zara Ann: Department of Public Health, Research Unit for Work, Environment and Health, Danish Ramazzini Centre, Aarhus University, DK-8000 Aarhus C, Denmark; (9) Kirkeleit, Jorunn: Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway; (10) Galea, Karen S: Institute of Occupational Medicine (IOM), Edinburgh, UK; (11) Brinchmann, Bendik: Department of Air Pollution and Noise, Domain of Infection Control, Environment and Health, Norwegian Institute of Public Health, Oslo, Norway, and Department of Occupational Medicine and Epidemiology, National Institute of Occupational Health, Oslo, Norway; (12) Cramer, Christine: Department of Public Health, Research Unit for Work, Environment and Health, Danish Ramazzini Centre, Aarhus University, DK-8000 Aarhus C, Denmark; (13) Taher, Evana Amir: Karolinska Institute, Stockholm, Sweden; (14) Schlunssen, Vivi: Department of Public Health, Research Unit for Work, Environment and Health, Danish Ramazzini Centre, Aarhus University, DK-8000 Aarhus C, Denmark; (15) Tonger, Martie van: Centre for Occupational and Environmental Health, University of Manchester, Oxford Road, Manchester, M13 9PL, UK |
    
    
| ContentType | Journal Article | 
    
| Copyright | Author(s) (or their employer(s)) 2023. No commercial re-use. See rights and permissions. Published by BMJ. |
    
| DBID | K9. NAPCQ ADTOC UNPAY  | 
    
| DOI | 10.1136/OEM-2023-EPICOH.19 | 
    
| DatabaseName | ProQuest Health & Medical Complete (Alumni) Nursing & Allied Health Premium Unpaywall for CDI: Periodical Content Unpaywall  | 
    
| DatabaseTitle | ProQuest Health & Medical Complete (Alumni) Nursing & Allied Health Premium  | 
    
| DatabaseTitleList | ProQuest Health & Medical Complete (Alumni)  | 
    
| Database_xml | – sequence: 1 dbid: UNPAY name: Unpaywall url: https://proxy.k.utb.cz/login?url=https://unpaywall.org/ sourceTypes: Open Access Repository  | 
    
| DeliveryMethod | fulltext_linktorsrc | 
    
| Discipline | Medicine Occupational Therapy & Rehabilitation  | 
    
| EISSN | 1470-7926 | 
    
| EndPage | A8 | 
    
| ExternalDocumentID | 10.1136/oem-2023-epicoh.19 oemed  | 
    
| Genre | Conference Proceeding | 
    
    
    
| IEDL.DBID | UNPAY | 
    
| ISSN | 1351-0711 | 
    
| IngestDate | Tue Aug 19 16:39:35 EDT 2025 Tue Oct 07 07:08:31 EDT 2025 Thu Apr 24 22:49:51 EDT 2025  | 
    
| IsDoiOpenAccess | false | 
    
| IsOpenAccess | true | 
    
| IsPeerReviewed | true | 
    
| IsScholarly | true | 
    
| Issue | Suppl 1 | 
    
| Language | English | 
    
| LinkModel | DirectLink | 
    
    
| Notes | 29th International Symposium on Epidemiology in Occupational Health (EPICOH 2023), Mumbai, India, Hosted by the Indian Association of Occupational Health, Mumbai Branch & Tata Memorial Centre Exposure assessment ObjectType-Conference Proceeding-1 SourceType-Scholarly Journals-1 content type line 14  | 
    
| OpenAccessLink | https://proxy.k.utb.cz/login?url=https://oem.bmj.com/content/oemed/80/Suppl_1/A8.1.full.pdf | 
    
| PQID | 2788105241 | 
    
| PQPubID | 2041056 | 
    
| ParticipantIDs | unpaywall_primary_10_1136_oem_2023_epicoh_19 proquest_journals_2788105241 bmj_journals_10_1136_OEM_2023_EPICOH_19  | 
    
| PublicationCentury | 2000 | 
    
| PublicationDate | 20230300 20230301  | 
    
| PublicationDateYYYYMMDD | 2023-03-01 | 
    
| PublicationDate_xml | – month: 3 year: 2023 text: 20230300  | 
    
| PublicationDecade | 2020 | 
    
| PublicationPlace | London | 
    
| PublicationPlace_xml | – name: London | 
    
| PublicationTitle | Abstracts | 
    
| PublicationTitleAbbrev | Occup Environ Med | 
    
| PublicationYear | 2023 | 
    
| Publisher | BMJ Publishing Group Ltd |
    
| SSID | ssj0013732 | 
    
| Score | 2.3925602 | 
    
    
| SourceID | unpaywall proquest bmj  | 
    
| SourceType | Open Access Repository Aggregation Database Publisher  | 
    
| StartPage | A8 | 
    
| SubjectTerms | Abstracts; Algorithms; Annotations; Diesel engines; Exposure; Guidelines; Language; Learning algorithms; Machine learning; Matching; Natural language processing; Occupational exposure; Quality assessment; Streamlining; Workplaces |
    
| Title | O-114 Natural language processing as a tool for developing and updating job exposure matrices for chemical exposures in the general population | 
    
| URI | https://oem.bmj.com/content/80/Suppl_1/A8.1.full https://www.proquest.com/docview/2788105241 https://oem.bmj.com/content/oemed/80/Suppl_1/A8.1.full.pdf  | 
    
| UnpaywallVersion | publishedVersion | 
    
| Volume | 80 | 
    
| hasFullText | 1 | 
    
| inHoldings | 1 | 
    
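
The abstract reports Inter-Annotator Agreement as F1 under exact and relaxed span matching. The following is a minimal sketch of how such pairwise scores could be computed, not the authors' implementation. The `Span` type, the character-offset representation, the half-open interval convention and the function names are all assumptions made for illustration; the abstract does not describe the scoring code.

```python
from typing import Callable, NamedTuple, Sequence


class Span(NamedTuple):
    """One annotated entity mention (hypothetical representation)."""
    start: int   # inclusive character offset (assumed convention)
    end: int     # exclusive character offset (assumed convention)
    label: str   # entity category, e.g. "substance" or "occupation"


def exact_match(a: Span, b: Span) -> bool:
    # Exact matching: same category and identical span boundaries.
    return a == b


def relaxed_match(a: Span, b: Span) -> bool:
    # Relaxed matching: same category and at least partially overlapping spans.
    return a.label == b.label and a.start < b.end and b.start < a.end


def pairwise_f1(
    ann_a: Sequence[Span],
    ann_b: Sequence[Span],
    match: Callable[[Span, Span], bool],
) -> float:
    """F1 agreement between two annotators, treating A as the reference."""
    if not ann_a or not ann_b:
        return 0.0
    # Recall: share of A's spans that have a matching span from B.
    recall = sum(any(match(a, b) for b in ann_b) for a in ann_a) / len(ann_a)
    # Precision: share of B's spans that have a matching span from A.
    precision = sum(any(match(a, b) for a in ann_a) for b in ann_b) / len(ann_b)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Per-category scores like those in the abstract would follow from filtering both annotators' spans to a single label before calling `pairwise_f1`. Under relaxed matching a span counts as agreed if it overlaps any same-category span from the other annotator, which is one common convention; other schemes enforce one-to-one alignment and can give slightly lower scores.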