O-114 Natural language processing as a tool for developing and updating job exposure matrices for chemical exposures in the general population

Bibliographic Details
Published in Abstracts Vol. 80; no. Suppl 1; p. A8
Main Authors Basinas, Ioannis; Thompson, Paul; Xie, Qianqian; Annaniadou, Sophia; Ge, Calvin; Kuijpers, Eelco; Tinnerberg, Hakan; Stockholm, Zara Ann; Kirkeleit, Jorunn; Galea, Karen S; Brinchmann, Bendik; Cramer, Christine; Taher, Evana Amir; Schlunssen, Vivi; Tonger, Martie van
Format Journal Article
Language English
Published London BMJ Publishing Group Ltd 01.03.2023
ISSN 1351-0711
EISSN 1470-7926
DOI 10.1136/OEM-2023-EPICOH.19

Abstract Workplaces are dynamic environments in which conditions and exposures frequently change over time. Existing Job Exposure Matrices (JEMs) rarely capture such changes, because they are typically developed using the information available at a certain point in time. As a result, they cannot take potential future changes into account, which could reduce their reliability when they are used outside their development period. Moreover, developing JEMs for emerging or new exposure factors is laborious and time-consuming. Within the Exposome Project for Health and Occupational Research (EPHOR; https://www.ephor-project.eu/), we have been exploring the use of Natural Language Processing (NLP) as a vehicle for streamlining the update of existing JEMs and the development of new JEMs. Specifically, we will develop named entity recognition (NER) tools to automatically detect mentions of exposure-related concepts in the literature, thus making it more efficient to locate information relevant to JEM update and development.
Accordingly, we have developed a novel annotated corpus of 50 literature articles concerning workplace exposure to diesel exhaust. Exposure assessment experts followed guidelines to annotate all mentions of six named entity categories (substance, occupation, industry/workplace, job task/activity, measurement device and sample type) occurring in the abstract, methods and results sections. The corpus will be used to train machine learning NER algorithms. Each article was annotated independently by two experts, and Inter-Annotator Agreement (IAA) scores were calculated to assess annotation quality. Exact matching scores (requiring agreement on both the semantic category and the exact annotation span) ranged from 0.38 to 0.79 F1 for individual categories (average: 0.56). Relaxed matching scores (requiring agreement on the category and at least partially overlapping spans) ranged from 0.63 to 0.87 F1 (average: 0.72).
These results suggest that annotation quality is sufficient for machine learning. We will present the annotation scheme, guidelines and preliminary analysis of the results.
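The distinction between exact and relaxed matching can be illustrated with a minimal sketch. The abstract does not state how the F1 scores were computed beyond the two matching criteria, so everything below is an assumption made for illustration: the `Span` type, the character offsets, the function names, and the choice to treat one annotator as the reference and the other as the response are all hypothetical, and the example annotations are invented rather than taken from the corpus.

```python
from typing import Callable, List, NamedTuple


class Span(NamedTuple):
    """One annotated mention (hypothetical representation)."""
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive
    label: str   # entity category, e.g. "substance"


def exact_match(a: Span, b: Span) -> bool:
    # Same category and identical span boundaries.
    return a == b


def relaxed_match(a: Span, b: Span) -> bool:
    # Same category and spans that share at least one character.
    return a.label == b.label and a.start < b.end and b.start < a.end


def pairwise_f1(ann_a: List[Span], ann_b: List[Span],
                match: Callable[[Span, Span], bool]) -> float:
    """F1 between two annotators, treating A as reference and B as response."""
    if not ann_a or not ann_b:
        return 0.0
    # Precision: share of B's spans that some span of A matches.
    precision = sum(any(match(a, b) for a in ann_a) for b in ann_b) / len(ann_b)
    # Recall: share of A's spans that some span of B matches.
    recall = sum(any(match(a, b) for b in ann_b) for a in ann_a) / len(ann_a)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented example: the annotators agree fully on one mention, disagree on
# the boundaries of a second, and each marks one mention the other missed.
annotator_a = [Span(0, 14, "substance"),
               Span(20, 26, "occupation"),
               Span(40, 52, "sample type")]
annotator_b = [Span(0, 14, "substance"),
               Span(18, 26, "occupation"),
               Span(60, 70, "industry/workplace")]

exact_f1 = pairwise_f1(annotator_a, annotator_b, exact_match)
relaxed_f1 = pairwise_f1(annotator_a, annotator_b, relaxed_match)
print(f"exact F1 = {exact_f1:.2f}, relaxed F1 = {relaxed_f1:.2f}")
```

In this invented case the boundary disagreement counts against the exact score but not the relaxed one, which is consistent with the pattern reported in the abstract, where relaxed scores are higher than exact scores.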
Author_xml – sequence: 1
  givenname: Ioannis
  surname: Basinas
  fullname: Basinas, Ioannis
  organization: Centre for Occupational and Environmental Health, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
– sequence: 2
  givenname: Paul
  surname: Thompson
  fullname: Thompson, Paul
  organization: National Centre for Text Mining, Department of Computer Science, Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK
– sequence: 3
  givenname: Qianqian
  surname: Xie
  fullname: Xie, Qianqian
  organization: National Centre for Text Mining, Department of Computer Science, Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK
– sequence: 4
  givenname: Sophia
  surname: Annaniadou
  fullname: Annaniadou, Sophia
  organization: National Centre for Text Mining, Department of Computer Science, Manchester Institute of Biotechnology, The University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK
– sequence: 5
  givenname: Calvin
  surname: Ge
  fullname: Ge, Calvin
  organization: TNO, Utrecht, the Netherlands
– sequence: 6
  givenname: Eelco
  surname: Kuijpers
  fullname: Kuijpers, Eelco
  organization: TNO, Utrecht, the Netherlands
– sequence: 7
  givenname: Hakan
  surname: Tinnerberg
  fullname: Tinnerberg, Hakan
  organization: University of Gothenburg, Institute of Medicine, Sahlgrenska Academy, School of Public Health and Community Medicine, Gothenburg, Sweden
– sequence: 8
  givenname: Zara Ann
  surname: Stockholm
  fullname: Stockholm, Zara Ann
  organization: Department of Public Health, Research Unit for Work, Environment and Health, Danish Ramazzini Centre, Aarhus University, DK-8000 Aarhus C, Denmark
– sequence: 9
  givenname: Jorunn
  surname: Kirkeleit
  fullname: Kirkeleit, Jorunn
  organization: Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway
– sequence: 10
  givenname: Karen S
  surname: Galea
  fullname: Galea, Karen S
  organization: Institute of Occupational Medicine (IOM), Edinburgh, UK
– sequence: 11
  givenname: Bendik
  surname: Brinchmann
  fullname: Brinchmann, Bendik
  organization: Department of Air Pollution and Noise, Domain of Infection Control, Environment and Health, Norwegian Institute of Public Health, Oslo, Norway, Department of Occupational Medicine and Epidemiology, National Institute of Occupational Health, Oslo, Norway
– sequence: 12
  givenname: Christine
  surname: Cramer
  fullname: Cramer, Christine
  organization: Department of Public Health, Research Unit for Work, Environment and Health, Danish Ramazzini Centre, Aarhus University, DK-8000 Aarhus C, Denmark
– sequence: 13
  givenname: Evana Amir
  surname: Taher
  fullname: Taher, Evana Amir
  organization: Karolinska Institute, Stockholm, Sweden
– sequence: 14
  givenname: Vivi
  surname: Schlunssen
  fullname: Schlunssen, Vivi
  organization: Department of Public Health, Research Unit for Work, Environment and Health, Danish Ramazzini Centre, Aarhus University, DK-8000 Aarhus C, Denmark
– sequence: 15
  givenname: Martie van
  surname: Tonger
  fullname: Tonger, Martie van
  organization: Centre for Occupational and Environmental Health, University of Manchester, Oxford Road, Manchester, M13 9PL, UK
Copyright Author(s) (or their employer(s)) 2023. No commercial re-use. See rights and permissions. Published by BMJ.
Discipline Medicine
Occupational Therapy & Rehabilitation
Genre Conference Proceeding
Notes 29th International Symposium on Epidemiology in Occupational Health (EPICOH 2023), Mumbai, India, Hosted by the Indian Association of Occupational Health, Mumbai Branch & Tata Memorial Centre
Exposure assessment
SubjectTerms Abstracts
Algorithms
Annotations
Diesel engines
Exposure
Guidelines
Language
Learning algorithms
Machine learning
Matching
Natural language processing
Occupational exposure
Quality assessment
Streamlining
Workplaces
URI https://oem.bmj.com/content/80/Suppl_1/A8.1.full