Efficient Management of Safety Documents Using Text-Based Analytics to Extract Safety Attributes From Construction Accident Reports

Bibliographic Details
Published in IEEE Access, Vol. 13, pp. 99758-99777
Main Authors Togan, Vedat; Mostofi, Fatemeh; Behzat Tokdemir, Onur; Kadioglu, Fethi
Format Journal Article
Language English
Published Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2025
ISSN 2169-3536
DOI 10.1109/ACCESS.2025.3576442

Summary: The time-intensive extraction of insights from textual safety documents using conventional methods causes delays and inaccuracies, hindering proactive incident prevention in construction projects. While the architectures of large language models (LLMs) have been well studied, their deployment efficiency has often been overlooked. This study proposes DistilBERT as a more efficient text management method for extracting safety text from construction safety documents. To maintain the relevance of the extracted safety text, a dataset of 5,224 construction accident cases from 73 projects across the Euro-Asia region was compiled. Incidents were analyzed through detailed questionnaires to identify safety attributes, and term frequency-inverse document frequency (TF-IDF) analysis was applied for validation. When benchmarked against conventional NLP methods and state-of-the-art LLMs such as BERT, RoBERTa, and XLNet, DistilBERT demonstrated comparable accuracy with significantly reduced computational time. Specifically, DistilBERT achieved an accuracy of 79% in severity scale classification with an F1 score of 0.72, while reducing processing time by approximately 50% compared to BERT (from 2,918.28 seconds to 1,492.08 seconds). By offering rapid inference with negligible accuracy trade-offs, DistilBERT emerges as a practical tool for automating safety text extraction, making it well suited to settings with limited computational capabilities and urgent decision-making requirements. This study also examines how DistilBERT can be integrated into construction safety management systems without modifying the underlying platforms. Future work should focus on API creation, secure machine learning pipelines, and optimized deployment of LLMs, particularly in complex contexts.
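
The "approximately 50%" figure in the summary can be checked directly from the two processing times it reports. The short sketch below is only an arithmetic illustration; the function and variable names are hypothetical and are not taken from the article.

```python
# Processing times reported in the summary, in seconds.
BERT_SECONDS = 2918.28
DISTILBERT_SECONDS = 1492.08


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline time saved by the faster model."""
    return (baseline - improved) / baseline


def speedup(baseline: float, improved: float) -> float:
    """How many times faster the improved model is than the baseline."""
    return baseline / improved


reduction = relative_reduction(BERT_SECONDS, DISTILBERT_SECONDS)
factor = speedup(BERT_SECONDS, DISTILBERT_SECONDS)

print(f"time saved: {reduction:.1%}")  # time saved: 48.9%
print(f"speedup: {factor:.2f}x")       # speedup: 1.96x
```

On these numbers the saving is about 48.9%, which is consistent with the rounded "approximately 50%" stated in the summary.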