A general integrated machine learning pipeline: Its concept, main steps and application in shear strength prediction of RC beams strengthened with FRCM
| Published in | Engineering structures Vol. 281; p. 115749 |
|---|---|
| Main Authors | , |
| Format | Journal Article |
| Language | English |
| Published | Elsevier Ltd, 15.04.2023 |
| Subjects | |
| ISSN | 0141-0296 |
| DOI | 10.1016/j.engstruct.2023.115749 |
| Summary: | • A generic working pipeline is proposed to address concerns when using machine learning in civil engineering. • The shear capacity of FRCM strengthened RC beams is predicted as a demonstration. • The problem of limited training data is alleviated. • The efficacy of machine learning models is boosted through transfer learning.
Data are the fibers out of which the extraordinary capabilities of machine learning (ML) are woven. Yet despite gaining widespread attention in civil engineering, ML still faces several overarching issues: small and incomplete training datasets, questionable generalization ability, and a lack of physical interpretability. Over-fitting is almost inescapable for models trained on limited data, and the “black box” effect sharply restricts the use of ML. To address these issues, we put forward a general integrated ML pipeline that, with the help of transfer learning and synthetic data augmentation, not only remains robust to highly limited data but also enables engineers to carry out transparent and interpretable analysis. As a demonstration, the shear capacity of reinforced concrete (RC) beams strengthened with fiber-reinforced cementitious matrix (FRCM) is predicted. An experimental dataset containing only 91 tests is supplemented with data from a synthetic data generator, Synthpop. Six baseline ML algorithms are evaluated to identify the most suitable model for predicting the shear capacity, and the gradient boosting decision tree (GBDT) model performs best among them. To further boost its efficacy, a transfer learning algorithm, two-stage TrAdaboost, is modified to enhance the GBDT model, which serves as the base learner, by re-weighting the synthetic data points. The resulting emulator is termed TrAGBDT. The SHapley Additive exPlanations (SHAP) approach is then used to decipher the mechanism of the TrAGBDT. A graphical user interface is also provided along with the pertinent Python source code, allowing users and developers to access the emulator without barriers.
The findings of this study indicate that over-fitting is largely mitigated in the TrAGBDT, which greatly improves the extendibility and interpretability of the proposed ML-centric pipeline. Hence, it can be a viable solution for promoting ML applications in civil engineering practice. |
|---|---|
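
The summary describes enhancing a base learner by re-weighting synthetic data points. The following is a minimal sketch of that general idea only, not the authors' modified two-stage TrAdaboost or their TrAGBDT code. It assumes a weighted straight-line fit as a stand-in for the GBDT base learner and a fixed decay factor; the names `fit_weighted_line`, `reweight_synthetic`, `n_rounds`, and `beta`, as well as the toy data, are hypothetical.

```python
def fit_weighted_line(xs, ys, ws):
    """Weighted least-squares fit y ~ a + b*x (stand-in for a GBDT base learner)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    return a, b


def reweight_synthetic(real, synthetic, n_rounds=10, beta=0.5):
    """Down-weight synthetic points that the current model fits poorly.

    real, synthetic: lists of (x, y) pairs.
    Real points keep weight 1.0; each synthetic weight is multiplied by
    beta ** (normalised error) in every round (assumed fixed beta in (0, 1)).
    Returns ((intercept, slope), synthetic_weights).
    """
    points = list(real) + list(synthetic)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    n_real = len(real)
    weights = [1.0] * len(points)
    for _ in range(n_rounds):
        a, b = fit_weighted_line(xs, ys, weights)
        errors = [abs(y - (a + b * x)) for x, y in zip(xs, ys)]
        max_err = max(errors) or 1.0  # avoid division by zero on a perfect fit
        for i in range(n_real, len(points)):
            weights[i] *= beta ** (errors[i] / max_err)
    # Final fit with the updated weights.
    model = fit_weighted_line(xs, ys, weights)
    return model, weights[n_real:]


# Toy data (hypothetical): measured points follow y = 2x; half of the
# synthetic points agree with that trend and half are shifted upward by 5.
measured = [(float(x), 2.0 * x) for x in range(1, 6)]
consistent = [(x + 0.5, 2.0 * (x + 0.5)) for x in range(1, 6)]
shifted = [(x + 0.5, 2.0 * (x + 0.5) + 5.0) for x in range(1, 6)]

model, synthetic_weights = reweight_synthetic(measured, consistent + shifted)
print("intercept, slope:", model)
print("synthetic weights:", [round(w, 3) for w in synthetic_weights])
```

In this toy setting, the shifted synthetic points end up with smaller weights than the consistent ones, so the final fit is driven mainly by the measured data and the synthetic points that agree with it. A real application would replace the line fit with a learner that accepts per-sample weights and would choose the decay schedule as the transfer learning algorithm prescribes.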