Selecting, optimizing and externally validating a preexisting machine-learning regression algorithm for estimating waist circumference
| Published in | Computers in biology and medicine Vol. 169; p. 107909 |
|---|---|
| Main Author | |
| Format | Journal Article |
| Language | English |
| Published | United States: Elsevier Ltd (Elsevier Limited), 01.02.2024 |
| ISSN | 0010-4825, 1879-0534 |
| DOI | 10.1016/j.compbiomed.2023.107909 |
| Summary: | Obesity, typically defined by the body mass index (BMI), has well-known negative health effects. However, the BMI has serious deficiencies in predicting the adverse risks associated with obesity. Waist circumference (WC) is an alternative measure for defining obesity and, according to the literature, a better disease predictor. However, in old databases this information is often missing, inaccurate (collected via self-report) or incomplete. Thus, this study estimates WC accurately using machine learning. The novel approaches are: 1) Predictor variables (weight, height, age and sex) likely to appear in most data sets are used. 2) Publicly available data (including non-adults) and algorithms are used. 3) Systematic methods for data cleanup, model selection, hyperparameter optimization and external validation are performed.
one variable per column, no special codes, missing values or outliers. Preexisting regression algorithms are gauged by cross-validation, using one data set. The hyperparameters of the best-performing algorithm are optimized. The tuned algorithm is externally validated with other data sets by cross-validation. Despite the limited number of features, the tuned algorithm outperforms prior WC approximations that use the same or similar predictor variables. The tuned algorithm makes it possible to use data in which WC is not measured, is incomplete or is unreliable. A similar approach would be useful to estimate other variables of interest.
• Predictors (weight, height, age, sex) likely to appear in most data sets are used. • Publicly available data (including non-adults) and algorithms are used. • Novel data cleanup, model selection, hyperparameter tuning and external validation. • The tuned algorithm outperforms prior WC estimates, using the same variables. |
|---|---|
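
The workflow outlined in the summary (compare candidate regressors by cross-validation on one data set, tune the best one, then validate it on a separate data set) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic, BMI alone stands in for the four predictors named in the summary, and the two candidates (`fit_mean`, `fit_ridge`), the grid values and all helper names are hypothetical.

```python
import random

def make_synthetic(n, seed):
    """Hypothetical (bmi, wc_cm) rows; the relationship below is made up."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        height_m = rng.uniform(1.5, 1.9)
        weight_kg = rng.uniform(50.0, 110.0)
        bmi = weight_kg / height_m ** 2
        wc_cm = 30.0 + 2.2 * bmi + rng.gauss(0.0, 4.0)
        rows.append((bmi, wc_cm))
    return rows

def fit_mean(train):
    """Baseline candidate: always predict the training mean."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

def fit_ridge(train, lam=1.0):
    """One-feature ridge regression; lam is the hyperparameter."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / (sxx + lam)
    return lambda x: my + slope * (x - mx)

def mae(model, rows):
    """Mean absolute error of a fitted model on (x, y) rows."""
    return sum(abs(model(x) - y) for x, y in rows) / len(rows)

def cross_val_mae(fit, rows, k=5, **params):
    """Average held-out MAE over k interleaved folds."""
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        scores.append(mae(fit(train, **params), test))
    return sum(scores) / k

development = make_synthetic(300, seed=0)  # data set used for selection and tuning
external = make_synthetic(200, seed=1)     # separate data set for external validation

# 1) Model selection: compare candidates with default settings.
candidates = {"mean": fit_mean, "ridge": fit_ridge}
cv_scores = {name: cross_val_mae(fit, development) for name, fit in candidates.items()}
best_name = min(cv_scores, key=cv_scores.get)

# 2) Hyperparameter tuning: grid search for the selected candidate only.
grids = {"mean": [{}], "ridge": [{"lam": v} for v in (0.0, 1.0, 10.0, 100.0, 1000.0)]}
best_params = min(
    grids[best_name],
    key=lambda p: cross_val_mae(candidates[best_name], development, **p),
)

# 3) External validation: cross-validate the tuned candidate on the other data set.
external_mae = cross_val_mae(candidates[best_name], external, **best_params)
print(best_name, best_params, round(external_mae, 2))
```

Keeping the external data set out of steps 1 and 2 is the point of the design: the error reported in step 3 is then not influenced by the choices made during selection and tuning.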