Selecting, optimizing and externally validating a preexisting machine-learning regression algorithm for estimating waist circumference

Bibliographic Details
Published in Computers in biology and medicine Vol. 169; p. 107909
Main Author Phillips-Farfán, Bryan V.
Format Journal Article
Language English
Published United States Elsevier Ltd 01.02.2024
Elsevier Limited
ISSN 0010-4825
1879-0534
DOI 10.1016/j.compbiomed.2023.107909

Summary: Obesity, typically defined by the body mass index (BMI), has well-known negative health effects. However, the BMI has serious deficiencies in predicting the adverse risks associated with obesity. Waist circumference (WC) is an alternative measure for defining obesity and, according to the literature, a better predictor of disease. However, old databases often lack this information, or it is inaccurate (collected via self-report) or incomplete. Thus, this study estimates WC accurately using machine learning. The novel approaches are: 1) Predictor variables (weight, height, age and sex) likely to appear in most data sets are used. 2) Publicly available data (including non-adults) and algorithms are used. 3) Systematic methods for data cleanup, model selection, hyperparameter optimization and external validation are applied. The data are cleaned to have one variable per column and no special codes, missing values or outliers. Preexisting regression algorithms are gauged by cross-validation, using one data set. The hyperparameters of the best-performing algorithm are optimized. The tuned algorithm is externally validated with other data sets by cross-validation. Despite the limited number of features, the tuned algorithm outperforms prior WC approximations that use the same or similar predictor variables. The tuned algorithm enables the use of data where WC is not measured, is incomplete or is unreliable. A similar approach would be useful to estimate other variables of interest.

• Predictors (weight, height, age, sex) likely to appear in most data sets are used.
• Publicly available data (including non-adults) and algorithms are used.
• Novel data cleanup, model selection, hyperparameter tuning and external validation.
• The tuned algorithm outperforms prior WC estimates, using the same variables.
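The workflow the summary describes (compare candidate regressors by cross-validation on one data set, tune the best one, then cross-validate it on a separate data set) can be illustrated with a minimal, standard-library-only sketch. Everything specific here is an assumption made for illustration: the data are synthetic, the formula inside `make_data` is invented, and the two candidates (`mean_model`, `knn_model`) are simple stand-ins, not the algorithms, data sets or metrics used in the article.

```python
import random
import statistics

def make_data(n, seed, noise=3.0):
    """Synthetic rows ((weight_kg, height_cm, age_yr, sex), waist_cm).
    The relation below is invented for illustration only."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        weight = rng.uniform(40, 120)
        height = rng.uniform(140, 195)
        age = rng.uniform(10, 80)
        sex = rng.choice([0, 1])
        bmi = weight / (height / 100) ** 2
        wc = 35 + 2.0 * bmi + 0.1 * age + 3.0 * sex + rng.gauss(0, noise)
        rows.append(((weight, height, age, sex), wc))
    return rows

def mean_model(train):
    """Baseline candidate: always predict the training mean."""
    mu = statistics.fmean(y for _, y in train)
    return lambda x: mu

def knn_model(train, k=5):
    """k-nearest-neighbours regression on features scaled by their spread."""
    columns = list(zip(*(x for x, _ in train)))
    scale = [statistics.pstdev(c) or 1.0 for c in columns]

    def predict(x):
        # Squared scaled distance from x to every training row.
        dists = sorted(
            (sum(((a - b) / s) ** 2 for a, b, s in zip(x, xt, scale)), yt)
            for xt, yt in train
        )
        return statistics.fmean(y for _, y in dists[:k])

    return predict

def cv_mae(rows, fit, folds=5, **params):
    """Mean absolute error averaged over interleaved k-fold splits."""
    errors = []
    for i in range(folds):
        test = rows[i::folds]
        train = [r for j, r in enumerate(rows) if j % folds != i]
        model = fit(train, **params)
        errors.append(statistics.fmean(abs(model(x) - y) for x, y in test))
    return statistics.fmean(errors)

development = make_data(300, seed=0)  # data set used for selection and tuning
external = make_data(200, seed=1)     # separate data set for external validation

# 1) Model selection: cross-validate every candidate with default settings.
candidates = {"mean": mean_model, "knn": knn_model}
scores = {name: cv_mae(development, fit) for name, fit in candidates.items()}
best_name = min(scores, key=scores.get)

# 2) Hyperparameter tuning: grid search over the winner's settings.
grids = {"mean": [{}], "knn": [{"k": k} for k in (1, 3, 5, 10, 20)]}
fit = candidates[best_name]
tuning = [(cv_mae(development, fit, **p), p) for p in grids[best_name]]
best_mae, best_params = min(tuning, key=lambda t: t[0])

# 3) External validation: cross-validate the tuned winner on the other data set.
external_mae = cv_mae(external, fit, **best_params)
baseline_mae = cv_mae(external, mean_model)

print(best_name, best_params, round(external_mae, 2), round(baseline_mae, 2))
```

The hand-written helpers keep the sketch dependency-free; in practice a machine-learning library would normally supply the regressors, the cross-validation splitter and the grid search.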