Enhancing body fat prediction with WGAN-GP data augmentation and XGBoost algorithm
| Published in | Science Progress, Vol. 108, no. 3, article 368504251366850 |
|---|---|
| Main Authors | , |
| Format | Journal Article |
| Language | English |
| Published | London, England: SAGE Publications (Sage Publications Ltd), 01.07.2025 |
| ISSN | 0036-8504, 2047-7163 |
| DOI | 10.1177/00368504251366850 |
Summary:
Background and Objective
Machine learning models offer a practical approach for estimating body fat percentage from simple anthropometric data. However, the scarcity of biomedical data frequently leads to model overfitting, compromising predictive accuracy. Generative data augmentation presents a promising strategy to address this limitation. This study develops and evaluates a generative data augmentation framework to enhance body fat prediction from limited anthropometric data.
Methods
A public dataset of 249 male subjects was partitioned into development (80%) and test (20%) sets. Three augmentation methods, namely Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP), random noise injection, and mixup, were compared on the fidelity of the synthetic data they produced, and the best method was selected. XGBoost, Support Vector Regression (SVR), and Multi-layer Perceptron (MLP) models were then trained and validated with and without the selected augmentation. Final model generalization was assessed on the independent test set using the coefficient of determination (R²), mean absolute error (MAE), and root mean squared error (RMSE).
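The two simpler augmentation baselines named above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the study's implementation: the abstract gives no hyperparameters, so the function names `noise_injection` and `mixup`, the noise level of 5% of each feature's standard deviation, and the Beta(0.2, 0.2) mixing distribution are hypothetical choices. Noise injection here jitters the features of resampled real rows and keeps their targets unchanged; mixup forms convex combinations of random pairs of rows and of their targets, following the general mixup formulation of Zhang et al. (2018).

```python
import numpy as np


def noise_injection(X, y, n_new, noise_scale=0.05, rng=None):
    """Jitter the features of randomly resampled real rows; targets are copied unchanged.

    Each feature gets Gaussian noise with standard deviation
    noise_scale * (that feature's standard deviation); this scaling is a hypothetical convention.
    """
    rng = np.random.default_rng(rng)
    idx = rng.integers(0, len(X), size=n_new)   # rows to resample, with replacement
    scale = noise_scale * X.std(axis=0)          # per-feature noise level, shape (d,)
    X_new = X[idx] + rng.normal(size=(n_new, X.shape[1])) * scale
    return X_new, y[idx].copy()


def mixup(X, y, n_new, alpha=0.2, rng=None):
    """Blend random pairs of real rows: x = lam*x_i + (1-lam)*x_j, and likewise for y.

    lam ~ Beta(alpha, alpha); alpha=0.2 is a hypothetical setting.
    """
    rng = np.random.default_rng(rng)
    i = rng.integers(0, len(X), size=n_new)
    j = rng.integers(0, len(X), size=n_new)
    lam = rng.beta(alpha, alpha, size=n_new)
    X_new = lam[:, None] * X[i] + (1.0 - lam[:, None]) * X[j]
    y_new = lam * y[i] + (1.0 - lam) * y[j]
    return X_new, y_new


# Tiny demo on made-up data (not the study's dataset).
demo_rng = np.random.default_rng(0)
w_demo = np.array([1.0, -2.0, 0.5])
X_demo = demo_rng.normal(size=(20, 3))
y_demo = X_demo @ w_demo
X_noise, y_noise = noise_injection(X_demo, y_demo, n_new=40, rng=1)
X_mix, y_mix = mixup(X_demo, y_demo, n_new=40, rng=2)
print(X_noise.shape, X_mix.shape)  # (40, 3) (40, 3)
```

Because mixup interpolates targets linearly, it preserves an exactly linear feature-target relationship (in the demo, `y_mix` still equals `X_mix @ w_demo`), whereas noise injection only perturbs points near existing rows.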
Results
Among the evaluated augmentation techniques, WGAN-GP generated the synthetic data with the highest fidelity. On the test set, the baseline XGBoost model trained on the original data achieved an R² of 0.67; with WGAN-GP augmentation, its R² increased to 0.77. Feature importance analysis of the final model identified abdominal circumference as the most significant predictor of body fat percentage.
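For reference, the standard WGAN-GP training objective (Gulrajani et al., 2017) is shown below. The abstract does not report the study's network architecture or penalty weight, so this is the generic formulation rather than the paper's exact configuration; the original WGAN-GP paper used a penalty weight of λ = 10. For tabular data, x would be one row of real measurements and G(z) one generated row; whether the body-fat target was generated jointly with the anthropometric features is an assumption the abstract does not settle.

```latex
\begin{aligned}
L_D &= \mathbb{E}_{z \sim p(z)}\big[D(G(z))\big] - \mathbb{E}_{x \sim \mathbb{P}_r}\big[D(x)\big]
      + \lambda\, \mathbb{E}_{\hat{x}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big], \\
\hat{x} &= \epsilon\, x + (1 - \epsilon)\, G(z), \qquad \epsilon \sim \mathcal{U}[0, 1], \\
L_G &= -\,\mathbb{E}_{z \sim p(z)}\big[D(G(z))\big].
\end{aligned}
```

Here D is the critic, G the generator, and P_r the distribution of real rows. The gradient-penalty term softly enforces the 1-Lipschitz constraint on D at points interpolated between real and generated samples, replacing the weight clipping used in the original WGAN.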
Conclusion
WGAN-GP is a highly effective method for generating realistic synthetic anthropometric data. Integrating these synthetic samples into the training pipeline improved the generalization and predictive accuracy of the machine learning models; for XGBoost, test-set R² rose from 0.67 to 0.77. This methodology offers a robust approach to developing more accurate and accessible predictive health models in data-scarce environments.
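A minimal sketch of the evaluation protocol described in the abstract, assuming NumPy only: rows are split once into development (80%) and test (20%) sets, synthetic rows are appended to the development data only, and the untouched test set is scored with R², MAE, and RMSE. The helper names (`regression_metrics`, `split_dev_test`, `fit_linear`, `evaluate`) are hypothetical, rounding the test-set size up is an assumption, and ordinary least squares stands in for XGBoost, SVR, and MLP purely to keep the example dependency-free. The demo data are random numbers, not the study's dataset.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (R^2, MAE, RMSE), with R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    mae = np.mean(np.abs(resid))
    rmse = np.sqrt(np.mean(resid ** 2))
    return r2, mae, rmse


def split_dev_test(X, y, test_frac=0.2, seed=0):
    """Shuffle rows once and hold out test_frac of them (rounded up) as the test set."""
    idx = np.random.default_rng(seed).permutation(len(X))
    n_test = int(np.ceil(test_frac * len(X)))
    test, dev = idx[:n_test], idx[n_test:]
    return X[dev], y[dev], X[test], y[test]


def fit_linear(X, y):
    """Stand-in model: least squares with an intercept (the study used XGBoost, SVR and MLP)."""
    coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(X)), X]), y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def evaluate(X, y, augment=None, seed=0):
    """Train on (optionally augmented) development data; score on the real-only test set."""
    X_dev, y_dev, X_test, y_test = split_dev_test(X, y, seed=seed)
    if augment is not None:
        # Synthetic rows go into the development data only, so the test
        # set keeps measuring generalization to real subjects.
        X_syn, y_syn = augment(X_dev, y_dev)
        X_dev = np.vstack([X_dev, X_syn])
        y_dev = np.concatenate([y_dev, y_syn])
    model = fit_linear(X_dev, y_dev)
    return regression_metrics(y_test, model(X_test))


# Made-up demo data: 249 rows of three random "features".
demo_rng = np.random.default_rng(42)
X_demo = demo_rng.normal(size=(249, 3))
y_demo = X_demo @ np.array([3.0, 1.0, -0.5]) + demo_rng.normal(scale=0.5, size=249)


def jitter(X, y):
    """Hypothetical augmenter: one noisy copy of each development row."""
    return X + demo_rng.normal(scale=0.05, size=X.shape), y.copy()


baseline = evaluate(X_demo, y_demo)
augmented = evaluate(X_demo, y_demo, augment=jitter)
print("baseline  R2=%.3f MAE=%.3f RMSE=%.3f" % baseline)
print("augmented R2=%.3f MAE=%.3f RMSE=%.3f" % augmented)
```

Any augmenter with the signature `augment(X_dev, y_dev) -> (X_syn, y_syn)`, such as a noise-injection, mixup, or trained-generator sampler, can be passed to `evaluate`, so the same split and metrics compare training with and without augmentation.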