Enhancing body fat prediction with WGAN-GP data augmentation and XGBoost algorithm

Bibliographic Details
Published in	Science progress (1916) Vol. 108; no. 3; p. 368504251366850
Main Authors	Wang, Xiangyu, Chang, Shuai
Format	Journal Article
Language	English
Published	London, England: SAGE Publications, 1 July 2025
Subjects	Accuracy; Adipose Tissue; Adult; Algorithms; Anthropometry; Biomedical data; Body fat; Boosting Machine Learning Algorithms; Data augmentation; Generative adversarial networks; Humans; Learning algorithms; Machine Learning; Male; Multilayer perceptrons; Multilayers; Neural Networks, Computer; Random noise; Regression analysis; Support vector machines; Synthetic data; Test sets; Body fat percentage; generative adversarial network; data augmentation; XGBoost; anthropometry
ISSN	0036-8504; 2047-7163
DOI	10.1177/00368504251366850

Summary:	Background and Objective: Machine learning models offer a practical approach for estimating body fat percentage from simple anthropometric data. However, the scarcity of biomedical data frequently leads to model overfitting, compromising predictive accuracy. Generative data augmentation is a promising strategy for addressing this limitation. This study develops and evaluates a generative data augmentation framework to enhance body fat prediction from limited anthropometric data.
Methods: A public dataset of 249 male subjects was partitioned into development (80%) and test (20%) sets. The fidelity of synthetic data produced by a Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP), random noise injection, and mixup was compared to select the best augmentation method. XGBoost, Support Vector Regression, and Multi-layer Perceptron models were then trained and validated with and without the selected augmentation. Final model generalization was assessed on the independent test set using the coefficient of determination (R²), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE).
Results: Among the evaluated augmentation techniques, WGAN-GP generated the synthetic data with the highest fidelity. Without augmentation, the baseline XGBoost model achieved an R² of 0.67; with WGAN-GP augmentation, its R² on the test set increased to 0.77. Feature importance analysis of the final model identified abdominal circumference as the most significant predictor of body fat percentage.
Conclusion: WGAN-GP is a highly effective method for generating realistic synthetic anthropometric data. Integrating these synthetic samples into the training pipeline substantially improves the generalization and predictive accuracy of machine learning models. This methodology offers a robust solution for developing more accurate and accessible predictive health models in data-scarce environments.
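The summary names two simple augmentation baselines (random noise injection and mixup) and three test-set metrics (R², MAE, RMSE). The following is a minimal plain-Python sketch of common textbook formulations of each. It is not the authors' implementation. Assumptions, all illustrative: each row is [features..., target]; noise is added to feature columns only, scaled by sigma × column standard deviation (sigma=0.05); mixup draws one weight λ ~ Beta(α, α) with α=0.2 and applies it to features and target alike. The function names are hypothetical. The WGAN-GP itself is not sketched because it needs a neural-network framework.

```python
import math
import random
import statistics

# Each row is [feature_1, ..., feature_k, target]; target = body fat percentage.


def noise_injection(rows, sigma=0.05, n_new=None, seed=0):
    """Random noise injection: jittered copies of randomly chosen rows.

    Feature columns get Gaussian noise N(0, (sigma * column_std)^2); the target
    is copied unchanged. sigma=0.05 is an illustrative assumption.
    """
    rng = random.Random(seed)
    n_new = len(rows) if n_new is None else n_new
    k = len(rows[0]) - 1  # number of feature columns
    stds = [statistics.pstdev([r[c] for r in rows]) for c in range(k)]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(rows)
        noisy = [base[c] + rng.gauss(0.0, sigma * stds[c]) for c in range(k)]
        synthetic.append(noisy + [base[-1]])
    return synthetic


def mixup(rows, alpha=0.2, n_new=None, seed=0):
    """Mixup for regression: convex combinations of two random rows.

    One weight lam ~ Beta(alpha, alpha) mixes features and target alike.
    alpha=0.2 is an illustrative assumption.
    """
    rng = random.Random(seed)
    n_new = len(rows) if n_new is None else n_new
    synthetic = []
    for _ in range(n_new):
        a, b = rng.choice(rows), rng.choice(rows)
        lam = rng.betavariate(alpha, alpha)
        synthetic.append([lam * x + (1.0 - lam) * y for x, y in zip(a, b)])
    return synthetic


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up toy rows (two features + target), not values from the study dataset.
    train = [[85.0, 94.0, 12.0], [95.0, 102.0, 25.0], [90.0, 98.0, 18.0]]
    augmented = train + noise_injection(train, n_new=5) + mixup(train, n_new=5)
    print(len(augmented))  # 13
    y_true, y_pred = [10.0, 20.0, 30.0], [12.0, 18.0, 33.0]
    print(round(mae(y_true, y_pred), 4), round(rmse(y_true, y_pred), 4), round(r2(y_true, y_pred), 4))
    # prints: 2.3333 2.3805 0.915
```

In a setup like the one summarized, synthetic rows would be added only to the development split. The independent test set stays untouched, so R², MAE, and RMSE measure performance on real subjects only.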