Chen Kening, Zheng Fangjieyi, Zhang Xiaoqian, Wang Qiong, Zhang Zhixin, Niu Wenquan
China-Japan Friendship Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China.
Center for Evidence-Based Medicine, Capital Institute of Pediatrics, Beijing, China.
J Glob Health. 2025 Feb 7;15:04013. doi: 10.7189/jogh.15.04013.
Factors underlying the development of childhood underweight, overweight, and obesity are not fully understood. Traditional models have drawbacks in handling large-scale, high-dimensional, and nonlinear data. In this study, we aimed to identify factors responsible for underweight, overweight, and obesity among Chinese children using machine learning methods.
Our study participants were children aged 3-14 years from 30 kindergartens and 26 schools in Beijing and Tangshan. Weight status was defined per the World Health Organization criteria. We implemented three ensemble learning algorithms, compared their performance, ranked the contributing factors by importance, and identified an optimal set of factors for each weight status. We developed a user-friendly web application to calculate the predicted probability of childhood underweight, overweight, and obesity.
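The step of ranking factors by importance and then identifying an optimal set can be illustrated with a minimal sketch. The abstract does not state the exact stopping rule, so the rule below is an assumption: factors are added one at a time in ranked order, and the smallest set whose score falls within a tolerance of the best score is kept. The function name `optimal_prefix`, the `tol` parameter, the placeholder factor names, and the toy scores are all hypothetical and are not taken from the study.

```python
from typing import Callable, List, Sequence


def optimal_prefix(
    ranked: Sequence[str],
    score: Callable[[Sequence[str]], float],
    tol: float = 0.005,
) -> List[str]:
    """Return the shortest top-k prefix of `ranked` scoring within `tol` of the best.

    `ranked` lists factors from most to least important.
    `score` evaluates a factor subset (for example, a cross-validated AUROC).
    The tolerance-based stopping rule is an assumption, not the authors' method.
    """
    # Score every cumulative prefix: top-1, top-2, ..., all factors.
    scores = [score(ranked[:k]) for k in range(1, len(ranked) + 1)]
    best = max(scores)
    # Keep the first (smallest) prefix that is close enough to the best score.
    for k, s in enumerate(scores, start=1):
        if s >= best - tol:
            return list(ranked[:k])
    return list(ranked)


# Toy demonstration with made-up scores indexed by subset size (not study data).
toy_scores = {1: 0.70, 2: 0.78, 3: 0.80, 4: 0.801}
ranked_factors = ["factor_1", "factor_2", "factor_3", "factor_4"]
chosen = optimal_prefix(ranked_factors, lambda subset: toy_scores[len(subset)])
print(chosen)
```

In this toy case the fourth factor adds only 0.001 to the score, so the sketch stops at three factors. In practice `score` would refit and evaluate a model on each subset.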
We analysed data from 18 503 children aged 3-14 years, including 1798 underweight, 10 579 of normal weight, 3257 overweight, and 2869 with obesity. Of the three algorithms, random forest performed best, with the area under the receiver operating characteristic curve reaching 0.759 for underweight, 0.806 for overweight, and 0.849 for obesity; other metrics also supported this algorithm. Further cumulative analyses showed that, for underweight, the optimal set of six factors included maternal body mass index (BMI), age, paternal BMI, maternal reproductive age, paternal reproductive age, and birth weight. The optimal set for overweight comprised five factors: age, fast food intake, maternal BMI, paternal BMI, and sedentary time. For obesity, the optimal set included six factors: age, fast food intake, maternal BMI, paternal BMI, sedentary time, and maternal reproductive age. Further logistic regression analyses confirmed the predictive capability of individual top factors.
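For readers unfamiliar with the headline metric, the area under the receiver operating characteristic curve equals the probability that a randomly chosen case receives a higher predicted score than a randomly chosen non-case, with ties counted as one half. The sketch below computes it directly from that definition. It assumes binary labels (1 for the weight status of interest, 0 for the comparison group); how the authors framed the comparison groups is not stated in the abstract. The function name `auroc` and the toy inputs are illustrative only.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of case/non-case scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("Both classes must be present to compute AUROC.")
    # Count pairs where the case outranks the non-case; a tie counts as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (not study data): 3 of the 4 case/non-case pairs are ordered correctly.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

This pairwise form is quadratic in sample size, so for a cohort of this scale a rank-based implementation from a statistics library would be the practical choice.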
Our findings indicate that random forest is the best ensemble learning algorithm for predicting underweight, overweight, and obesity in children aged 3-14 years. We identified the optimal set of significant factors for each malnutrition status and incorporated them into a web application to support the application of this study's findings.