Department of Management Science and Engineering, Stanford University School of Engineering, Stanford, California.
Department of Preoperative Services, Lucile Packard Children's Hospital Stanford, Stanford, California.
JAMA Netw Open. 2019 Apr 5;2(4):e192884. doi: 10.1001/jamanetworkopen.2019.2884.
IMPORTANCE: Obesity is a leading cause of high health care expenditures, disability, and premature mortality. Previous studies have documented geographic disparities in obesity prevalence.
OBJECTIVE: To identify county-level factors associated with obesity using traditional epidemiologic and machine learning methods.
DESIGN, SETTING, AND PARTICIPANTS: Cross-sectional study using linear regression and machine learning models to evaluate the associations between county-level obesity and county-level demographic, socioeconomic, health care, and environmental factors. Summary statistics were extracted from the 2018 Robert Wood Johnson Foundation County Health Rankings and merged with US Census data for each of 3138 US counties. The explanatory power of linear multivariate regression and of the top-performing machine learning model was compared using mean R² in 30-fold cross-validation.
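The model comparison described above (mean R² over 30 cross-validation folds, linear regression versus gradient boosting) can be illustrated with a minimal pure-Python sketch. It is not the study's pipeline: the abstract does not give features, tuning, or software. The sketch assumes a single predictor and synthetic, hypothetical data, and it reduces gradient boosting to boosted decision stumps with squared-error loss. All function names (`r_squared`, `fit_linear`, `fit_stump`, `fit_boosted_stumps`, `cross_validated_r2`) are hypothetical.

```python
# Illustrative sketch only: hypothetical helpers and synthetic data,
# not the study's models or county data.
import random
from statistics import fmean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = fmean(y_true)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def fit_linear(xs, ys):
    """Ordinary least squares with a single predictor; returns a predictor function."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_stump(xs, residuals):
    """Single-split regression tree (stump) minimizing squared error on residuals."""
    pairs = sorted(zip(xs, residuals))
    n, total = len(pairs), sum(r for _, r in pairs)
    best, left_sum = None, 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no valid threshold between tied x values
        right_sum = total - left_sum
        # Minimizing within-split SSE is equivalent to maximizing this score.
        score = left_sum ** 2 / i + right_sum ** 2 / (n - i)
        if best is None or score > best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (score, threshold, left_sum / i, right_sum / (n - i))
    if best is None:  # all x identical: fall back to a constant
        mean_r = total / n
        return lambda x: mean_r
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_boosted_stumps(xs, ys, n_rounds=100, learning_rate=0.1):
    """Gradient boosting with squared-error loss: each stump fits current residuals."""
    base = fmean(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For loss (y - f)^2 / 2, the negative gradient is the residual y - f.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def cross_validated_r2(xs, ys, fit, k=30, seed=0):
    """Mean held-out R^2 over k folds (each observation is held out exactly once)."""
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    scores = []
    for f in range(k):
        held_out = set(order[f::k])
        train = [i for i in order if i not in held_out]
        test = sorted(held_out)
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(r_squared([ys[i] for i in test], [model(xs[i]) for i in test]))
    return fmean(scores)


# Hypothetical synthetic data with a nonlinear (U-shaped) relationship.
rng = random.Random(42)
xs = [rng.uniform(-3, 3) for _ in range(300)]
ys = [x * x + rng.gauss(0, 0.5) for x in xs]

linear_r2 = cross_validated_r2(xs, ys, fit_linear)
boosted_r2 = cross_validated_r2(xs, ys, fit_boosted_stumps)
print(f"mean CV R^2  linear: {linear_r2:.2f}  boosted stumps: {boosted_r2:.2f}")
```

On this deliberately nonlinear toy relationship, the boosted stumps capture structure that a straight line cannot. That is one plausible reason a boosting model can explain more variation than a linear model, but the size of the gap here says nothing about the county data.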
EXPOSURES: County-level demographic factors (population; rural status; census region; and race/ethnicity, sex, and age composition), socioeconomic factors (median income, unemployment rate, and percentage of population with some college education), health care factors (rates of uninsured adults and of primary care physicians), and environmental factors (access to healthy foods and to exercise opportunities).
MAIN OUTCOMES AND MEASURES: County-level obesity prevalence in 2018, its association with each county-level factor, and the percentage of variation in county-level obesity prevalence explained by linear multivariate regression and by gradient boosting machine regression, measured with R².
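For reference, R² is the standard coefficient of determination. For model predictions ŷ_i of obesity prevalence y_i in n counties with mean ȳ, it is the first expression below. Assuming the cross-validated figure is a simple average over folds, as "mean R²" suggests, the 30-fold score is the second expression, where R²_k is computed on the counties held out in fold k.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
\overline{R^2}_{\mathrm{CV}} = \frac{1}{30} \sum_{k=1}^{30} R^2_k
```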
RESULTS: Among the 3138 counties studied, the mean (range) obesity prevalence was 31.5% (12.8%-47.8%). In multivariate regressions, demographic factors explained 44.9% of variation in obesity prevalence; socioeconomic factors, 33.0%; environmental factors, 15.5%; and health care factors, 9.1%. The county-level factors with the strongest association with obesity were census region, median household income, and percentage of population with some college education. R² values of univariate regressions of obesity prevalence were 0.238 for census region, 0.218 for median household income, and 0.160 for percentage of population with some college education. Multivariate linear regression and gradient boosting machine regression (the best-performing machine learning model) of obesity prevalence using all county-level demographic, socioeconomic, health care, and environmental factors had R² values of 0.58 and 0.66, respectively (P < .001).
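Census region is a categorical predictor, so its univariate R² has a simple interpretation. An ordinary least squares fit with an intercept plus indicator variables for the regions predicts each county's regional mean. R² therefore equals the between-region sum of squares divided by the total sum of squares. The following minimal sketch shows this. The function name `categorical_r2` and the six prevalence values are hypothetical and are not data from the study.

```python
from collections import defaultdict
from statistics import fmean


def categorical_r2(groups, ys):
    """R^2 of an OLS regression of ys on a categorical predictor.

    With an intercept plus one indicator per non-reference level, the fitted
    value for each observation is its group mean, so
    R^2 = SS_between / SS_total.
    """
    members = defaultdict(list)
    for g, y in zip(groups, ys):
        members[g].append(y)
    group_mean = {g: fmean(vals) for g, vals in members.items()}
    grand_mean = fmean(ys)
    ss_total = sum((y - grand_mean) ** 2 for y in ys)
    ss_between = sum((group_mean[g] - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total


# Hypothetical county prevalences (%), not data from the study.
regions = ["South", "South", "West", "West", "Northeast", "Northeast"]
prevalence = [36.0, 34.0, 27.0, 29.0, 30.0, 28.0]
r2 = categorical_r2(regions, prevalence)
print(round(r2, 3))  # 0.905
```

Here the within-region scatter is small relative to the gaps between regional means, so R² is high. A value of 0.238 would mean that regional means account for under a quarter of the county-to-county variance.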
CONCLUSIONS AND RELEVANCE: Obesity prevalence varies significantly between counties. County-level demographic, socioeconomic, health care, and environmental factors explain the majority of variation in county-level obesity prevalence. Machine learning models may explain significantly more of the variation in obesity prevalence than linear regression.