Uncovering Clinical Risk Factors and Predicting Severe COVID-19 Cases Using UK Biobank Data: Machine Learning Approach.

Affiliations

School of Biomedical Sciences, The Chinese University of Hong Kong, Hong Kong, China.

KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research of Common Diseases, Kunming Institute of Zoology and The Chinese University of Hong Kong, Kunming, China.

Publication information

JMIR Public Health Surveill. 2021 Sep 30;7(9):e29544. doi: 10.2196/29544.

Abstract

BACKGROUND

COVID-19 is a major public health concern. Given the extent of the pandemic, it is urgent to identify risk factors associated with disease severity. More accurate prediction of those at risk of developing severe infections is of high clinical importance.

OBJECTIVE

Based on the UK Biobank (UKBB), we aimed to build machine learning models to predict the risk of developing severe or fatal infections, and uncover major risk factors involved.

METHODS

We first restricted the analysis to infected individuals (n=7846), then performed the analysis at a population level, treating those with no known infection as controls (n=465,728). Hospitalization was used as a proxy for severity. A total of 97 clinical variables (collected prior to the COVID-19 outbreak) were included as predictors, covering demographic variables, comorbidities, blood measurements (eg, hematological/liver/renal function/metabolic parameters), anthropometric measures, and other risk factors (eg, smoking/drinking). We also constructed a simplified (lite) prediction model using 27 covariates that can be more easily obtained (demographic and comorbidity data). XGBoost (gradient-boosted trees) was used for prediction, and predictive performance was assessed by cross-validation. Variable importance was quantified by Shapley values (ShapVal), permutation importance (PermImp), and accuracy gain. Shapley dependence and interaction plots were used to evaluate the pattern of relationships between risk factors and outcomes.
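To make the Shapley-value ranking described above more concrete, the following is a minimal sketch of how a Shapley value attributes one prediction to individual features. It is not the authors' pipeline: the function `toy_risk`, its coefficients, the example individual, and the baseline values are all hypothetical, and the brute-force enumeration used here is only feasible for a handful of features (tree-based models such as XGBoost are normally explained with dedicated, much faster algorithms).

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for a single prediction.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2^n, so this is for illustration only.
    """
    n = len(x)

    def value(coalition):
        # Real value for features in the coalition, baseline value otherwise.
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(features):
    """Hypothetical risk score (NOT the paper's model)."""
    age, cnt_tx, whr = features
    score = 0.02 * age + 0.05 * cnt_tx
    # Assumed interaction: older age combined with a high waist-to-hip ratio.
    if age > 65 and whr > 0.95:
        score += 0.3
    return score


# Made-up individual and made-up reference ("baseline") individual.
x = [72, 6, 1.0]
baseline = [55, 2, 0.85]

phi = shapley_values(toy_risk, x, baseline)
for name, contribution in zip(["age", "cnt_tx", "WHR"], phi):
    print(f"{name}: {contribution:+.3f}")
```

The contributions sum to the difference between the individual's score and the baseline score, and the interaction term is split equally between age and WHR. Averaging the absolute contributions over many individuals is one common way to turn such per-person values into a global ranking of variables.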

RESULTS

A total of 2386 severe and 477 fatal cases were identified. For analyses within infected individuals (n=7846), our prediction model achieved an area under the receiver operating characteristic curve (AUC-ROC) of 0.723 (95% CI 0.711-0.736) and 0.814 (95% CI 0.791-0.838) for severe and fatal infections, respectively. The top 5 contributing factors (sorted by ShapVal) for severity were age, number of drugs taken (cnt_tx), cystatin C (reflecting renal function), waist-to-hip ratio (WHR), and Townsend deprivation index (TDI). For mortality, the top features were age, testosterone, cnt_tx, waist circumference (WC), and red cell distribution width. For analyses involving the whole UKBB population, AUCs for severity and fatality were 0.696 (95% CI 0.684-0.708) and 0.825 (95% CI 0.802-0.848), respectively. The same top 5 risk factors were identified for both outcomes, namely, age, cnt_tx, WC, WHR, and TDI. In addition, age, cystatin C, TDI, and cnt_tx were among the top 10 across all 4 analyses. Diseases ranked highly by ShapVal or PermImp included type 2 diabetes mellitus (T2DM), coronary artery disease, atrial fibrillation, and dementia, among others. For the "lite" models, predictive performance was broadly similar, with estimated AUCs of 0.716, 0.818, 0.696, and 0.830, respectively. The top-ranked variables were similar to those above, including age, cnt_tx, WC, sex (male), and T2DM.
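As a reading aid for the AUC-ROC figures reported above, the sketch below computes the AUC directly from its probabilistic definition: the chance that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, with ties counted as one half. The labels and scores are made up for illustration, and the function `auc_roc` is a hypothetical helper, not the authors' evaluation code.

```python
def auc_roc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative count as half a win.
    Uses a simple O(n_pos * n_neg) pairwise comparison.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: 1 = severe outcome, 0 = non-severe outcome.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.6, 0.7, 0.2, 0.6]

auc = auc_roc(labels, scores)
print(f"AUC = {auc:.3f}")
```

In this toy example, the case scores higher in 7 of the 9 case/non-case pairs. Under the same reading, an AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.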

CONCLUSIONS

We identified numerous baseline clinical risk factors for severe/fatal infection using XGBoost. For example, age, central obesity, impaired renal function, multiple comorbidities, and cardiometabolic abnormalities may predispose individuals to poorer outcomes. The prediction models may be useful at a population level to identify those susceptible to developing severe/fatal infections, facilitating targeted prevention strategies. A risk-prediction tool is also available online. Further replication in independent cohorts is required to verify our findings.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ac26/8485986/ab345e3b5416/publichealth_v7i9e29544_fig1.jpg
