School of Biodiversity, One Health & Veterinary Medicine, College of Medical, Veterinary, and Life Sciences, University of Glasgow, Glasgow, United Kingdom.
Medical Research Council - University of Glasgow Centre for Virus Research, Glasgow, United Kingdom.
eLife. 2022 Nov 23;11:e80329. doi: 10.7554/eLife.80329.
Transmission of SARS-CoV-2 from humans to other species threatens wildlife conservation and may create novel sources of viral diversity for future zoonotic transmission. A variety of computational heuristics have been developed to pre-emptively identify susceptible host species based on variation in the angiotensin-converting enzyme 2 (ACE2) receptor used for viral entry. However, the predictive performance of these heuristics remains unknown. Using a newly compiled database of 96 species, we show that, while variation in ACE2 can be used by machine learning models to accurately predict animal susceptibility to sarbecoviruses (accuracy = 80.2%, binomial confidence interval [CI]: 70.8-87.6%), the sites informing predictions have no known involvement in virus binding and instead recapitulate host phylogeny. Models trained on host phylogeny alone performed equally well (accuracy = 84.4%, CI: 75.5-91.0%) and at a level equivalent to retrospective assessments of accuracy for previously published models. These results suggest that the predictive power of ACE2-based models derives from strong correlations with host phylogeny rather than processes which can be mechanistically linked to infection biology. Further, biased availability of ACE2 sequences misleads projections of the number and geographic distribution of at-risk species. Models based on host phylogeny reduce this bias, but identify a very large number of susceptible species, implying that model predictions must be combined with local knowledge of exposure risk to practically guide surveillance. Identifying barriers to viral infection or onward transmission beyond receptor binding and incorporating data which are independent of host phylogeny will be necessary to manage the ongoing risk of establishment of novel animal reservoirs of SARS-CoV-2.
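The accuracies above are reported with binomial confidence intervals. The abstract does not say how those intervals were computed, so the sketch below assumes an exact (Clopper-Pearson) interval. It also assumes counts of 77 and 81 correct predictions out of 96 species. These counts are inferred from the reported percentages (77/96 ≈ 80.2%, 81/96 ≈ 84.4%) and are not stated in the source. The function names are illustrative only.

```python
from math import comb


def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_decreasing(f, target, tol=1e-10):
    """Find p in [0, 1] with f(p) == target, for f decreasing in p (bisection)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) binomial interval for k successes in n trials."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2
    )
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2
    )
    return lower, upper


if __name__ == "__main__":
    n_species = 96
    # ASSUMPTION: correct-prediction counts inferred from the reported accuracies.
    for label, n_correct in [("ACE2-based", 77), ("phylogeny-only", 81)]:
        low, high = clopper_pearson(n_correct, n_species)
        acc = n_correct / n_species
        print(f"{label}: accuracy = {acc:.1%}, 95% CI: {low:.1%}-{high:.1%}")
```

With only 96 species, each interval spans well over ten percentage points, and the two point estimates fall inside each other's intervals. This fits the abstract's statement that the phylogeny-only models performed equally well.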