Yang Yang, Shao Aibin, Vihinen Mauno
School of Computer Science and Technology, Soochow University, Suzhou, China.
Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing, China.
Front Mol Biosci. 2022 Jun 16;9:867572. doi: 10.3389/fmolb.2022.867572. eCollection 2022.
Genetic variations are investigated in humans and many other organisms for many purposes (e.g., to aid in clinical diagnosis). Interpreting the identified variations can be challenging. Although some dedicated prediction methods have been developed, and some tools for human variants can also be used for other organisms, their performance and species range have been limited. We developed a novel variant pathogenicity/tolerance predictor for amino acid substitutions in any organism. The method, PON-All, is a machine learning tool trained on human, animal, and plant variants. Two versions are provided, one that uses Gene Ontology (GO) annotations and one that does not, because GO annotations are unavailable or only partial for many organisms of interest. The methods provide predictions for three classes: pathogenic, benign, and variants of unknown significance. On the blind test, the version using GO annotations reached an accuracy of 0.913 and a Matthews correlation coefficient (MCC) of 0.827. Without GO features, accuracy was 0.856 and MCC 0.712. Performance is best for human and plant variants and somewhat lower for animal variants because the number of known disease-causing variants in animals is rather small. The method was compared to several other tools and was found to have superior performance. PON-All is freely available at http://structure.bmc.lu.se/PON-All and http://8.133.174.28:8999/.
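For readers unfamiliar with the two reported measures, the following minimal sketch shows how accuracy and MCC are commonly computed from a binary confusion matrix, treating pathogenic as the positive class and benign as the negative class. The function names and the counts are hypothetical illustrations and are not taken from the PON-All test data; the abstract also does not state how variants of unknown significance enter the reported figures, so they are left out here.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only (not from the paper).
counts = {"tp": 90, "tn": 85, "fp": 15, "fn": 10}
print(f"accuracy = {accuracy(**counts):.3f}")
print(f"MCC      = {mcc(**counts):.3f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix and ranges from -1 to 1, which makes it more informative when the classes are imbalanced.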