Faculty of Pharmacy, Medical University-Sofia, 1000 Sofia, Bulgaria.
Int J Mol Sci. 2024 Mar 3;25(5):2949. doi: 10.3390/ijms25052949.
Since viruses are one of the main causes of infectious illnesses, prophylaxis is essential for efficient disease control. Vaccines play a pivotal role in mitigating the transmission of various viral infections and fortifying our defenses against them. The initial step in modern vaccine design and development involves the identification of potential vaccine targets through computational techniques. Here, using datasets of 1588 known viral immunogens and 468 viral non-immunogens, we apply machine learning algorithms to develop models for the prediction of protective immunogens of viral origin. The datasets are split into training and test sets in a 4:1 ratio. The protein structures are encoded by E-descriptors and transformed into uniform-length vectors by the auto- and cross-covariance methods. The most relevant descriptors are selected by the gain/ratio technique. The models generated by Random Forest, Multilayer Perceptron, and XGBoost algorithms demonstrate superior predictive performance on the test sets, surpassing predictions made by VaxiJen 2.0, an established gold standard in viral immunogenicity prediction. The key attributes determining immunogenicity in viral proteins are specific fingerprints in hydrophobicity and steric properties.
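The encoding step can be illustrated with a minimal sketch. It assumes the commonly used form of the auto- and cross-covariance (ACC) transform, in which descriptor j at position i is multiplied by descriptor k at position i + lag and the products are averaged over the sequence. The abstract does not state the maximum lag or any normalization, so `max_lag` is an assumed parameter. The function name `acc_transform` and the table `TOY_DESCRIPTORS` are hypothetical: the table holds placeholder values with two descriptors per residue, not the published E-descriptors.

```python
from typing import Dict, List, Sequence

# Placeholder descriptor table (NOT the real E-descriptor values).
# Each residue maps to a tuple of numeric descriptors.
TOY_DESCRIPTORS: Dict[str, Sequence[float]] = {
    "A": (1.0, 0.0),
    "C": (0.0, 1.0),
    "G": (-1.0, 0.5),
}


def acc_transform(
    sequence: str,
    table: Dict[str, Sequence[float]],
    max_lag: int,
) -> List[float]:
    """Turn a variable-length sequence into a fixed-length ACC vector.

    For each lag in 1..max_lag and each descriptor pair (j, k), average
    descriptor j at position i times descriptor k at position i + lag.
    j == k gives the auto-covariance terms, j != k the cross-covariance terms.
    """
    rows = [table[residue] for residue in sequence]
    n = len(rows)
    if max_lag < 1 or max_lag >= n:
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(sequence)")
    d = len(rows[0])

    features: List[float] = []
    for lag in range(1, max_lag + 1):
        for j in range(d):
            for k in range(d):
                total = sum(rows[i][j] * rows[i + lag][k] for i in range(n - lag))
                features.append(total / (n - lag))
    return features


if __name__ == "__main__":
    # Sequences of different lengths yield vectors of the same length:
    # (number of descriptors)^2 * max_lag.
    short_vec = acc_transform("ACGA", TOY_DESCRIPTORS, max_lag=2)
    long_vec = acc_transform("ACGACG", TOY_DESCRIPTORS, max_lag=2)
    print(len(short_vec), len(long_vec))
```

With five descriptors per residue, as in the E-descriptor scheme, the same function would return 25 values per lag, so proteins of any length map to vectors of identical size that a classifier can consume.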