Department of Biotechnology, Institute of Engineering & Technology, UP Technical University, Lucknow, India.
Adv Exp Med Biol. 2011;696:223-9. doi: 10.1007/978-1-4419-7046-6_22.
A fundamental step in the adaptive immune response to a pathogen or vaccine is the binding of short peptides (also called epitopes) to major histocompatibility complex (MHC) molecules. Various prediction algorithms are used to capture MHC-peptide binding preferences, allowing entire pathogen proteomes to be scanned rapidly for peptides likely to bind MHC, which saves cost, effort, and time. However, the number of known binders/non-binders (BNB) for a specific MHC molecule is limited in many cases, which still poses a computational challenge for prediction. Any machine learning approach needs adequate training data to predict BNB. In this study, a variable learning rate is demonstrated for training an artificial neural network and predicting BNB from small datasets; the approach can be used for large datasets as well. Datasets for different MHC class I alleles for the SARS coronavirus (Tor2 replicase polyprotein 1ab) were used for training and prediction of BNB. A total of 90 datasets (nine different MHC class I alleles with tenfold cross-validation) were retrieved from the IEDB database. With a fixed learning rate, the best value of the area under the ROC curve (AROC) is 0.65, and in most cases it is 0.5, which indicates poor, chance-level prediction. With a variable learning rate, the AROC is between 0.806 and 1.0 for 76 of the 90 datasets, between 0.7 and 0.8 for 7 datasets, and between 0.5 and 0.7 for the remaining 7 datasets, which indicates very good performance in most cases.
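The abstract does not state which learning-rate schedule, network architecture, or peptide encoding was used, so the following is only a minimal sketch of the general idea under stated assumptions. It uses a "bold driver" style rule (grow the learning rate after a step that lowers the training loss, shrink it and reject the step otherwise) on a small one-hidden-layer network, and scores predictions with the area under the ROC curve. The function names, the hyperparameters (`inc`, `dec`, `hidden`), and the synthetic feature matrix are all hypothetical stand-ins, not the authors' method or the IEDB data.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss_and_grads(params, X, y):
    """Cross-entropy loss and gradients for a one-hidden-layer tanh network."""
    W1, b1, W2, b2 = params
    h = np.tanh(X @ W1 + b1)                 # hidden activations, shape (n, hidden)
    p = sigmoid(h @ W2 + b2)                 # predicted binder probability, shape (n,)
    eps = 1e-12                              # guards against log(0)
    loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    dz2 = (p - y) / len(y)                   # gradient at the output pre-activation
    dh = np.outer(dz2, W2) * (1 - h ** 2)    # back-propagated through tanh
    grads = [X.T @ dh, dh.sum(axis=0), h.T @ dz2, np.array([dz2.sum()])]
    return loss, grads


def train(X, y, hidden=4, lr=0.01, epochs=300, adaptive=True,
          inc=1.05, dec=0.7, seed=0):
    """Full-batch gradient descent; optionally adapts lr (assumed bold-driver rule)."""
    rng = np.random.default_rng(seed)
    params = [rng.normal(0, 0.5, (X.shape[1], hidden)), np.zeros(hidden),
              rng.normal(0, 0.5, hidden), np.zeros(1)]
    loss, grads = loss_and_grads(params, X, y)
    history = [loss]
    for _ in range(epochs):
        trial = [p - lr * g for p, g in zip(params, grads)]
        new_loss, new_grads = loss_and_grads(trial, X, y)
        if adaptive:
            if new_loss > loss:
                lr *= dec                    # step made things worse: reject and shrink
                continue
            lr *= inc                        # step helped: accept and grow
        params, loss, grads = trial, new_loss, new_grads
        history.append(loss)
    return params, lr, history


def predict(params, X):
    W1, b1, W2, b2 = params
    return sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2)


def auc_roc(labels, scores):
    """AROC as the fraction of (binder, non-binder) pairs ranked correctly; ties count 0.5."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one binder and one non-binder")
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


def make_toy_data(n=40, seed=1):
    """Synthetic stand-in for encoded peptides; NOT real binding data."""
    rng = np.random.default_rng(seed)
    y = (np.arange(n) % 2).astype(float)     # alternating non-binder / binder labels
    X = rng.normal(0, 1, (n, 5)) + 1.5 * y[:, None]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    X_train, y_train, X_test, y_test = X[:30], y[:30], X[30:], y[30:]
    for adaptive in (False, True):
        params, final_lr, _ = train(X_train, y_train, adaptive=adaptive)
        auc = auc_roc(y_test, predict(params, X_test))
        print(f"adaptive={adaptive} final_lr={final_lr:.4f} AROC={auc:.3f}")
```

On toy data like this the two settings may well score similarly; the sketch only shows the mechanics of adapting the step size and computing AROC, and says nothing about the gap reported in the abstract. A tenfold cross-validation as described there would repeat the train and score steps over ten held-out folds per allele.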