Li Shaokang, Li Zheng, Zhang Peijian, Qu Aili
College of Computer Science and Technology, Qingdao University, Qingdao 266071, China.
School of Economics, Qingdao University, Qingdao 266071, China.
Int J Mol Sci. 2025 Aug 29;26(17):8423. doi: 10.3390/ijms26178423.
Cathepsin L (CatL) is a critical protease that cleaves the spike protein of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), facilitating viral entry into host cells. Because inhibiting CatL is essential for preventing SARS-CoV-2 cell entry, CatL is a potential therapeutic target for drug development. Six QSAR models were established to predict the inhibitory activity of candidate compounds against CatL, expressed as IC50 values. The models were developed using the statistical heuristic method (HM), the evolutionary algorithm gene expression programming (GEP), the ensemble method random forest (RF), and the kernel-based machine learning algorithm support vector regression (SVR) configured with three kernels: radial basis function (RBF), a linear-RBF hybrid (LMIX2-SVR), and a linear-RBF-polynomial hybrid (LMIX3-SVR). The particle swarm optimization (PSO) algorithm was applied to optimize the parameters of the multi-parameter SVR models, offering low complexity and fast convergence. The properties of novel CatL inhibitors were explored through molecular docking analysis. The LMIX3-SVR model performed best, with R² values of 0.9676 and 0.9632 for the training and test sets and root mean square error (RMSE) values of 0.0834 and 0.0322, respectively. A five-fold cross-validation R² of 0.9043 and a leave-one-out cross-validation R² of 0.9525 demonstrated the model's strong predictive ability and robustness, confirming the suitability of the five selected descriptors. Based on these results, the IC50 values of 578 newly designed compounds were predicted using the HM model, and the top five candidate compounds with the best physicochemical properties were further verified with the Property Explorer Applet (PEA). The LMIX3-SVR model significantly advances QSAR modeling for drug discovery, providing a robust tool for designing and screening new drug molecules. This study contributes to the identification of novel CatL inhibitors, aiding the development of effective therapeutics for SARS-CoV-2.
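The abstract names the hybrid kernels but does not give their exact form. One common construction, assumed here and not confirmed by the source, is a weighted sum of the component kernels: K(x, z) = w_lin·(x·z) + w_rbf·exp(-γ‖x - z‖²) + w_poly·(x·z + c)^d. A non-negative combination of valid (positive semidefinite) kernels is itself a valid kernel, so it can be used inside SVR. The weights, γ, c, d, and all function names below are hypothetical; setting w_poly = 0 gives a linear-RBF (LMIX2-style) mixture.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def linear_kernel(x: Vector, z: Vector) -> float:
    """Linear kernel: dot product x·z."""
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x: Vector, z: Vector, gamma: float) -> float:
    """Gaussian RBF kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def poly_kernel(x: Vector, z: Vector, coef0: float, degree: int) -> float:
    """Polynomial kernel: (x·z + coef0)^degree."""
    return (linear_kernel(x, z) + coef0) ** degree


def lmix3_kernel(x: Vector, z: Vector, w_lin: float, w_rbf: float,
                 w_poly: float, gamma: float = 0.5, coef0: float = 1.0,
                 degree: int = 2) -> float:
    """Hypothetical linear-RBF-polynomial mixture (weighted sum of kernels).

    Assumed form, not taken from the paper. With w_poly = 0 this reduces
    to a linear-RBF (LMIX2-style) mixture. Non-negative weights keep the
    combined kernel positive semidefinite.
    """
    if min(w_lin, w_rbf, w_poly) < 0:
        raise ValueError("kernel weights must be non-negative")
    return (w_lin * linear_kernel(x, z)
            + w_rbf * rbf_kernel(x, z, gamma)
            + w_poly * poly_kernel(x, z, coef0, degree))


def gram_matrix(X: Sequence[Vector], **params) -> List[List[float]]:
    """Kernel (Gram) matrix over all pairs of rows of X."""
    return [[lmix3_kernel(a, b, **params) for b in X] for a in X]
```

For example, with X = [[0.1, 0.2], [0.3, 0.0], [0.5, 0.4]] as hypothetical descriptor vectors, gram_matrix(X, w_lin=0.2, w_rbf=0.5, w_poly=0.3) returns a symmetric 3 × 3 matrix whose entry K[0][0] is 0.2·0.05 + 0.5·1 + 0.3·1.05² = 0.84075. A matrix of this kind could be passed to scikit-learn's SVR with kernel="precomputed", using the test-versus-training kernel matrix at prediction time.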
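The abstract states that particle swarm optimization (PSO) was used to tune the SVR models' parameters, but it does not give the swarm settings or the fitness function. The sketch below is a standard global-best PSO over a box-bounded parameter vector. The inertia and acceleration coefficients (w = 0.7, c1 = c2 = 1.5), the swarm size, the iteration count, and the toy objective are assumptions for illustration only. For SVR tuning, the objective would presumably wrap a model fit and return an error such as the cross-validated RMSE for candidate values of parameters like C, ε, γ, and the kernel weights; that wrapper is not shown.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    objective: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_particles: int = 20,
    n_iter: int = 100,
    w: float = 0.7,    # inertia weight (assumed value)
    c1: float = 1.5,   # cognitive (personal-best) coefficient (assumed value)
    c2: float = 1.5,   # social (global-best) coefficient (assumed value)
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the search box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull toward personal and global bests.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Move, then clip the coordinate back into the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

As a usage example on a toy problem, pso_minimize(lambda p: (p[0] - 2) ** 2 + (p[1] + 1) ** 2, [(-5.0, 5.0), (-5.0, 5.0)]) should return a position close to (2, -1). Updating the global best as soon as any particle improves on it (rather than once per sweep) is one common variant; both converge on simple problems like this.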