Department of Biological Sciences, Louisiana State University, Baton Rouge, LA, USA.
Center for Computation & Technology, Louisiana State University, Baton Rouge, LA, USA.
BMC Bioinformatics. 2018 Mar 9;19(1):91. doi: 10.1186/s12859-018-2109-2.
Detecting similar ligand-binding sites in globally unrelated proteins has a wide range of applications in modern drug discovery, including drug repurposing and the prediction of side effects and drug-target interactions. Although a number of techniques to compare binding pockets have been developed, the problem still poses significant challenges.
We evaluate the performance of three algorithms that calculate similarities between ligand-binding sites: APoc, SiteEngine, and G-LoSA. Our assessment considers not only their ability to identify similar pockets and to construct accurate local alignments, but also the dependence of these alignments on the sequence order. We point out certain drawbacks of previously compiled datasets, such as the inclusion of structurally similar proteins, which leads to overestimated performance. To address these issues, we propose a rigorous procedure to prepare unbiased, high-quality benchmarking sets. Further, we compare techniques that directly align binding pockets with indirect strategies that employ structure-based virtual screening with AutoDock Vina and rDock.
Thorough benchmarks reveal that G-LoSA offers fairly robust overall performance, whereas the accuracy of APoc and SiteEngine is satisfactory only on easy datasets. Moreover, combining various algorithms into a meta-predictor improves the performance of existing methods at detecting similar binding sites in unrelated proteins by 5-10%. All data reported in this paper are freely available at https://osf.io/6ngbs/ .
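The abstract does not specify how the meta-predictor is built. As an illustration only, one common way to combine several pocket-comparison scores is to standardize each method's scores and average them per pocket pair. The sketch below assumes every method reports a score where higher means more similar; the method names, the function names, and the numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize one method's scores to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A method that gives every pair the same score carries no signal.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def combine_scores(scores_by_method):
    """Average the standardized scores of all methods for each pocket pair.

    scores_by_method maps a method name to a list of scores, one per pocket
    pair, with the pairs in the same order for every method.
    """
    normalized = [zscores(scores) for scores in scores_by_method.values()]
    return [mean(column) for column in zip(*normalized)]


# Made-up scores for four pocket pairs (illustrative values only).
example = {
    "method_a": [0.9, 0.2, 0.6, 0.1],
    "method_b": [12.0, 3.0, 10.0, 4.0],
    "method_c": [0.7, 0.4, 0.3, 0.2],
}

consensus = combine_scores(example)

# Indices of the pocket pairs, from most to least similar by consensus.
ranking = sorted(range(len(consensus)), key=lambda i: consensus[i], reverse=True)
```

Standardizing first keeps a method with a wide numeric range from dominating the average. A method whose score decreases with similarity would need its sign flipped before being combined this way.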