

Evaluation of residue-residue contact prediction methods: From retrospective to prospective.

Affiliations

University of Chinese Academy of Sciences, Beijing, China.

Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.

Publication details

PLoS Comput Biol. 2021 May 24;17(5):e1009027. doi: 10.1371/journal.pcbi.1009027. eCollection 2021 May.

Abstract

Sequence-based residue contact prediction plays a crucial role in protein structure reconstruction. In recent years, the combination of evolutionary coupling analysis (ECA) and deep learning (DL) techniques has brought tremendous progress in residue contact prediction, so a comprehensive assessment of current methods on a large-scale benchmark data set is much needed. In this study, we evaluate 18 contact predictors on 610 non-redundant proteins and 32 CASP13 targets from a wide range of perspectives. The results show that different methods suit different application scenarios: (1) DL methods that use multiple categories of input and large training sets are the best choices for low-contact-density proteins, such as intrinsically disordered proteins and proteins with shallow multiple sequence alignments (MSAs). (2) With at least 5L effective sequences in the MSA (where L is the sequence length), all methods show their best performance, and methods that rely only on the MSA as input achieve performance comparable to methods that adopt multi-source inputs. (3) Among the top L/5 and L/2 predictions, DL methods predict more hydrophobic interactions, while ECA methods predict more salt bridges and disulfide bonds. (4) ECA methods detect more secondary structure interactions, while DL methods accurately capture more contact patterns and prune isolated false positives. In general, multi-input DL methods with large training sets dominate current approaches, with the best overall performance. Despite the great success of current DL methods, there is still much room for further improvement: (1) Performance is greatly affected when MSAs are shallow. (2) Current methods show lower precision for inter-domain than for intra-domain contact predictions, as well as very large imbalances in intra-domain precision across domains. (3) The strong similarity between the predictions of different DL methods indicates that more feature types and more diversified models need to be developed. (4) The runtime of most methods can be further optimized.
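The abstract relies on two quantities it does not define: the precision of the top L/k predicted contacts (k = 5 or 2) and the number of effective sequences (Neff) in the MSA behind the 5L threshold. The following is a minimal Python sketch of both, under assumptions this abstract does not confirm: a true contact is a residue pair whose Cβ atoms (Cα for glycine) lie closer than 8 Å with a sequence separation of at least 6, following the common CASP convention, and Neff weights each sequence by 1 over the number of MSA members sharing at least 80% identity with it. The function names `is_contact`, `top_lk_precision` and `effective_sequences` are hypothetical and not taken from the paper or from any predictor it evaluates.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def is_contact(coords: Sequence[Coord], i: int, j: int, cutoff: float = 8.0) -> bool:
    """Assumed CASP-style definition: representative atoms (C-beta, or
    C-alpha for glycine) closer than `cutoff` angstroms."""
    return math.dist(coords[i], coords[j]) < cutoff


def top_lk_precision(
    scores: Dict[Tuple[int, int], float],
    coords: Sequence[Coord],
    k: int,
    min_sep: int = 6,
    cutoff: float = 8.0,
) -> float:
    """Fraction of the top L/k scored pairs (with j - i >= min_sep) that are
    true contacts. `scores` maps (i, j), i < j, to a predicted probability."""
    L = len(coords)
    n_top = max(1, L // k)
    # Keep only pairs far enough apart in sequence, then rank by score.
    ranked = sorted(
        ((p, i, j) for (i, j), p in scores.items() if j - i >= min_sep),
        reverse=True,
    )
    top = ranked[:n_top]
    if not top:
        return 0.0  # no eligible predictions to evaluate
    hits = sum(is_contact(coords, i, j, cutoff) for _, i, j in top)
    return hits / len(top)


def effective_sequences(msa: List[str], identity: float = 0.8) -> float:
    """Neff: each aligned sequence is weighted by 1 / (number of MSA members,
    itself included, sharing >= `identity` fraction of identical columns).
    Simplification: matching gap characters count as identical."""
    length = len(msa[0])
    neff = 0.0
    for a in msa:
        neighbours = sum(
            1
            for b in msa
            if sum(x == y for x, y in zip(a, b)) / length >= identity
        )
        neff += 1.0 / neighbours
    return neff
```

Under these assumptions, the abstract's 5L condition reads `effective_sequences(msa) / L >= 5`. The all-pairs identity loop costs O(N²L) for N sequences, which is acceptable for illustration; deep MSAs would need a vectorized implementation.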


