School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China.
Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA.
Bioinformatics. 2019 Nov 1;35(22):4647-4655. doi: 10.1093/bioinformatics/btz291.
The contact map of a protein sequence dictates the global topology of its structural fold. Accurate prediction of the contact map is thus essential to protein 3D structure prediction, and it is particularly useful for protein sequences that have no close homologous templates in the Protein Data Bank.
We developed a new method, ResPRE, that predicts residue-level protein contacts from the inverse covariance matrix (or precision matrix) of multiple sequence alignments (MSAs) using a deep residual convolutional neural network. The approach was tested on a set of 158 non-homologous proteins collected from the CASP experiments and achieved an average accuracy of 50.6% in top-L long-range contact prediction, where L is the sequence length. This is 11.7% higher than the best of the other state-of-the-art approaches, which range from coevolution coupling analysis to deep neural network training. Detailed data analyses show that the major advantage of ResPRE lies in its use of the precision matrix, which helps rule out the transitional noise in contact maps that affects the previously used covariance matrix. Meanwhile, the residual network with parallel shortcut layer connections increases the learning ability of the deep neural network. It was also found that appropriate collection of MSAs can further improve the accuracy of the final contact-map predictions. The standalone package and online server of ResPRE are freely available, which should have an important impact on protein structure and function modeling studies, in particular for distant-homology and non-homology protein targets.
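To make the precision-matrix feature concrete, the following is a minimal sketch, not the ResPRE implementation. It assumes a one-hot encoding over 20 amino acids plus a gap symbol, no sequence weighting, and a ridge term added before inversion, because the covariance of one-hot columns is singular. The function names, the alphabet ordering, and the `ridge` value are illustrative assumptions.

```python
import numpy as np

# Assumed alphabet: 20 standard amino acids plus a gap symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def one_hot_msa(seqs):
    """Encode an MSA (equal-length strings) as an (N, L * 21) matrix."""
    index = {aa: i for i, aa in enumerate(ALPHABET)}
    gap = len(ALPHABET) - 1
    n, length = len(seqs), len(seqs[0])
    x = np.zeros((n, length, len(ALPHABET)))
    for s, seq in enumerate(seqs):
        for pos, aa in enumerate(seq):
            # Unknown symbols are treated as gaps (an assumption).
            x[s, pos, index.get(aa, gap)] = 1.0
    return x.reshape(n, length * len(ALPHABET))


def precision_features(seqs, ridge=1.0):
    """Return an (L, L, 21 * 21) tensor of precision-matrix blocks.

    The ridge term is a hypothetical regularizer chosen for this sketch;
    it makes the singular covariance matrix invertible.
    """
    q = len(ALPHABET)
    length = len(seqs[0])
    x = one_hot_msa(seqs)
    # Covariance between every (position, residue type) pair of columns.
    cov = np.cov(x, rowvar=False, bias=True)
    # Precision matrix: inverse of the regularized covariance.
    prec = np.linalg.inv(cov + ridge * np.eye(cov.shape[0]))
    # Regroup into one q x q block per residue pair (i, j).
    blocks = prec.reshape(length, q, length, q).transpose(0, 2, 1, 3)
    return blocks.reshape(length, length, q * q)
```

Each residue pair (i, j) then carries a 441-dimensional feature vector, and the resulting L x L x 441 tensor is the kind of image-like input a 2D convolutional network can consume.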
https://zhanglab.ccmb.med.umich.edu/ResPRE and https://github.com/leeyang/ResPRE.
Supplementary data are available at Bioinformatics online.
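The top-L long-range accuracy quoted in the abstract can be illustrated with a short sketch. It assumes the common CASP convention that long-range pairs have a sequence separation of at least 24 residues, and that accuracy means the fraction of the L highest-scoring long-range pairs that are true contacts. The abstract states neither detail, so `min_sep` and the function name are assumptions.

```python
import numpy as np


def top_l_long_range_precision(scores, contacts, min_sep=24):
    """Fraction of true contacts among the top-L long-range predictions.

    scores   : (L, L) array of predicted contact scores.
    contacts : (L, L) boolean array of true contacts.
    min_sep  : minimum sequence separation |i - j| (assumed convention).
    """
    scores = np.asarray(scores, dtype=float)
    contacts = np.asarray(contacts, dtype=bool)
    length = scores.shape[0]
    # Upper-triangle pairs with j - i >= min_sep.
    i, j = np.triu_indices(length, k=min_sep)
    if i.size == 0:
        return float("nan")  # sequence too short for long-range pairs
    # Indices of the L highest-scoring pairs (fewer if not enough pairs).
    order = np.argsort(-scores[i, j], kind="stable")[:length]
    return float(contacts[i[order], j[order]].mean())
```

With this definition, a reported value of 50.6% would mean that, on average, about half of the L top-ranked long-range pairs are true contacts.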