Science for Life Laboratory and Department of Biochemistry and Biophysics, Stockholm University, Stockholm 10691, Sweden.
Department of Chemistry, Vanderbilt University, Nashville, TN, USA.
Bioinformatics. 2017 Sep 15;33(18):2859-2866. doi: 10.1093/bioinformatics/btx332.
A few years ago it was shown that using a maximum entropy approach to describe couplings between columns in a multiple sequence alignment can significantly increase the accuracy of residue contact predictions. For very large protein families with more than 1000 effective sequences, the predicted contacts are accurate enough to produce reliable models of proteins as well as complexes. Today, no structure is known for about half of all Pfam domain families. Unfortunately, most of these families have at most a few hundred members, i.e. they are too small for such contact prediction methods.
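The abstract refers to "effective sequences" without defining the term. As a rough illustration only, the sketch below uses a common reweighting convention from the coupling-analysis literature: each sequence is down-weighted by the number of sequences in the alignment that are at least 80% identical to it, and the weights are summed. The function name, the 80% threshold, and the treatment of gaps as ordinary characters are assumptions made for this example; they are not taken from PconsC3.

```python
def effective_sequences(alignment, identity_threshold=0.8):
    """Sum of per-sequence weights 1 / (number of sequences, itself included,
    whose identity to that sequence is >= identity_threshold).

    Assumes all rows of the alignment have the same length. Gap characters
    are compared like any other symbol (a simplification for this sketch).
    """
    def identity(a, b):
        # Fraction of alignment columns in which the two rows agree.
        return sum(x == y for x, y in zip(a, b)) / len(a)

    total = 0.0
    for seq in alignment:
        # Every sequence is its own neighbour, so this count is at least 1.
        neighbours = sum(identity(seq, other) >= identity_threshold
                         for other in alignment)
        total += 1.0 / neighbours
    return total


# Toy alignment: two identical rows, one near-duplicate (75% identity,
# below the threshold), and one unrelated row.
toy = ["ACDE", "ACDE", "ACDF", "WYWY"]
n_eff = effective_sequences(toy)  # 0.5 + 0.5 + 1 + 1
```

The pairwise loop is quadratic in the number of sequences, which is fine for a toy alignment but would need a vectorised or clustered implementation for real Pfam families.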
To extend accurate contact predictions to the thousands of smaller protein families, we present PconsC3, a fast and improved method for protein contact prediction that can be used for families with as few as 100 effective sequences. PconsC3 significantly outperforms direct coupling analysis (DCA) methods regardless of family size, secondary structure content, contact range, or the number of selected contacts.
PconsC3 is available as a web server and as a downloadable version at http://c3.pcons.net. The downloadable version is free for all to use and is licensed under the GNU General Public License, version 2. Contact predictions for most Pfam families are also available at this site. We estimate that more than 4000 contact maps for Pfam families of unknown structure have more than 50% of the top-ranked contacts predicted correctly.
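The "fraction of top-ranked contacts predicted correctly" is usually reported as a precision (positive predictive value) over the highest-scoring residue pairs. A minimal sketch of that calculation follows, assuming predictions arrive as a dictionary of pair scores and the reference contacts as a set of residue index pairs. The function name, the input layout, and the minimum sequence separation of 5 are hypothetical choices for this example and do not describe the PconsC3 output format or the evaluation protocol used in the paper.

```python
def top_contact_precision(scores, true_contacts, n_top, min_separation=5):
    """Fraction of the n_top highest-scoring residue pairs that are true contacts.

    scores         -- dict mapping (i, j) residue index pairs to predicted scores
    true_contacts  -- set of (i, j) pairs with i < j observed in a structure
    min_separation -- pairs closer than this in sequence are ignored
    """
    # Keep only pairs far enough apart in sequence, then rank by score.
    ranked = sorted(
        ((score, i, j) for (i, j), score in scores.items()
         if abs(i - j) >= min_separation),
        reverse=True,
    )
    top = ranked[:n_top]
    if not top:
        return 0.0
    # Normalise pair order so (j, i) matches a stored (i, j).
    correct = sum((min(i, j), max(i, j)) in true_contacts for _, i, j in top)
    return correct / len(top)


# Toy example: (3, 4) is dropped by the separation filter; of the two
# remaining top pairs, only (1, 10) is a true contact.
toy_scores = {(1, 10): 0.9, (2, 20): 0.8, (3, 4): 0.95, (5, 15): 0.1}
toy_truth = {(1, 10), (5, 15)}
precision = top_contact_precision(toy_scores, toy_truth, n_top=2)
```

In practice the number of selected contacts is often tied to the protein length, which is why the abstract lists "the number of selected contacts" as one of the conditions under which methods are compared.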
Supplementary data are available at Bioinformatics online.