

TCS: a new multiple sequence alignment reliability measure to estimate alignment accuracy and improve phylogenetic tree reconstruction.

Affiliations

Comparative Bioinformatics, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Dr. Aiguader 88, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), Barcelona, Spain.


Publication information

Mol Biol Evol. 2014 Jun;31(6):1625-37. doi: 10.1093/molbev/msu117. Epub 2014 Apr 1.

Abstract

Multiple sequence alignment (MSA) is a key modeling procedure when analyzing biological sequences. Homology and evolutionary modeling are the most common applications of MSAs. Both are known to be sensitive to the underlying MSA accuracy. In this work, we show how this problem can be partly overcome using the transitive consistency score (TCS), an extended version of the T-Coffee scoring scheme. Using this local evaluation function, we show that one can identify the most reliable portions of an MSA, as judged from BAliBASE and PREFAB structure-based reference alignments. We also show how this measure can be used to improve phylogenetic tree reconstruction using both an established simulated data set and a novel empirical yeast data set. For this purpose, we describe a novel lossless alternative to site filtering that involves overweighting the trustworthy columns. Our approach relies on the T-Coffee framework; it uses libraries of pairwise alignments to evaluate any third party MSA. Pairwise projections can be produced using fast or slow methods, thus allowing a trade-off between speed and accuracy. We compared TCS with Heads-or-Tails, GUIDANCE, Gblocks, and trimAl and found it to lead to significantly better estimates of structural accuracy and more accurate phylogenetic trees. The software is available from www.tcoffee.org/Projects/tcs.
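The "lossless alternative to site filtering" mentioned in the abstract can be illustrated with a small sketch. It assumes that each column already has an integer reliability score (for example on a 0 to 9 scale) and that overweighting is done by repeating a column in proportion to its score while keeping every column at least once. The function name `replicate_columns`, the score scale, and the replication rule are illustrative assumptions, not the authors' implementation or the T-Coffee interface.

```python
from typing import List, Sequence


def replicate_columns(msa: Sequence[str], scores: Sequence[int]) -> List[str]:
    """Return a new alignment in which column j appears max(1, scores[j]) times.

    Hypothetical helper: the replication rule is an assumption made for
    illustration, not the published TCS weighting scheme.
    """
    if not msa:
        return []
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all aligned sequences must have the same length")
    if len(scores) != length:
        raise ValueError("need exactly one score per column")
    # Every column is kept at least once, so no site is discarded (lossless);
    # higher-scoring columns simply contribute more copies.
    copies = [max(1, int(s)) for s in scores]
    return ["".join(ch * n for ch, n in zip(row, copies)) for row in msa]


# Toy alignment with made-up per-column scores.
msa = ["ACG-T", "ACGAT", "A-GAT"]
scores = [9, 2, 9, 0, 5]
weighted = replicate_columns(msa, scores)
```

The resulting alignment could then be passed to any tree-building program that treats columns as independent sites, so that reliable columns carry more weight without low-scoring ones being removed.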

