Kevin Liu, Sindhu Raghavan, Serita Nelesen, C. Randal Linder, Tandy Warnow
Department of Computer Sciences, University of Texas at Austin, One University Station C0500, Austin, TX 78712, USA.
Science. 2009 Jun 19;324(5934):1561-4. doi: 10.1126/science.1171243.
Inferring an accurate evolutionary tree of life requires high-quality alignments of molecular sequence data sets from large numbers of species. However, this task is often difficult, slow, and idiosyncratic, especially when the sequences are highly diverged or include high rates of insertions and deletions (collectively known as indels). We present SATé (simultaneous alignment and tree estimation), an automated method to quickly and accurately estimate both DNA alignments and trees with the maximum likelihood criterion. In our study, it improved tree and alignment accuracy compared to the best two-phase methods currently available for data sets of up to 1000 sequences, showing that coestimation can be both rapid and accurate in phylogenetic studies.