Williams Anna V, Miller Joseph T, Small Ian, Nevill Paul G, Boykin Laura M
Australian Research Council Centre of Excellence in Plant Energy Biology, The University of Western Australia, Crawley, WA 6009, Australia; Kings Park and Botanic Garden, Fraser Ave, Kings Park, WA 6005, Australia; School of Plant Biology, The University of Western Australia, Crawley, WA 6009, Australia.
National Research Collections Australia, CSIRO National Facilities and Collections, GPO Box 1600, Canberra, ACT 2601, Australia; Division of Environmental Biology, National Science Foundation, 4201 Wilson Blvd, Arlington, VA 22230, USA.
Mol Phylogenet Evol. 2016 Mar;96:1-8. doi: 10.1016/j.ympev.2015.11.021. Epub 2015 Dec 15.
Combining whole-genome data with previously obtained amplicon sequences has the potential to increase the resolution of phylogenetic analyses, particularly at low taxonomic levels or where recent divergence, rapid speciation or slow genome evolution has resulted in limited sequence variation. However, the integration of these data types in large-scale phylogenetic studies has rarely been investigated. Here we conduct a phylogenetic analysis of the whole chloroplast genome and two nuclear ribosomal loci for 65 Acacia species sampled from across the most recent Acacia phylogeny. We then combine these data with previously generated amplicon sequences (four chloroplast loci and two nuclear ribosomal loci) for 508 Acacia species. We use several phylogenetic methods, including maximum likelihood bootstrapping (with and without a constraint) and ExaBayes, to determine how successfully a dataset of 4000 bp can be combined with one of 189,000 bp. The results indicate that including whole-genome data gave a far better resolved and better supported representation of the phylogenetic relationships within Acacia than using amplicon sequences alone, with the greatest support observed when a whole-genome phylogeny was used as a constraint on the amplicon sequences. Our study therefore provides methods for the optimal integration of genomic and amplicon sequences.
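One common way to combine a short amplicon alignment sampled for many species with a long genome-scale alignment sampled for few is a concatenated supermatrix, in which species absent from a partition are padded with missing-data characters. The abstract does not state the exact procedure used, so the Python sketch below is only an illustration under that assumption: the function names, taxon labels and toy sequences are hypothetical and are not taken from the study.

```python
from typing import Dict

MISSING = "?"  # missing-data symbol accepted by many phylogenetic file formats


def concatenate_alignments(*partitions: Dict[str, str]) -> Dict[str, str]:
    """Join aligned partitions end to end, padding taxa absent from a partition."""
    # Union of all taxon names across partitions, in a stable order.
    taxa = sorted(set().union(*(p.keys() for p in partitions)))
    supermatrix = {taxon: "" for taxon in taxa}
    for part in partitions:
        lengths = {len(seq) for seq in part.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in a partition must share one aligned length")
        width = lengths.pop()
        for taxon in taxa:
            # Taxa not sequenced for this partition get a run of '?' of equal width.
            supermatrix[taxon] += part.get(taxon, MISSING * width)
    return supermatrix


def missing_fraction(supermatrix: Dict[str, str]) -> float:
    """Proportion of cells in the supermatrix that are missing data."""
    total = sum(len(seq) for seq in supermatrix.values())
    missing = sum(seq.count(MISSING) for seq in supermatrix.values())
    return missing / total


# Toy data (hypothetical): three species with amplicons, two of them with genomes.
amplicon = {"sp_A": "ACGT", "sp_B": "ACGA", "sp_C": "ACTT"}
genome = {"sp_A": "GGCCTTAA", "sp_B": "GGCCTTAT"}

matrix = concatenate_alignments(amplicon, genome)
for taxon, seq in matrix.items():
    print(taxon, seq)
print(f"missing fraction: {missing_fraction(matrix):.3f}")
```

In a matrix built this way from a 4000 bp dataset and a 189,000 bp dataset, most cells for the amplicon-only species would be missing data. That imbalance may be one reason a constraint derived from the genome-scale tree is worth testing as an alternative to analysing the concatenated matrix alone.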