Corander Jukka, Marttinen Pekka, Sirén Jukka, Tang Jing
Department of Mathematics, Fänriksgatan 3B, Åbo Akademi University, Åbo, Finland.
BMC Bioinformatics. 2008 Dec 16;9:539. doi: 10.1186/1471-2105-9-539.
During the past decade, many Bayesian statistical models and software packages for answering questions about the genetic structure underlying population samples have appeared in the scientific literature. Most of these methods use molecular markers for inference, while some can also handle DNA sequence data. In a number of earlier works, we introduced an array of statistical methods for population genetic inference that are implemented in the software BAPS. However, the biological problems addressed by genetic structure analysis keep growing in complexity, so that in many cases the current methods may provide either inappropriate or insufficient solutions.
We discuss why statistical approaches need to be enhanced to meet the challenges posed by the ever-increasing amounts of molecular data generated across a wide range of research areas, and we introduce an array of new statistical tools implemented in the most recent version of BAPS. With these methods it is possible, e.g., to fit genetic mixture models using a user-specified number of clusters and to estimate levels of admixture under a genetic linkage model. Also, alleles whose ancestry differs from the average ancestry observed across an individual's genome can be tracked in the sampled individuals, and a priori specified hypotheses about genetic population structure can be compared directly using Bayes' theorem. More generally, we have further improved the computational efficiency of the algorithms behind the methods implemented in BAPS, facilitating the analysis of large and complex datasets. In particular, the analysis of a single dataset can now be spread over multiple computers using a script interface to the software.
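To make the idea of comparing a priori specified population structures with Bayes' theorem concrete, here is a minimal sketch. It is not the BAPS implementation: it assumes haploid data (one allele per locus per individual), cluster-specific allele frequencies with a symmetric Dirichlet prior whose hyperparameter defaults to 1/r_j for a locus with r_j alleles, and equal prior probabilities for the hypotheses. The function names `log_marginal_likelihood` and `posterior_over_hypotheses` are hypothetical. Each hypothesis is a partition of the individuals into populations; its marginal likelihood integrates out the allele frequencies, and Bayes' theorem turns these into posterior probabilities.

```python
import math
from collections import Counter


def log_marginal_likelihood(data, partition, n_alleles, alpha=None):
    """Log p(data | partition) with allele frequencies integrated out.

    Illustrative assumptions (not the BAPS code):
    - data[i][j] is the allele index (0..r_j - 1) of haploid individual i at locus j;
    - each cluster has its own allele frequencies at each locus, with a
      symmetric Dirichlet(alpha_j) prior; alpha_j defaults to 1 / r_j.
    """
    if alpha is None:
        alpha = [1.0 / r for r in n_alleles]
    total = 0.0
    for label in set(partition):
        members = [x for x, lab in zip(data, partition) if lab == label]
        n = len(members)
        for j, r in enumerate(n_alleles):
            a = alpha[j]
            counts = Counter(ind[j] for ind in members)
            # Dirichlet-multinomial term:
            # Gamma(r*a) / Gamma(r*a + n) * prod_k Gamma(a + n_k) / Gamma(a)
            total += math.lgamma(r * a) - math.lgamma(r * a + n)
            for n_k in counts.values():  # alleles with zero count contribute 0
                total += math.lgamma(a + n_k) - math.lgamma(a)
    return total


def posterior_over_hypotheses(log_mls, log_priors=None):
    """Bayes' theorem over competing hypotheses, normalized via log-sum-exp."""
    if log_priors is None:
        log_priors = [0.0] * len(log_mls)  # equal prior probabilities
    scores = [m + p for m, p in zip(log_mls, log_priors)]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]


# Toy data: 6 haploid individuals typed at 2 biallelic loci.
data = [[0, 0], [0, 0], [0, 0], [1, 1], [1, 1], [1, 1]]
n_alleles = [2, 2]
hypotheses = {
    "one population": [0, 0, 0, 0, 0, 0],
    "two populations": [0, 0, 0, 1, 1, 1],
}
log_mls = [log_marginal_likelihood(data, s, n_alleles) for s in hypotheses.values()]
for name, p in zip(hypotheses, posterior_over_hypotheses(log_mls)):
    print(f"{name}: {p:.4f}")
```

On this toy dataset the marginal likelihood ratio between the two hypotheses is exactly 400, so the two-population hypothesis receives posterior probability 400/401 (about 0.9975). Working on the log scale with a log-sum-exp normalization keeps the computation stable when marginal likelihoods are extremely small, as they are for realistic numbers of loci.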
The Bayesian modelling methods introduced in this article represent an array of enhanced tools for learning the genetic structure of populations. Their implementations in the BAPS software are designed to meet the increasing need for analyzing large-scale population genetics data. The software is freely downloadable for Windows, Linux and Mac OS X systems at http://web.abo.fi/fak/mnf//mate/jc/software/baps.html.