

On estimating evolutionary probabilities of population variants.

Author affiliations

Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA.

Department of Biology, Temple University, Philadelphia, PA, 19122, USA.

Publication information

BMC Evol Biol. 2019 Jun 25;19(1):133. doi: 10.1186/s12862-019-1455-7.

Abstract

BACKGROUND

The evolutionary probability (EP) of an allele in a DNA or protein sequence is used to classify variants as evolutionarily permissible (ePerm; EP ≥ 0.05) or forbidden (eForb; EP < 0.05). The EP of an allele represents an independent evolutionary expectation of observing that allele in a population, based solely on the long-term substitution patterns captured in a multiple sequence alignment. Under the neutral theory, EP and population frequencies can be compared to identify neutral and non-neutral alleles. This approach has been used to discover candidate adaptive polymorphisms in humans, which are eForbs segregating at high frequencies. The original method for computing EP requires the evolutionary relationships and divergence times of the species in the sequence alignment (a timetree), which are not known with certainty for most datasets. This requirement impedes general use of the original EP formulation. Here, we present an approach in which the phylogeny and times are inferred from the sequence alignment itself prior to the EP calculation. We evaluate whether the modified EP approach produces results similar to those from the original method.
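The classification rule described above can be illustrated with a minimal Python sketch. Only the EP cutoff of 0.05 comes from the abstract. The `Allele` record, the function names, the example values, and the `HIGH_FREQ` cutoff for "high frequency" are hypothetical and serve only to make the logic concrete; computing EP itself (which requires an alignment and a timetree) is not shown.

```python
from dataclasses import dataclass

EP_THRESHOLD = 0.05  # from the abstract: ePerm if EP >= 0.05, eForb if EP < 0.05
HIGH_FREQ = 0.05     # ASSUMPTION: illustrative cutoff; the abstract gives no value


@dataclass
class Allele:
    """Hypothetical record pairing an allele's EP with its population frequency."""
    name: str
    ep: float    # evolutionary probability, assumed already computed
    freq: float  # observed population allele frequency


def classify(ep: float) -> str:
    """Label an allele as evolutionarily permissible or forbidden."""
    return "ePerm" if ep >= EP_THRESHOLD else "eForb"


def is_candidate_adaptive(allele: Allele, high_freq: float = HIGH_FREQ) -> bool:
    """Flag eForb alleles that nevertheless segregate at high frequency."""
    return classify(allele.ep) == "eForb" and allele.freq >= high_freq


# Made-up example values, not data from the study.
alleles = [
    Allele("A", ep=0.60, freq=0.45),   # ePerm and common
    Allele("B", ep=0.01, freq=0.001),  # eForb and rare
    Allele("C", ep=0.02, freq=0.30),   # eForb but common
]

for a in alleles:
    print(a.name, classify(a.ep), is_candidate_adaptive(a))
```

In this toy input, only allele C is flagged, because it is the only eForb whose frequency meets the assumed cutoff.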

RESULTS

We compared EP estimates from the original and the modified approaches by using more than 18,000 protein sequence alignments containing orthologous sequences from 46 vertebrate species. For the original EP calculations, we used species relationships from UCSC and divergence times from the TimeTree web resource, and the resulting EP estimates were considered to be the ground truth. We found that the modified approaches produced reasonable EP estimates for both the HGMD disease missense variant dataset and the 1000 Genomes Project missense variant dataset. Our results showed that reliable estimates of EP can be obtained without a priori knowledge of the sequence phylogeny and divergence times. We also found that, in order to obtain robust EP estimates, it is important to assemble a dataset with many sequences, sampling from a diversity of species groups.
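One plausible way to quantify how closely a modified EP estimate tracks the ground-truth estimate is sketched below. The abstract does not state which comparison metrics the authors used, so both measures here (agreement of the ePerm/eForb call at the 0.05 cutoff, and mean absolute difference) are assumptions chosen for illustration, and the function names and input values are hypothetical.

```python
def classification_agreement(ep_original, ep_modified, threshold=0.05):
    """Fraction of variants given the same ePerm/eForb call by both methods."""
    if len(ep_original) != len(ep_modified):
        raise ValueError("EP lists must have the same length")
    if not ep_original:
        raise ValueError("EP lists must not be empty")
    same = sum(
        (a >= threshold) == (b >= threshold)
        for a, b in zip(ep_original, ep_modified)
    )
    return same / len(ep_original)


def mean_abs_diff(ep_original, ep_modified):
    """Average absolute difference between paired EP estimates."""
    if len(ep_original) != len(ep_modified):
        raise ValueError("EP lists must have the same length")
    if not ep_original:
        raise ValueError("EP lists must not be empty")
    total = sum(abs(a - b) for a, b in zip(ep_original, ep_modified))
    return total / len(ep_original)


# Made-up EP values for four variants, not data from the study.
original = [0.01, 0.20, 0.06, 0.04]
modified = [0.02, 0.30, 0.04, 0.03]

print(classification_agreement(original, modified))
print(mean_abs_diff(original, modified))
```

Here the third variant crosses the 0.05 cutoff between the two methods, so three of the four calls agree.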

CONCLUSION

We conclude that the modified EP approach will be generally applicable to sequence alignments and will enable the detection of potentially neutral, deleterious, and adaptive alleles in populations.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/735a/6593550/49aba22d0a5e/12862_2019_1455_Fig1_HTML.jpg
