Estimation of allele-specific fitness effects across human protein-coding sequences and implications for disease.

Affiliation

Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA.

Publication information

Genome Res. 2019 Aug;29(8):1310-1321. doi: 10.1101/gr.245522.118. Epub 2019 Jun 27.

Abstract

A central challenge in human genomics is to understand the cellular, evolutionary, and clinical significance of genetic variants. Here, we introduce a unified population-genetic and machine-learning model, called Linear Allele-Specific Selection InferencE (LASSIE), for estimating the fitness effects of all observed and potential single-nucleotide variants, based on polymorphism data and predictive genomic features. We applied LASSIE to 51 high-coverage genome sequences annotated with 33 genomic features and constructed a map of allele-specific selection coefficients across all protein-coding sequences in the human genome. This map is generally consistent with previous inferences of the bulk distribution of fitness effects but reveals pervasive weak negative selection against synonymous mutations. In addition, the estimated selection coefficients are highly predictive of inherited pathogenic variants and cancer driver mutations, outperforming state-of-the-art variant prioritization methods. By contrasting our estimated model with ultrahigh coverage ExAC exome-sequencing data, we identified 1118 genes under unusually strong negative selection, which tend to be exclusively expressed in the central nervous system or associated with autism spectrum disorder, as well as 773 genes under unusually weak selection, which tend to be associated with metabolism. This combination of classical population genetic theory with modern machine-learning and large-scale genomic data is a powerful paradigm for the study of both human evolution and disease.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/51d2/6673719/3bddece2a28e/1310f01.jpg
