Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Program in Bioinformatics and Integrative Genomics, Harvard Graduate School of Arts and Sciences, Boston, MA 02115, USA.
Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.
Am J Hum Genet. 2019 Sep 5;105(3):456-476. doi: 10.1016/j.ajhg.2019.07.003. Epub 2019 Aug 8.
Complex traits and common diseases are extremely polygenic, with their heritability spread across thousands of loci. One possible explanation is that thousands of genes and loci have similarly important biological effects when mutated. However, we hypothesize that for most complex traits, relatively few genes and loci are critical, and that negative selection, by purging large-effect mutations in these regions, leaves behind common-variant associations in thousands of less critical regions instead. We refer to this phenomenon as flattening. To quantify its effects, we introduce a mathematical definition of polygenicity, the effective number of independently associated SNPs (M), which describes how evenly the heritability of a trait is spread across the genome. We developed a method, stratified LD fourth moments regression (S-LD4M), to estimate M, and validated in simulations that it produces robust estimates. Analyzing 33 complex traits (average N = 361k), we determined that heritability is spread ∼4× more evenly among common SNPs than among low-frequency SNPs. This difference, together with evolutionary modeling of new mutations, suggests that complex traits would be orders of magnitude less polygenic if not for the influence of negative selection. We also determined that heritability is spread more evenly within functionally important regions, in proportion to their heritability enrichment; because of selective constraint, functionally important regions do not harbor common SNPs with greatly increased causal effect sizes. Our results suggest that for most complex traits, the genes and loci with the most critical biological effects often differ from those with the strongest common-variant associations.
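The idea of an effective number of associated SNPs can be illustrated with a small sketch. The abstract does not state the formula for M, so the code below assumes a kurtosis-based definition of the form 3 × (number of SNPs) / κ, where κ is the ratio of the fourth moment of per-SNP effect sizes to the squared second moment. It also assumes effects are independent (no LD). The function name `effective_num_snps` and the toy effect sizes are hypothetical, and this is not the S-LD4M estimator, which works from summary statistics and LD.

```python
def effective_num_snps(effects):
    """Kurtosis-based effective number of associated SNPs (illustrative only).

    ASSUMPTION: M = 3 * n / kappa with kappa = E[b^4] / E[b^2]^2, ignoring LD.
    Algebraically this equals 3 * (sum b^2)^2 / (sum b^4), which is used here
    because it avoids dividing by the number of SNPs twice.
    """
    sum_sq = sum(b ** 2 for b in effects)      # proportional to heritability
    sum_fourth = sum(b ** 4 for b in effects)  # dominated by large effects
    if sum_fourth == 0:
        raise ValueError("all effect sizes are zero")
    return 3 * sum_sq ** 2 / sum_fourth


# Toy data: 1000 SNPs in both cases (hypothetical effect sizes).
# Even architecture: 10 causal SNPs with identical effects, the rest zero.
even = [1.0] * 10 + [0.0] * 990
# Concentrated architecture: one large-effect SNP plus many tiny effects.
concentrated = [1.0] + [0.01] * 999

m_even = effective_num_snps(even)
m_concentrated = effective_num_snps(concentrated)
print(m_even, m_concentrated)
```

Under this assumed definition, the value depends only on how evenly the squared effects are distributed, not on the total heritability: rescaling every effect by a constant leaves it unchanged. Heritability concentrated in a few large-effect SNPs gives a smaller value than the same number of SNPs with equal effects.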