Nature. 2012 Nov 1;491(7422):56-65. doi: 10.1038/nature11632.
By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations.