Granot-Hershkovitz Einat, Sun Quan, Argos Maria, Zhou Hufeng, Lin Xihong, Browning Sharon R, Sofer Tamar
Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA 02115, USA.
Department of Medicine, Harvard Medical School, Boston, MA 02115, USA.
HGG Adv. 2022 Feb 24;3(2):100096. doi: 10.1016/j.xhgg.2022.100096. eCollection 2022 Apr 14.
Allele frequency estimates in admixed populations, such as Hispanics and Latinos, rely on the sample's specific admixture composition and thus may differ between two seemingly similar populations. However, ancestry-specific allele frequencies, i.e., pertaining to the ancestral populations of an admixed group, may be particularly useful for prioritizing genetic variants for genetic discovery and personalized genomic health. We developed a method, ancestry-specific allele frequency estimation in admixed populations (AFA), to estimate the frequencies of biallelic variants in admixed populations with an unlimited number of ancestries. AFA uses maximum-likelihood estimation by modeling the conditional probability of having an allele given proportions of genetic ancestries. It can be applied using either local ancestry interval proportions encompassing the variant (local-ancestry-specific allele frequency estimations in admixed populations [LAFAs]) or global proportions of genetic ancestries (global-ancestry-specific allele frequency estimations in admixed populations [GAFAs]), which are easier to compute and are more widely available. Simulations and comparisons to existing software demonstrated the high accuracy of the method. We implemented AFA on high-quality imputed data of ∼9,000 Hispanics and Latinos from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL), an understudied, admixed population with three predominant continental ancestries: Amerindian, European, and African. Comparison of the European and African estimated frequencies to the respective gnomAD frequencies demonstrated high correlations (Pearson R = 0.97-0.99). We provide a genome-wide dataset of the estimated ancestry-specific allele frequencies for available variants with allele frequency between 5% and 95% in at least one of the three ancestral populations. 
Association analysis of Amerindian-enriched variants with cardiometabolic traits identified five loci associated with lipid traits in Hispanics and Latinos, demonstrating the utility of ancestry-specific allele frequencies in admixed populations.
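The estimation idea described above can be illustrated with a minimal sketch. It assumes that each individual's genotype at a biallelic variant is coded as an alternate-allele count g_i in {0, 1, 2}, and that each of the two allele copies carries the alternate allele with probability p_i = sum_k q_ik * f_k, where q_ik is the individual's proportion of ancestry k (global proportions, as in GAFA, or local interval proportions, as in LAFA) and f_k is the ancestry-specific frequency to be estimated. The abstract does not say how the likelihood is maximized; the expectation-maximization updates below are one standard way to do so and are an assumption of this sketch, not the authors' implementation. The function names and the simulated data are hypothetical.

```python
import random


def estimate_ancestry_specific_frequencies(genotypes, ancestry, n_iter=200, eps=1e-6):
    """Maximize the likelihood of g_i ~ Binomial(2, sum_k q_ik * f_k) over f by EM.

    genotypes: list of alternate-allele counts (0, 1, or 2), one per individual.
    ancestry:  list of per-individual ancestry proportions, each summing to 1.
    Returns one estimated allele frequency per ancestry.
    """
    n_anc = len(ancestry[0])
    freqs = [0.5] * n_anc  # neutral starting point
    for _ in range(n_iter):
        alt_weight = [0.0] * n_anc  # expected alternate alleles attributed to ancestry k
        ref_weight = [0.0] * n_anc  # expected reference alleles attributed to ancestry k
        for g, q in zip(genotypes, ancestry):
            p_alt = sum(qk * fk for qk, fk in zip(q, freqs))
            p_ref = 1.0 - p_alt
            for k in range(n_anc):
                # E-step: posterior probability that an allele copy came from ancestry k
                alt_weight[k] += g * q[k] * freqs[k] / p_alt
                ref_weight[k] += (2 - g) * q[k] * (1.0 - freqs[k]) / p_ref
        # M-step: share of alternate alleles among copies attributed to each ancestry,
        # clamped away from 0 and 1 so the next E-step never divides by zero
        freqs = [min(max(a / (a + r), eps), 1.0 - eps)
                 for a, r in zip(alt_weight, ref_weight)]
    return freqs


def simulate_admixed_sample(n, true_freqs, seed=0):
    """Simulate genotypes for n admixed individuals (hypothetical test data)."""
    rng = random.Random(seed)
    genotypes, ancestry = [], []
    for _ in range(n):
        # Normalized Gamma(1, 1) draws give Dirichlet(1, ..., 1) ancestry proportions
        draws = [rng.gammavariate(1.0, 1.0) for _ in true_freqs]
        total = sum(draws)
        q = [d / total for d in draws]
        p_alt = sum(qk * fk for qk, fk in zip(q, true_freqs))
        # Two independent allele copies per individual
        genotypes.append(sum(rng.random() < p_alt for _ in range(2)))
        ancestry.append(q)
    return genotypes, ancestry
```

Calling `simulate_admixed_sample(2000, [0.1, 0.5, 0.8])` and passing the result to `estimate_ancestry_specific_frequencies` should recover frequencies close to the simulated ones. Precision depends on the sample size and on how much ancestry proportions vary across individuals.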