Biophysics Graduate Group, University of California, Berkeley, CA, 94720, USA.
Center for Computational Biology, University of California, Berkeley, CA, 94720, USA.
Genome Med. 2023 Jul 13;15(1):51. doi: 10.1186/s13073-023-01199-y.
Curated databases of genetic variants assist clinicians and researchers in interpreting genetic variation. Yet, these databases contain some misclassified variants. It is unclear whether variant misclassification is abating as these databases rapidly grow and implement new guidelines.
Using archives of ClinVar and HGMD, we investigated how variant misclassification has changed over 6 years and across different ancestry groups. We considered inborn errors of metabolism (IEMs) screened in newborns as a model system because these disorders are often highly penetrant with neonatal phenotypes. We used samples from the 1000 Genomes Project (1KGP) to identify individuals with genotypes that the databases classified as pathogenic. Because IEMs are rare, nearly all such genotypes likely reflect variant misclassification in ClinVar or HGMD.
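The genotype screen described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the data structures, the function name `implied_affected`, and the per-gene inheritance table are all hypothetical, and the logic assumes a single variant is enough to imply disease (two alternate alleles for a recessive gene, one for a dominant gene). Compound heterozygotes, which would require phased genotypes, are deliberately ignored.

```python
from collections import defaultdict


def implied_affected(genotypes, pathogenic, inheritance):
    """Return {sample: set of genes} for samples whose genotype implies disease.

    genotypes:   {sample_id: {variant_id: alt_allele_count (0, 1, or 2)}}
    pathogenic:  {variant_id: gene} for variants a database labels pathogenic
    inheritance: {gene: "AR" or "AD"} (hypothetical per-gene inheritance mode)
    """
    flagged = defaultdict(set)
    for sample, calls in genotypes.items():
        for variant, alt_count in calls.items():
            gene = pathogenic.get(variant)
            if gene is None:
                continue  # variant is not labeled pathogenic in the database
            # Assumption: recessive genes need two alternate alleles at one
            # variant, dominant genes need one. Genes with no entry default
            # to recessive.
            needed = 2 if inheritance.get(gene, "AR") == "AR" else 1
            if alt_count >= needed:
                flagged[sample].add(gene)
    return dict(flagged)


# Toy data, invented for illustration only.
genotypes = {
    "S1": {"v1": 2, "v2": 0},
    "S2": {"v1": 1, "v2": 0},
    "S3": {"v1": 0, "v2": 1},
}
pathogenic = {"v1": "GENE_A", "v2": "GENE_B"}
inheritance = {"GENE_A": "AR", "GENE_B": "AD"}

result = implied_affected(genotypes, pathogenic, inheritance)
```

In this toy input, `S2` is a heterozygous carrier of a recessive variant and is therefore not flagged. When the disorder is rare and the cohort is presumed healthy, each flagged sample points to a candidate misclassified variant rather than to a true diagnosis.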
While the false-positive rates of both ClinVar and HGMD have improved over time, HGMD variants currently imply two orders of magnitude more affected individuals in 1KGP than ClinVar variants. We observed that African ancestry individuals have a significantly increased chance of being incorrectly indicated to be affected by a screened IEM when HGMD variants are used. However, this bias affecting genomes of African ancestry was no longer significant once common variants were removed in accordance with recent variant classification guidelines. We discovered that ClinVar variants classified as Pathogenic or Likely Pathogenic are reclassified sixfold more often than DM or DM? variants in HGMD, which has likely resulted in ClinVar's lower false-positive rate.
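The common-variant filter mentioned above could look roughly like the following sketch. The abstract does not state which allele frequency cutoff or which populations were used, so the `max_af` default, the population labels, and the function name `remove_common` are placeholders chosen for illustration, not values taken from the study.

```python
def remove_common(variant_afs, max_af=0.05):
    """Keep variants whose highest population allele frequency is <= max_af.

    variant_afs: {variant_id: {population: allele_frequency}}
    max_af:      illustrative cutoff (an assumption, not the study's value)
    """
    return {
        variant
        for variant, afs in variant_afs.items()
        # A variant with no frequency data is treated as rare and kept.
        if max(afs.values(), default=0.0) <= max_af
    }


# Toy frequencies, invented for illustration only.
variant_afs = {
    "v1": {"AFR": 0.12, "EUR": 0.001},   # common in one population: dropped
    "v2": {"AFR": 0.002, "EUR": 0.0005}, # rare everywhere: kept
    "v3": {},                            # no frequency data: kept
}

kept = remove_common(variant_afs)
```

Taking the maximum over populations, rather than a pooled frequency, means a variant that is common in any single ancestry group is removed. That choice matters when the reference data under-represent some groups, because a pooled frequency can hide a variant that is common in only one of them.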
Considering misclassified variants that have since been reclassified reveals our increasing understanding of rare genetic variation. We found that variant classification guidelines and allele frequency databases comprising genetically diverse samples are important factors in reclassification. We also discovered that ClinVar variants common in European and South Asian individuals were more likely to be reclassified to a lower confidence category, perhaps due to an increased chance of these variants being classified by multiple submitters. We discuss features for variant classification databases that would support their continued improvement.