Department of Genetics, Yale University School of Medicine, New Haven, CT, USA.
Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA.
Nature. 2023 May;617(7960):312-324. doi: 10.1038/s41586-023-05896-x. Epub 2023 May 10.
Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.
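As a reading aid, the headline figures above can be restated as simple ratios. The sketch below is only arithmetic on the rounded numbers quoted in the abstract; the function and key names are illustrative and do not come from the paper, and the "roughly 90 million" figure is treated as exactly 90 million for the calculation.

```python
# Back-of-the-envelope arithmetic on the rounded figures quoted in the abstract.
# All names are illustrative; the inputs are the abstract's rounded values.

ADDED_BP = 119_000_000          # euchromatic polymorphic sequence added relative to GRCh38
SV_DERIVED_BP = 90_000_000      # "roughly 90 million", treated as exact here
ERROR_REDUCTION = 0.34          # small variant discovery errors reduced by 34%
SV_INCREASE = 1.04              # structural variants per haplotype increased by 104%


def summarize() -> dict:
    """Restate the abstract's percentages as shares and multiplicative factors."""
    return {
        # Share of the added sequence that derives from structural variation.
        "sv_share_of_added": SV_DERIVED_BP / ADDED_BP,
        # Errors remaining relative to the GRCh38-based workflow (1.0 = no change).
        "relative_error": 1.0 - ERROR_REDUCTION,
        # Structural variants detected per haplotype relative to GRCh38 (1.0 = no change).
        "relative_sv_detected": 1.0 + SV_INCREASE,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.3f}")
```

Read this way, about three quarters of the added sequence comes from structural variation, a 34% reduction leaves about two thirds of the errors, and a 104% increase means slightly more than twice as many structural variants detected per haplotype.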