A draft human pangenome reference.

Affiliations

Department of Genetics, Yale University School of Medicine, New Haven, CT, USA.

Center for Genomic Health, Yale University School of Medicine, New Haven, CT, USA.

Publication information

Nature. 2023 May;617(7960):312-324. doi: 10.1038/s41586-023-05896-x. Epub 2023 May 10.

DOI:10.1038/s41586-023-05896-x

PMID:37165242

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10172123/

Abstract

Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.
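The percentages above are relative changes measured against GRCh38-based workflows, and the 90 Mbp figure is a share of the 119 Mbp of added sequence. The minimal Python sketch below works through that arithmetic using only the numbers stated in the abstract. The helper names are hypothetical, and the baseline of 1.0 is a normalized placeholder, not a count reported in the paper.

```python
def relative_change(baseline: float, new: float) -> float:
    """Percent change from baseline to new (positive means an increase)."""
    return (new - baseline) / baseline * 100.0


def apply_change(baseline: float, percent: float) -> float:
    """Value obtained by applying a percent change to a baseline."""
    return baseline * (1.0 + percent / 100.0)


# Figures stated in the abstract, in millions of base pairs (Mbp).
added_mbp = 119.0       # euchromatic polymorphic sequence added relative to GRCh38
sv_derived_mbp = 90.0   # "roughly" 90 Mbp, so the share below is approximate

sv_share = sv_derived_mbp / added_mbp * 100.0
print(f"SV-derived share of added sequence: ~{sv_share:.0f}%")

# A +104% change multiplies the baseline by 2.04 (slightly more than double);
# a -34% change leaves 66% of the baseline. 1.0 is a normalized placeholder.
sv_multiplier = apply_change(1.0, 104)
error_multiplier = apply_change(1.0, -34)
print(f"SVs detected per haplotype, relative to GRCh38: x{sv_multiplier:.2f}")
print(f"Small-variant discovery errors, relative to GRCh38: x{error_multiplier:.2f}")

# Round trip: recover the stated percent change from the multiplier.
print(f"Recovered change: {relative_change(1.0, sv_multiplier):.0f}%")
```

Read this way, "increased by 104%" means roughly twice as many structural variants per haplotype, and about three quarters of the added sequence comes from structural variation.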

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b4ab/10172123/33aaca4d3c91/41586_2023_5896_Fig1_HTML.jpg

Similar articles

A pangenome reference of 36 Chinese populations.

Nature. 2023 Jul;619(7968):112-121. doi: 10.1038/s41586-023-06173-7. Epub 2023 Jun 14.

Semi-automated assembly of high-quality diploid human reference genomes.

Nature. 2022 Nov;611(7936):519-531. doi: 10.1038/s41586-022-05325-5. Epub 2022 Oct 19.

Pangenome graph construction from genome alignments with Minigraph-Cactus.

Nat Biotechnol. 2024 Apr;42(4):663-673. doi: 10.1038/s41587-023-01793-w. Epub 2023 May 10.

De novo assembly and phasing of a Korean human genome.

Nature. 2016 Oct 13;538(7624):243-247. doi: 10.1038/nature20098. Epub 2016 Oct 5.

Genetic complexity of killer-cell immunoglobulin-like receptor genes in human pangenome assemblies.

Genome Res. 2024 Sep 20;34(8):1211-1223. doi: 10.1101/gr.278358.123.

Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly.

Genome Res. 2017 May;27(5):849-864. doi: 10.1101/gr.213611.116. Epub 2017 Apr 10.

The Human Pangenome Project: a global resource to map genomic diversity.

Nature. 2022 Apr;604(7906):437-446. doi: 10.1038/s41586-022-04601-8. Epub 2022 Apr 20.

A complete reference genome improves analysis of human genetic variation.

Science. 2022 Apr;376(6588):eabl3533. doi: 10.1126/science.abl3533. Epub 2022 Apr 1.

Personalized pangenome references.

Nat Methods. 2024 Nov;21(11):2017-2023. doi: 10.1038/s41592-024-02407-2. Epub 2024 Sep 11.

Cited by

Finding easy regions for short-read variant calling from pangenome data.

Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf103.

TCR germline diversity reveals evidence of natural selection on variable and joining alpha chain genes.

bioRxiv. 2025 Aug 24:2025.08.20.671277. doi: 10.1101/2025.08.20.671277.

Determinants of chromosome-specific telomere lengths among 2,573 All of Us participants.

Res Sq. 2025 Aug 18:rs.3.rs-7293781. doi: 10.21203/rs.3.rs-7293781/v1.

Skin metatranscriptomics reveals a landscape of variation in microbial activity and gene expression across the human body.

Nat Biotechnol. 2025 Aug 28. doi: 10.1038/s41587-025-02797-4.

Genome analyses and breeding of polyploid crops.

Nat Plants. 2025 Aug 28. doi: 10.1038/s41477-025-02088-5.

Structural Variants: Mechanisms, Mapping, and Interpretation in Human Genetics.

Genes (Basel). 2025 Jul 29;16(8):905. doi: 10.3390/genes16080905.

A Hitchhiker Guide to Structural Variant Calling: A Comprehensive Benchmark Through Different Sequencing Technologies.

Biomedicines. 2025 Aug 9;13(8):1949. doi: 10.3390/biomedicines13081949.

Chromosome-level haplotype-resolved genome assembly provides insights into the highly heterozygous genome of Italian ryegrass (Lolium multiflorum Lam.).

Plant Genome. 2025 Sep;18(3):e70079. doi: 10.1002/tpg2.70079.

A comparison of 27 Arabidopsis thaliana genomes and the path toward an unbiased characterization of genetic polymorphism.

Nat Genet. 2025 Aug 19. doi: 10.1038/s41588-025-02293-0.

Identification of Single Nucleotide Polymorphism from Insect Genomic Data.

Methods Mol Biol. 2025;2935:29-49. doi: 10.1007/978-1-0716-4583-3_2.
