Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA.
Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA.
Nature. 2021 Feb;590(7845):290-299. doi: 10.1038/s41586-021-03205-y. Epub 2021 Feb 10.
The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.
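A little arithmetic makes the quoted frequency classes concrete. With 53,831 diploid samples, an autosomal site has 2 × 53,831 = 107,662 chromosomes. One allele copy therefore corresponds to a frequency of about 0.00093%, and a frequency of 0.01% corresponds to roughly 11 allele copies. The sketch below is illustrative only and is not the TOPMed pipeline. It rests on these assumptions: genotypes are autosomal, diploid and coded as alternate-allele dosages (0, 1, 2) with no missing calls; the 1% threshold is treated as a minor-allele-frequency cutoff; and a variant counts as a singleton when its minor allele is carried by exactly one individual. All function names are hypothetical.

```python
from typing import Sequence

N_SAMPLES = 53_831  # samples in the first TOPMed data set described above


def singleton_frequency(n_samples: int) -> float:
    """Frequency of one allele copy among 2 * n_samples chromosomes.

    Assumes a diploid autosomal site with no missing calls.
    """
    return 1 / (2 * n_samples)


def expected_allele_count(freq: float, n_samples: int) -> float:
    """Expected number of allele copies at `freq` among 2 * n_samples chromosomes."""
    return freq * 2 * n_samples


def alt_allele_frequency(genotypes: Sequence[int]) -> float:
    """genotypes: alternate-allele dosage per individual (0, 1 or 2)."""
    return sum(genotypes) / (2 * len(genotypes))


def minor_allele_carriers(genotypes: Sequence[int]) -> int:
    """Number of individuals carrying at least one copy of the minor allele."""
    if alt_allele_frequency(genotypes) <= 0.5:
        return sum(1 for g in genotypes if g > 0)  # alternate allele is minor
    return sum(1 for g in genotypes if g < 2)  # reference allele is minor


def classify_variant(genotypes: Sequence[int], rare_cutoff: float = 0.01) -> str:
    """Hypothetical helper returning 'monomorphic', 'singleton', 'rare' or 'common'."""
    carriers = minor_allele_carriers(genotypes)
    if carriers == 0:
        return "monomorphic"  # no variation observed at this site
    if carriers == 1:
        return "singleton"  # minor allele present in only one individual
    af = alt_allele_frequency(genotypes)
    maf = min(af, 1 - af)
    return "rare" if maf < rare_cutoff else "common"


if __name__ == "__main__":
    print(f"singleton frequency: {singleton_frequency(N_SAMPLES):.2e}")
    print(f"allele copies at 0.01%: {expected_allele_count(1e-4, N_SAMPLES):.1f}")
    print(classify_variant([1] + [0] * 99))  # one heterozygous carrier
    print(classify_variant([1] * 3 + [0] * 197))  # MAF 0.75%
    print(classify_variant([1] * 50 + [0] * 50))  # MAF 25%
```

The singleton test uses carriers rather than allele count. This follows the abstract's wording, "present in only one individual", so a single homozygous carrier (two allele copies) is still classed as a singleton here.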