Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA.
Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA.
Sci Transl Med. 2024 Mar 13;16(738):eadj9283. doi: 10.1126/scitranslmed.adj9283.
Genetic changes in repetitive sequences are a hallmark of cancer and other diseases, but characterizing these has been challenging using standard sequencing approaches. We developed a de novo kmer finding approach, called ARTEMIS (Analysis of RepeaT EleMents in dISease), to identify repeat elements from whole-genome sequencing. Using this method, we analyzed 1.2 billion kmers in 2837 tissue and plasma samples from 1975 patients, including those with lung, breast, colorectal, ovarian, liver, gastric, head and neck, bladder, cervical, thyroid, or prostate cancer. In these patients, we identified tumor-specific changes in 1280 repeat element types from the LINE, SINE, LTR, transposable element, and human satellite families. These included changes to known repeats and 820 elements that were not previously known to be altered in human cancer. Repeat elements were enriched in regions of driver genes, and their representation was altered by structural changes and epigenetic states. Machine learning analyses of genome-wide repeat landscapes and fragmentation profiles in cell-free DNA (cfDNA) detected patients with early-stage lung or liver cancer in cross-validated and externally validated cohorts. In addition, these repeat landscapes could be used to noninvasively identify the tissue of origin of tumors. These analyses reveal widespread changes in repeat landscapes of human cancers and provide an approach for their detection and characterization that could benefit early detection and disease monitoring of patients with cancer.
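To illustrate the general idea of quantifying repeat elements through kmers, the following is a minimal sketch under stated assumptions; it is not the ARTEMIS implementation. It counts overlapping kmers in sequencing reads, then reports for each repeat element the fraction of all counted kmers that fall in a set assumed to be specific to that element. The function names `count_kmers` and `repeat_representation`, the toy reads, and the `element_kmers` mapping are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every overlapping k-length substring across all reads.

    Hypothetical helper for illustration; kmers containing an
    ambiguous base ('N') are skipped.
    """
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts


def repeat_representation(
    counts: Counter, element_kmers: Dict[str, Set[str]]
) -> Dict[str, float]:
    """Fraction of all counted kmers that match each element's kmer set.

    `element_kmers` is an assumed input: a mapping from a repeat element
    name to kmers treated as specific to that element.
    """
    total = sum(counts.values())
    return {
        name: (sum(counts[km] for km in kmers) / total if total else 0.0)
        for name, kmers in element_kmers.items()
    }


if __name__ == "__main__":
    # Toy reads and a hypothetical repeat element, for illustration only.
    reads = ["ACGTACGTAA", "TTACGTACGN"]
    counts = count_kmers(reads, k=4)
    rep = repeat_representation(counts, {"toy_repeat": {"ACGT", "TACG"}})
    print(rep)  # 6 of the 13 counted kmers match toy_repeat
```

Comparing such per-element fractions between tumor and normal samples is one simple way to express "altered representation"; practical kmer pipelines also typically merge each kmer with its reverse complement (canonical kmers) and normalize for sequencing depth, which this sketch omits.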