State University of New York at Buffalo, Buffalo, New York, USA.
Mass General Research Institute, Boston, Massachusetts, USA.
Genet Epidemiol. 2024 Dec;48(8):455-467. doi: 10.1002/gepi.22565. Epub 2024 Apr 30.
Numerous studies over the past generation have identified germline variants that increase specific cancer risks. Simultaneously, a revolution in sequencing technology has permitted high-throughput annotations of somatic genomes characterizing individual tumors. However, examining the relationship between germline variants and somatic alteration patterns is severely hampered by the large numbers of variants in a typical tumor, the rarity of most individual variants, and the heterogeneity of tumor somatic fingerprints. In this article, we propose a statistical methodology that frames the investigation of germline-somatic relationships in an interpretable manner. The method uses meta-features embodying biological contexts of individual somatic alterations to implicitly group rare mutations. Our team has previously used this technique, through a multilevel regression model, to diagnose tumor site of origin with high accuracy. Herein, we further leverage topic models from computational linguistics to achieve interpretable lower-dimensional embeddings of the meta-features. We demonstrate how the method can identify distinctive somatic profiles linked to specific germline variants or environmental risk factors. We illustrate the method using The Cancer Genome Atlas whole-exome sequencing data to characterize somatic tumor fingerprints in breast cancer patients with germline BRCA1/2 mutations and in head and neck cancer patients exposed to human papillomavirus.
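As a rough illustration of how meta-features can implicitly group rare mutations, the sketch below treats each tumor as a "document" whose "words" are meta-feature tokens, which is the input shape a topic model expects. The toy mutations, the choice of gene and trinucleotide-context tokens, and all function names are assumptions made for illustration; the abstract does not specify the authors' actual feature set or implementation.

```python
from collections import Counter

# Hypothetical toy mutations: (tumor_id, gene, trinucleotide context, substitution).
# Illustrative values only, not data from the paper.
MUTATIONS = [
    ("T1", "TP53", "ACG", "C>T"),
    ("T1", "BRCA2", "TCA", "C>G"),
    ("T2", "PIK3CA", "TCA", "C>G"),
    ("T2", "TP53", "ACG", "C>T"),
    ("T2", "TTN", "TCT", "C>T"),
]


def meta_features(gene, context, substitution):
    """Map one mutation to meta-feature tokens (an assumed, simplified feature set)."""
    return [f"gene={gene}", f"sbs={context}:{substitution}"]


def build_documents(mutations):
    """Treat each tumor as a document whose words are meta-feature tokens."""
    docs = {}
    for tumor_id, gene, context, substitution in mutations:
        docs.setdefault(tumor_id, Counter()).update(
            meta_features(gene, context, substitution)
        )
    return docs


def count_matrix(docs):
    """Return sorted tumor ids, the sorted vocabulary, and a tumor-by-token count matrix."""
    vocab = sorted({token for counts in docs.values() for token in counts})
    tumors = sorted(docs)
    matrix = [[docs[t][token] for token in vocab] for t in tumors]
    return tumors, vocab, matrix


DOCS = build_documents(MUTATIONS)
TUMORS, VOCAB, MATRIX = count_matrix(DOCS)
```

In this toy example the BRCA2 mutation in T1 and the PIK3CA mutation in T2 are distinct variants, yet both contribute to the shared token `sbs=TCA:C>G`, so the two tumors overlap in meta-feature space even though they share no identical mutation in those genes. A count matrix of this shape could then be passed to an off-the-shelf topic model (for example, latent Dirichlet allocation) to obtain lower-dimensional tumor embeddings; whether that matches the authors' specific model is not stated in the abstract.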