The Broad Institute of MIT and Harvard, Cambridge, MA, 02141, USA.
Dana-Farber Cancer Institute, Boston, MA, 02215, USA.
Nature. 2013 Jul 11;499(7457):214-218. doi: 10.1038/nature12213. Epub 2013 Jun 16.
Major international projects are underway that are aimed at creating a comprehensive catalogue of all the genes responsible for the initiation and progression of cancer. These studies involve the sequencing of matched tumour-normal samples followed by mathematical analysis to identify those genes in which mutations occur more frequently than expected by random chance. Here we describe a fundamental problem with cancer genome studies: as the sample size increases, the list of putatively significant genes produced by current analytical methods burgeons into the hundreds. The list includes many implausible genes (such as those encoding olfactory receptors and the muscle protein titin), suggesting extensive false-positive findings that overshadow true driver events. We show that this problem stems largely from mutational heterogeneity and provide a novel analytical methodology, MutSigCV, for resolving the problem. We apply MutSigCV to exome sequences from 3,083 tumour-normal pairs and discover extraordinary variation in mutation frequency and spectrum within cancer types, which sheds light on mutational processes and disease aetiology, and in mutation frequency across the genome, which is strongly correlated with DNA replication timing and also with transcriptional activity. By incorporating mutational heterogeneity into the analyses, MutSigCV is able to eliminate most of the apparent artefactual findings and enable the identification of genes truly associated with cancer.
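The core test described in the abstract, asking whether a gene carries more mutations than expected by chance, can be sketched as follows. This is a minimal illustration and not the MutSigCV algorithm: it assumes mutation counts follow a Poisson distribution, and the helper `poisson_tail`, the gene length, sample count, background rates, and observed count are all hypothetical values chosen for illustration. It shows how a gene that looks highly significant under a single genome-wide background rate can become unremarkable once a higher gene-specific background rate (as might apply to a late-replicating, weakly transcribed gene) is used.

```python
import math


def poisson_tail(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam), with lam > 0.

    The upper tail is summed directly (rather than computed as 1 - CDF)
    so that very small p-values keep their precision.
    """
    if k <= 0:
        return 1.0
    # P(X = k), computed in log space to avoid overflow in k!.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    # Add successive terms P(X = i) until they no longer change the sum.
    while term > 0.0 and (total == 0.0 or term > total * 1e-16):
        total += term
        i += 1
        term *= lam / i
    return min(total, 1.0)


# Hypothetical inputs (assumptions for illustration, not data from the paper).
n_samples = 500          # tumour-normal pairs
gene_length = 1_000      # covered coding bases in one gene
observed = 8             # mutations observed in the gene across all samples

uniform_rate = 2e-6      # single genome-wide background rate per base
local_rate = 1e-5        # higher gene-specific background rate per base

# Expected mutation counts under each background model.
expected_uniform = n_samples * gene_length * uniform_rate
expected_local = n_samples * gene_length * local_rate

p_uniform = poisson_tail(observed, expected_uniform)
p_local = poisson_tail(observed, expected_local)

print(f"expected (uniform background): {expected_uniform:.1f}, p = {p_uniform:.3g}")
print(f"expected (local background):   {expected_local:.1f}, p = {p_local:.3g}")
```

Under the uniform rate the gene appears strongly enriched for mutations, whereas under the gene-specific rate the same count is consistent with background. The abstract attributes false-positive findings of this kind largely to mutational heterogeneity that a single background rate ignores.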