Hua Xing, Hyland Paula L, Huang Jing, Song Lei, Zhu Bin, Caporaso Neil E, Landi Maria Teresa, Chatterjee Nilanjan, Shi Jianxin
Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, Bethesda, MD 20892, USA.
Center for Cancer Research, National Cancer Institute, NIH, Bethesda, MD 20892, USA.
Am J Hum Genet. 2016 Mar 3;98(3):442-455. doi: 10.1016/j.ajhg.2015.12.021. Epub 2016 Feb 18.
The central challenges in tumor sequencing studies are to identify driver genes and pathways, investigate their functional relationships, and nominate drug targets. The efficiency of these analyses, particularly for infrequently mutated genes, is compromised when subjects carry different combinations of driver mutations. Mutual exclusivity analysis helps address these challenges. To identify mutually exclusive gene sets (MEGS), we developed a powerful and flexible analytic framework based on a likelihood ratio test and a model selection procedure. Extensive simulations demonstrated that our method outperformed existing methods in both statistical power and the ability to identify the exact MEGS, particularly for highly imbalanced MEGS. Our method can be used for de novo discovery, for pathway-guided searches, or for expanding established small MEGS. We applied our method to whole-exome sequencing data for 13 cancer types from The Cancer Genome Atlas (TCGA) and identified previously unreported non-pairwise MEGS in multiple cancer types. For acute myeloid leukemia, we identified a MEGS with five genes (FLT3, IDH2, NRAS, KIT, and TP53) and a MEGS (NPM1, TP53, and RUNX1) whose mutation status was strongly associated with survival (p = 6.7 × 10⁻⁴). For breast cancer, we identified a significant MEGS consisting of TP53 and four infrequently mutated genes (ARID1A, AKT1, MED23, and TBL1XR1), providing support for their role as cancer drivers.
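The abstract does not spell out the likelihood or the model selection procedure, so the following is only a toy illustration of the general idea behind a likelihood ratio test for mutual exclusivity, not the authors' method. It assumes a binary sample-by-gene mutation matrix and compares the observed coverage of a gene set (the fraction of samples with at least one mutated gene) with the coverage expected if the genes mutated independently. The function name `coverage_lrt`, the binomial model for coverage, and the plug-in use of per-gene frequencies estimated from the same data are all assumptions made for illustration.

```python
import math


def coverage_lrt(matrix):
    """Toy one-sided likelihood ratio statistic for excess coverage.

    matrix: list of samples, each a list of 0/1 mutation flags per gene.
    Returns (observed_coverage, expected_coverage, statistic).
    Assumption: per-gene frequencies are treated as known (plug-in),
    so the statistic is illustrative and not a calibrated test.
    """
    n = len(matrix)
    n_genes = len(matrix[0])

    # Per-gene mutation frequency across samples.
    freqs = [sum(row[g] for row in matrix) / n for g in range(n_genes)]

    # Coverage expected if the genes mutate independently of each other.
    expected = 1.0 - math.prod(1.0 - p for p in freqs)

    # Observed coverage: samples carrying at least one mutation in the set.
    covered = sum(1 for row in matrix if any(row))
    observed = covered / n

    # One-sided: only coverage above the independence expectation counts
    # as evidence for mutual exclusivity.
    if observed <= expected:
        return observed, expected, 0.0

    def term(count, p_hat, p0):
        # Contribution count * log(p_hat / p0), defined as 0 when count is 0.
        return 0.0 if count == 0 else count * math.log(p_hat / p0)

    stat = 2.0 * (
        term(covered, observed, expected)
        + term(n - covered, 1.0 - observed, 1.0 - expected)
    )
    return observed, expected, stat


# Two genes that never co-occur versus two genes that always co-occur.
exclusive = [[1, 0], [1, 0], [0, 1], [0, 1]]
cooccurring = [[1, 1], [1, 1], [0, 0], [0, 0]]
print(coverage_lrt(exclusive))
print(coverage_lrt(cooccurring))
```

In this sketch, a perfectly exclusive pair yields coverage above the independence expectation and a positive statistic, while a co-occurring pair yields a statistic of zero. A real analysis would need a properly specified null model, a calibrated reference distribution, and correction for searching over many candidate gene sets.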