Program in Computational Biology & Bioinformatics, Yale University, New Haven, CT, 06520, USA.
Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, CT, 06520, USA.
Nat Commun. 2020 Jul 29;11(1):3696. doi: 10.1038/s41467-020-14743-w.
ENCODE comprises thousands of functional genomics datasets, and the encyclopedia covers hundreds of cell types, providing a universal annotation for genome interpretation. However, for particular applications, it may be advantageous to use a customized annotation. Here, we develop such a custom annotation by leveraging advanced assays, such as eCLIP, Hi-C, and whole-genome STARR-seq on a number of data-rich ENCODE cell types. A key aspect of this annotation is comprehensive and experimentally derived networks of both transcription factors and RNA-binding proteins (TFs and RBPs). Cancer, a disease of system-wide dysregulation, is an ideal application for such a network-based annotation. Specifically, for cancer-associated cell types, we put regulators into hierarchies and measure their network change (rewiring) during oncogenesis. We also extensively survey TF-RBP crosstalk, highlighting how SUB1, a previously uncharacterized RBP, drives aberrant tumor expression and amplifies the effect of MYC, a well-known oncogenic TF. Furthermore, we show how our annotation allows us to place oncogenic transformations in the context of a broad cell space; here, many normal-to-tumor transitions move towards a stem-like state, while oncogene knockdowns show an opposing trend. Finally, we organize the resource into a coherent workflow to prioritize key elements and variants, in addition to regulators. We showcase the application of this prioritization to somatic burdening, cancer differential expression and GWAS. Targeted validations of the prioritized regulators, elements and variants using siRNA knockdowns, CRISPR-based editing, and luciferase assays demonstrate the value of the ENCODE resource.