Department of Genetics, University of Alabama at Birmingham, Birmingham, Alabama 35294, USA.
Chaos. 2010 Jun;20(2):026103. doi: 10.1063/1.3455188.
Interactions between genetic and/or environmental factors are ubiquitous, affecting the phenotypes of organisms in complex ways. Knowledge about such interactions is becoming rate-limiting for our understanding of human disease and other biological phenomena. Phenomics refers to the integrative analysis of how all genes contribute to phenotype variation, entailing genome and organism level information. A systems biology view of gene interactions is critical for phenomics. Unfortunately, the problem is intractable in humans; however, it can be addressed in simpler genetic model systems. Our research group has focused on the concept of genetic buffering of phenotypic variation, in studies employing the single-celled eukaryote Saccharomyces cerevisiae. We have developed a methodology, quantitative high-throughput cellular phenotyping (Q-HTCP), for high-resolution measurements of gene-gene and gene-environment interactions on a genome-wide scale. Q-HTCP is being applied to the complete set of S. cerevisiae gene deletion strains, a unique resource for systematically mapping gene interactions. Genetic buffering is the idea that comprehensive and quantitative knowledge about how genes interact with respect to phenotypes will lead to an appreciation of how genes and pathways are functionally connected at a systems level to maintain homeostasis. However, extracting biologically useful information from Q-HTCP data is challenging, due to the multidimensional and nonlinear nature of gene interactions, together with a relative lack of prior biological information. Here we describe a new approach for mining quantitative genetic interaction data called recursive expectation-maximization clustering (REMc). We developed REMc to help discover phenomic modules, defined as sets of genes with similar patterns of interaction across a series of genetic or environmental perturbations. Such modules reflect buffering mechanisms, i.e., genes that play a related role in the maintenance of physiological homeostasis. To develop the method, 297 gene deletion strains were selected based on gene-drug interactions with hydroxyurea, an inhibitor of ribonucleotide reductase, whose enzymatic activity is critical for DNA synthesis. To partition the gene functions, these 297 deletion strains were challenged with growth inhibitory drugs known to target different genes and cellular pathways. Q-HTCP-derived growth curves were used to quantify all gene interactions, and the data were used to test the performance of REMc. Fundamental advantages of REMc include objective assessment of the total number of clusters and assignment of a log-likelihood value to each cluster, which can be considered an indicator of the statistical quality of clusters. To assess the biological quality of clusters, we developed a method called gene ontology information divergence z-score (GOid_z). GOid_z summarizes total enrichment of GO attributes within individual clusters. Using these and other criteria, we compared the performance of REMc to hierarchical and K-means clustering. The main conclusion is that REMc provides distinct efficiencies for mining Q-HTCP data. It facilitates identification of phenomic modules, which contribute to buffering mechanisms that underlie cellular homeostasis and the regulation of phenotypic expression.
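To make the idea of recursive expectation-maximization clustering more concrete, the following is a minimal sketch and not the published REMc procedure, whose details are not given in this abstract. It assumes that each gene is described by a vector of interaction scores, that each cluster can be modeled as a spherical Gaussian, and that a cluster is split in two only when a two-component mixture fitted by EM has a lower Bayesian information criterion (BIC) than a single Gaussian. The function names, the `min_size` parameter, the gene labels `g1` to `g15`, and the toy scores are all hypothetical.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a spherical Gaussian (one variance shared by all dimensions)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, mean))
    return -0.5 * (len(x) * math.log(2 * math.pi * var) + sq / var)

def fit_one(points):
    """Fit one Gaussian by maximum likelihood; return (mean, var, log-likelihood)."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    sq = sum((a - b) ** 2 for p in points for a, b in zip(p, mean))
    var = max(sq / (n * d), 1e-6)  # floor avoids a zero variance
    return mean, var, sum(log_gauss(p, mean, var) for p in points)

def e_step(points, weights, means, variances):
    """Return per-point responsibilities and the mixture log-likelihood."""
    resp, loglik = [], 0.0
    for p in points:
        logs = [math.log(w) + log_gauss(p, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)  # log-sum-exp trick for numerical stability
        total = top + math.log(sum(math.exp(l - top) for l in logs))
        loglik += total
        resp.append([math.exp(l - total) for l in logs])
    return resp, loglik

def fit_two(points, iters=50):
    """EM for a two-component spherical mixture; return (hard labels, log-likelihood)."""
    n, d = len(points), len(points[0])
    var0 = fit_one(points)[1]
    # Deterministic start (an assumption): the first point and the point farthest from it.
    first = points[0]
    far = max(points, key=lambda p: sum((a - b) ** 2 for a, b in zip(p, first)))
    weights, means, variances = [0.5, 0.5], [list(first), list(far)], [var0, var0]
    for _ in range(iters):
        resp = e_step(points, weights, means, variances)[0]
        for k in range(2):  # M-step: re-estimate weight, mean, variance
            nk = max(sum(r[k] for r in resp), 1e-9)
            weights[k] = nk / n
            means[k] = [sum(r[k] * p[j] for r, p in zip(resp, points)) / nk
                        for j in range(d)]
            sq = sum(r[k] * sum((a - b) ** 2 for a, b in zip(p, means[k]))
                     for r, p in zip(resp, points))
            variances[k] = max(sq / (nk * d), 1e-6)
    resp, loglik = e_step(points, weights, means, variances)
    return [0 if r[0] >= r[1] else 1 for r in resp], loglik

def recursive_em(ids, points, min_size=3):
    """Split recursively while BIC favors two components; return [(ids, log-likelihood)]."""
    n, d = len(points), len(points[0])
    ll1 = fit_one(points)[2]
    if n < 2 * min_size:
        return [(ids, ll1)]
    labels, ll2 = fit_two(points)
    bic1 = -2 * ll1 + (d + 1) * math.log(n)      # mean (d) + variance (1)
    bic2 = -2 * ll2 + (2 * d + 3) * math.log(n)  # two Gaussians + one mixing weight
    parts = [[i for i, lab in enumerate(labels) if lab == k] for k in range(2)]
    if bic2 >= bic1 or min(len(part) for part in parts) < min_size:
        return [(ids, ll1)]
    out = []
    for part in parts:
        out += recursive_em([ids[i] for i in part], [points[i] for i in part], min_size)
    return out

# Made-up interaction scores for 15 hypothetical genes under two perturbations.
profiles = [
    [0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [0.5, 0.5], [0.2, 0.3],
    [10.0, 10.0], [10.5, 10.0], [10.0, 10.5], [10.6, 10.6], [10.3, 10.2],
    [10.0, -10.0], [10.5, -10.0], [10.0, -10.5], [10.5, -10.5], [10.2, -10.3],
]
genes = ["g%d" % i for i in range(1, 16)]
clusters = recursive_em(genes, profiles)
for members, loglik in clusters:
    print(members, round(loglik, 2))
```

In this sketch the number of clusters is not chosen in advance; it falls out of the BIC stopping rule, and each returned cluster carries the log-likelihood of its own single-Gaussian fit, loosely mirroring the two advantages described above. A real analysis would likely need full covariance models, multiple EM restarts, and many more perturbations per gene.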