Cohen-Gihon Inbar, Nussinov Ruth, Sharan Roded
Sackler Institute of Molecular Medicine, Department of Human Genetics, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel.
BMC Genomics. 2007 Jun 11;8:161. doi: 10.1186/1471-2164-8-161.
Protein domains are fundamental evolutionary units of protein architecture, composing proteins in a modular manner. Combinations of two or more, possibly non-adjacent, domains are thought to play specific functional roles within proteins. Indeed, while the number of potential co-occurring domain sets (CDSs) is very large, only a few of these occur in nature. Here we study the principles governing the domain content of proteins, using yeast as a model species.
We design a novel representation of proteins and their constituent domains as a protein-domain network. An analysis of this network reveals 99 CDSs that occur in proteins more often than expected by chance. The identified CDSs are shown to preferentially include ancient domains that are conserved from bacteria or archaea. Moreover, the protein sets spanned by these combinations were found to be highly functionally coherent, to match known protein complexes significantly, and to be enriched in protein-protein interactions. These observations serve to validate the biological significance of the identified CDSs.
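As an illustration of the idea of a protein-domain network and of domain sets that co-occur more often than chance, the following minimal Python sketch may help. It is not the authors' method: the abstract does not state which statistical test was used, so the hypergeometric tail test, the restriction to domain pairs (the study considers sets of two or more domains), the significance threshold `alpha`, and the function names `hypergeom_tail` and `cooccurring_pairs` are all assumptions made for this example. No multiple-testing correction is applied.

```python
from collections import defaultdict
from itertools import combinations
from math import comb


def hypergeom_tail(k, n_proteins, a, b):
    """Return P(X >= k) for a hypergeometric variable X.

    X is the number of proteins containing both domains when a proteins
    carry the first domain, b carry the second, and both are assumed to
    be placed independently among n_proteins proteins.
    """
    total = comb(n_proteins, b)
    upper = min(a, b)
    favourable = sum(
        comb(a, i) * comb(n_proteins - a, b - i) for i in range(k, upper + 1)
    )
    return favourable / total


def cooccurring_pairs(protein_domains, alpha=0.05):
    """Find domain pairs that share more proteins than expected by chance.

    protein_domains maps a protein name to the set of its domains, i.e. it
    is the adjacency list of a bipartite protein-domain network.
    Assumption: pairs only, hypergeometric test, no multiple-testing
    correction.
    """
    n_proteins = len(protein_domains)

    # Invert the network: domain -> set of proteins containing it.
    domain_proteins = defaultdict(set)
    for protein, domains in protein_domains.items():
        for domain in domains:
            domain_proteins[domain].add(protein)

    hits = []
    for d1, d2 in combinations(sorted(domain_proteins), 2):
        shared = domain_proteins[d1] & domain_proteins[d2]
        if not shared:
            continue  # the two domains never co-occur
        p_value = hypergeom_tail(
            len(shared),
            n_proteins,
            len(domain_proteins[d1]),
            len(domain_proteins[d2]),
        )
        if p_value < alpha:
            hits.append((d1, d2, sorted(shared), p_value))
    return hits
```

On a hypothetical toy network in which three of ten proteins carry both domains `A` and `B` and the remaining seven carry only `C`, `cooccurring_pairs` reports the single pair `(A, B)`, because three shared proteins out of ten is unlikely if the two domains were placed independently.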
Our work provides a comprehensive list of co-occurring domain sets in yeast, and sheds light on their function and evolution.