Venkatesan Aravind, Tripathi Sushil, Sanz de Galdeano Alejandro, Blondé Ward, Lægreid Astrid, Mironov Vladimir, Kuiper Martin
Department of Biology, Norwegian University of Science and Technology (NTNU), N-7491, Trondheim, Norway.
Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology (NTNU), N-7489, Trondheim, Norway.
BMC Bioinformatics. 2014 Dec 10;15(1):386. doi: 10.1186/s12859-014-0386-y.
Network-based approaches for the analysis of large-scale genomics data have become well established. Biological networks provide a knowledge scaffold against which the patterns and dynamics of 'omics' data can be interpreted. The background information required for the construction of such networks is often dispersed across a multitude of knowledge bases in a variety of formats. The seamless integration of this information is one of the main challenges in bioinformatics. The Semantic Web offers powerful technologies for the assembly of integrated knowledge bases that are computationally comprehensible, thereby providing a potentially powerful resource for constructing biological networks and network-based analysis.
We have developed the Gene eXpression Knowledge Base (GeXKB), a resource built on Semantic Web technology that contains integrated knowledge about gene expression regulation. To demonstrate the utility of GeXKB, we show how this resource can be used to identify candidate regulatory network proteins. We present four use cases, designed from a biological perspective, that search for candidate members relevant to the gastrin hormone signaling network model. We show how specific query definitions, combined with additional selection criteria derived from gene expression data and prior knowledge about candidate proteins, can be used to retrieve a set of proteins that constitute valid candidates for regulatory network extensions.
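The selection step described above can be illustrated with a minimal sketch. It does not reproduce the GeXKB queries or the authors' actual criteria. It assumes that query results are already available as a list of hits, that expression support is a simple set-membership test, and that prior knowledge is reduced to a set of proteins already in the network model. All identifiers (`PROT_A`, `use_case_1`, `select_candidates`, and so on) are hypothetical placeholders, not real proteins or GeXKB names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    protein: str       # placeholder identifier, not a real protein
    source_query: str  # hypothetical label of the use-case query that returned it


def select_candidates(query_hits, expressed, known_members):
    """Keep query hits that are supported by expression data and
    are not already members of the network model (assumed criteria)."""
    selected = []
    seen = set()
    for hit in query_hits:
        if hit.protein in known_members:
            continue  # already part of the model, so not a new candidate
        if hit.protein not in expressed:
            continue  # no support from the expression data
        if hit.protein in seen:
            continue  # returned by more than one query; report it once
        seen.add(hit.protein)
        selected.append(hit.protein)
    return selected


# Made-up inputs for illustration only.
query_hits = [
    Candidate("PROT_A", "use_case_1"),
    Candidate("PROT_B", "use_case_1"),
    Candidate("PROT_C", "use_case_2"),
    Candidate("PROT_A", "use_case_3"),
]
expressed = {"PROT_A", "PROT_C", "PROT_D"}
known_members = {"PROT_C"}

print(select_candidates(query_hits, expressed, known_members))  # ['PROT_A']
```

In practice the expression and prior-knowledge criteria would likely be richer than set membership (for example, thresholds on expression change or literature evidence), but the structure of the filter would stay the same.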
Semantic Web technologies provide the means for processing and integrating heterogeneous information sources. GeXKB offers biologists such an integrated knowledge resource, allowing them to address complex biological questions pertaining to gene expression. This work illustrates how GeXKB can be used in combination with gene expression results and literature information to identify new candidates for extending a gene regulatory network.