Higham Desmond J, Rasajski Marija, Przulj Natasa
Department of Mathematics, University of Strathclyde, Glasgow G1 1XH, UK.
Bioinformatics. 2008 Apr 15;24(8):1093-9. doi: 10.1093/bioinformatics/btn079. Epub 2008 Mar 14.
Finding a good network null model for protein-protein interaction (PPI) networks is a fundamental issue. Such a model would provide insights into the interplay between network structure and biological function as well as into evolution. Also, network (graph) models are used to guide biological experiments and discover new biological features. It has been proposed that geometric random graphs are a good model for PPI networks. In a geometric random graph, nodes correspond to uniformly randomly distributed points in a metric space and edges (links) exist between pairs of nodes for which the corresponding points in the metric space are close enough according to some distance norm. Computational experiments have revealed close matches between key topological properties of PPI networks and geometric random graph models. In this work, we push the comparison further by exploiting the fact that the geometric property can be tested for directly. To this end, we develop an algorithm that takes PPI interaction data and embeds proteins into a low-dimensional Euclidean space, under the premise that connectivity information corresponds to Euclidean proximity, as in geometric-random graphs. We judge the sensitivity and specificity of the fit by computing the area under the Receiver Operator Characteristic (ROC) curve. The network embedding algorithm is based on multi-dimensional scaling, with the square root of the path length in a network playing the role of the Euclidean distance in the Euclidean space. The algorithm exploits sparsity for computational efficiency, and requires only a few sparse matrix multiplications, giving a complexity of O(N(2)) where N is the number of proteins.
The algorithm has been verified in the sense that it successfully rediscovers the geometric structure in artificially constructed geometric networks, even when noise is added by re-wiring some links. Applying the algorithm to 19 publicly available PPI networks of various organisms indicated that: (a) geometric effects are present and (b) two-dimensional Euclidean space is generally as effective as higher dimensional Euclidean space for explaining the connectivity. Testing on a high-confidence yeast data set produced a very strong indication of geometric structure (area under the ROC curve of 0.89), with this network being essentially indistinguishable from a noisy geometric network. Overall, the results add support to the hypothesis that PPI networks have a geometric structure.
MATLAB code implementing the algorithm is available upon request.
为蛋白质-蛋白质相互作用(PPI)网络找到一个合适的网络空模型是一个基本问题。这样的模型将有助于深入了解网络结构与生物学功能之间的相互作用以及进化过程。此外,网络(图)模型还用于指导生物学实验并发现新的生物学特征。有人提出几何随机图是PPI网络的一个良好模型。在几何随机图中,节点对应于度量空间中均匀随机分布的点,并且根据某种距离范数,当度量空间中对应点足够接近时,节点对之间存在边(链接)。计算实验表明,PPI网络的关键拓扑特性与几何随机图模型之间存在紧密匹配。在这项工作中,我们利用几何特性可以直接测试这一事实,进一步推进了这种比较。为此,我们开发了一种算法,该算法将PPI相互作用数据作为输入,并在连通性信息对应于欧几里得距离(如同几何随机图那样)的前提下,将蛋白质嵌入到低维欧几里得空间中。我们通过计算接收者操作特征(ROC)曲线下的面积来判断拟合的灵敏度和特异性。网络嵌入算法基于多维缩放,网络中路径长度的平方根在欧几里得空间中扮演欧几里得距离的角色。该算法利用稀疏性提高计算效率,并且只需要进行几次稀疏矩阵乘法,其复杂度为O(N²),其中N是蛋白质的数量。
该算法已经得到验证,即它能够成功地重新发现人工构建的几何网络中的几何结构,即使通过重新连接一些链接添加了噪声。将该算法应用于19个公开可用的不同生物体的PPI网络表明:(a)存在几何效应,并且(b)二维欧几里得空间在解释连通性方面通常与高维欧几里得空间一样有效。在一个高可信度的酵母数据集上进行测试产生了非常强烈的几何结构指示(ROC曲线下面积为0.89),该网络与有噪声的几何网络基本无法区分。总体而言,这些结果为PPI网络具有几何结构这一假设提供了支持。
可根据请求提供实现该算法的MATLAB代码。