Are biological networks scale-free graphs?

Diego Mauricio Riaño-Pachón, Ingo Dreyer, Bernd Mueller-Roeber

Department of Molecular Biology, Institute for Biochemistry and Biology, University of Potsdam†

email: {diriano,dreyer,bmr}@uni-potsdam.de

Introduction

Scale-free graphs were first described by Barabási and Albert [1] in a study of the connectivity of the web; scale-free behaviour was subsequently reported for several biological networks (e.g. [2]). A graph is scale-free if the distribution of the vertex connectivity (degree, k) follows a power law of the form $P(k) \sim k^{-\gamma}$. Additionally, it should have highly connected nodes, called hubs, which are central to the network topology and ‘keep the network together’. Targeted removal of such a hub is catastrophic for the network topology. Recently, some authors have pointed out that networks believed to be scale-free graphs are actually scale-rich. Here we follow a recommended procedure to study the degree distribution of biological graphs and to evaluate the importance of hubs in the graph topology (for details see [3, 4]).

Importance of hubs

It has been suggested that a power-law-like degree distribution is sufficient for the emergence of hubs that keep the network together. Li et al. [3] have shown that this is not the case: networks displaying power-law degree distributions may not have central hubs. The authors define the s-metric of a graph g, s(g), and its normalized version S(g) as given in equation 4.

Figure 3: PDF of degree distribution

The recommended way to study the degree distribution is through the Cumulative Distribution Function (CDF), which here gives the probability that a node has a degree larger than k. The CDF of a power-law function is also a power-law function, with an exponent one unit smaller in magnitude than in equation 1:

$P(K > k) = C k^{-(\gamma - 1)}$ (2)
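The relation between the CDF slope and γ can be sketched in a few lines of Python. This is a minimal illustration and not the procedure used for the figures: the function names `ccdf` and `loglog_slope` are hypothetical, all degrees are assumed to be at least 1, and the straight line is fitted by ordinary least squares on the log-log points. Because the CDF exponent is −(γ − 1), the estimate is γ = 1 − slope.

```python
import math
from collections import Counter

def ccdf(degrees):
    """Empirical P(K > k) for each observed degree k (degrees assumed >= 1)."""
    n = len(degrees)
    counts = Counter(degrees)
    remaining = n
    points = []
    for k in sorted(counts):
        remaining -= counts[k]      # nodes with degree strictly larger than k
        if remaining > 0:           # log(0) is undefined, so the largest k is skipped
            points.append((k, remaining / n))
    return points

def loglog_slope(points):
    """Ordinary least-squares slope of log(y) against log(x)."""
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Toy degree list (hypothetical), just to show the shape of the output.
toy_degrees = [1, 1, 2, 3]
toy_ccdf = ccdf(toy_degrees)
print(toy_ccdf)

# Points lying exactly on P(K > k) = k^-1, i.e. gamma = 2 by construction.
exact_points = [(k, k ** -1.0) for k in (1, 2, 4, 8)]
gamma_estimate = 1.0 - loglog_slope(exact_points)
print(gamma_estimate)
```

On real data the fit would normally be restricted to the range of k where the CDF looks linear; that choice is left out of this sketch.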

Graph theory concepts

Many systems can be represented as graphs: sets of nodes joined by links that represent some type of interaction or relationship. Examples include protein-protein interactions, transcriptional regulation and domain co-occurrences, among several others. Figure 1 depicts the representation of protein domain co-occurrences as a graph. In panel ‘A’, two proteins (a and b) are shown with colored boxes representing different functional domains. In panel ‘B’, the domains of those proteins are represented as colored nodes. Edges link nodes (domains) that appear together in a single protein. Applying this procedure to a full proteome results in a graph like the one in panel C of Figure 1.

Cumulative distribution

Figure 4 shows the CDF of the same data used in panel A of Figure 3. Fitting a straight line to the CDF gives γ = 1.9, much closer to the true value.

Figure 4: CDF of degree distribution

$s(g) = \sum_{(i,j) \in \varepsilon} d_i d_j, \qquad S(g) = \frac{s(g)}{s_{max}}$ (4)

where $d_i$ and $d_j$ are the degrees of nodes i and j, respectively, and ε is the set of edges of graph g; $s_{max}$ is the maximum attainable value of s(g) for a graph with the same degree sequence as g. This metric attempts to quantify the role of hubs in the topology of the graph. High values of S(g) indicate a ‘hub-like core’, where the hubs play an important role in the connectivity of the graph (the $s_{max}$ graph was computed as described in appendix A.1 of [3]).

s-metric
Metric   S. cerevisiae   O. sativa
s(g)     32 672          558 635
S(g)     0.624           0.417
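As a rough illustration of the s-metric, the sum of degree products over all edges can be computed directly from an edge list. The sketch below is a Python stand-in with hypothetical names (`s_metric`, `normalized_s`) and two toy graphs that are not taken from the poster. It does not construct the $s_{max}$ graph; the normalization simply takes $s_{max}$ as an input that would have to be obtained separately, for example with the construction referenced in [3].

```python
from collections import Counter

def s_metric(edge_list):
    """s(g): sum of d_i * d_j over all edges (i, j) of an undirected graph."""
    degree = Counter()
    for u, v in edge_list:
        degree[u] += 1
        degree[v] += 1
    return sum(degree[u] * degree[v] for u, v in edge_list)

def normalized_s(s_value, s_max):
    """S(g) = s(g) / s_max; s_max must come from a graph with the same degree sequence."""
    return s_value / s_max

# Two toy graphs (hypothetical), each with four nodes and three edges.
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
path = [("a", "b"), ("b", "c"), ("c", "d")]

s_star = s_metric(star)
s_path = s_metric(path)
print(s_star, s_path)
```

The two toy graphs have different degree sequences, so their s(g) values are shown only to illustrate the computation; S(g) is meaningful only relative to the $s_{max}$ of the same degree sequence.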

Another option is to compute the centrality of nodes. If hubs are central to the topology, then a positive correlation between the node degree and the node centrality is expected, and the centrality of low-degree nodes would be low. Here we compute the betweenness centrality as implemented in Pajek [9], which is the fraction of shortest paths that go through the node of interest.
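The comparison of degree and betweenness can be sketched without Pajek. The block below is a plain-Python implementation of Brandes' algorithm for unweighted, undirected graphs, offered as an assumption-laden stand-in: the normalization by the number of node pairs, (n − 1)(n − 2)/2, and the toy graph (two triangles joined through a single node `x`) are choices made for this illustration and are not taken from the poster or from Pajek's documentation.

```python
from collections import defaultdict, deque

def adjacency(edge_list):
    """Build an undirected adjacency map from an edge list."""
    adj = defaultdict(set)
    for u, v in edge_list:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def betweenness(adj):
    """Normalized betweenness centrality (Brandes' algorithm, unweighted graph)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)    # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)    # -1 marks "not reached yet"
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                     # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                     # accumulate dependencies, farthest first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    scale = (n - 1) * (n - 2)            # each unordered pair was counted twice
    return {v: b / scale for v, b in bc.items()} if scale > 0 else bc

# Toy graph (hypothetical): two triangles joined through the degree-2 node "x".
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("d", "e"), ("d", "f"), ("e", "f"),
         ("c", "x"), ("x", "d")]
adj = adjacency(edges)
bc = betweenness(adj)
for node in sorted(adj, key=lambda v: -bc[v]):
    print(node, len(adj[node]), round(bc[node], 3))
```

In this toy graph the node with the highest betweenness has degree 2, which is the kind of low-degree but topologically important node that a degree-versus-centrality plot is meant to reveal.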

Model Selection

To decide objectively which function best describes the degree distribution, we follow the procedure described in [7], which has three basic steps:

(I). Define a set of competing functions.

Figure 1: Biological networks as graphs

The degree (k) of a node is its number of incident edges. For example, in Figure 1 panel ‘B’, the blue, green and orange nodes each have degree 3, the purple node has degree 5, and the yellow and light green nodes have degree 2 each. Different nodes thus have different degrees; this variability is characterized by the degree distribution function P(k), which gives the probability that a node has exactly k edges or, in other words, the observed frequency of nodes of degree k. Very good introductions to network analysis and theory are found in [5] and [6].
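Degrees and the degree distribution P(k) are straightforward to compute from an edge list. The Python sketch below uses a hypothetical edge list chosen only so that the resulting degrees match those quoted for panel ‘B’ (three nodes of degree 3, one of degree 5, two of degree 2); the actual edges of the figure are not known here, and the function names are illustrative.

```python
from collections import Counter

# Hypothetical edges, chosen to reproduce the degrees quoted in the text;
# they are not read off Figure 1.
edges = [
    ("purple", "blue"), ("purple", "green"), ("purple", "orange"),
    ("purple", "yellow"), ("purple", "light green"),
    ("blue", "green"), ("green", "orange"), ("blue", "orange"),
    ("yellow", "light green"),
]

def node_degrees(edge_list):
    """Count the incident edges of every node in an undirected graph."""
    degree = Counter()
    for u, v in edge_list:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)

def degree_distribution(degree):
    """P(k): observed fraction of nodes that have exactly k edges."""
    counts = Counter(degree.values())
    n = len(degree)
    return {k: c / n for k, c in sorted(counts.items())}

degrees = node_degrees(edges)
p_k = degree_distribution(degrees)
print(degrees)
print(p_k)
```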

Probability distribution

The degree distribution of a scale-free graph can be described by a power law, especially for large values of k:

$P(K = k) = C k^{-\gamma}$ (1)

where C is a constant such that $\sum_{k} P(k) = 1$. When plotted on a double logarithmic scale, this function appears as a straight line of slope −γ. This fact has been used as the gold-standard test to decide whether a graph is scale-free or not, and to estimate the value of γ. As shown in panel A of Figure 3, this frequency plot, or Probability Distribution Function (PDF), is very noisy for large k, leading to unreliable estimates of γ. In this example the true value is γ = 2, but fitting a straight line to the data gives γ = 0.8. In addition, an exponential function ($P(k) \sim e^{-\alpha k}$) can also look like a straight line in the double logarithmic plot for large values of k, as seen in panel B of Figure 3. Figure 3 shows n = 10000 integer values sampled from each of the two distributions, power law and exponential. Figure 2 lists the R code used to generate the data.

Figure 2: R code. Generating degree distributions.
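The R listing itself is not reproduced in this text. As a loose Python stand-in, and not a reconstruction of the authors' code, the sketch below draws n = 10000 integer values from a power-law-tailed distribution with γ = 2 and from an exponential-type distribution. The samplers are assumptions: the power-law values are obtained by flooring a continuous Pareto variate, which gives P(K ≥ k) = k^−(γ−1) and therefore P(K = k) ∼ k^−γ only for large k, and the rate α = 0.1 of the exponential is an arbitrary illustrative value.

```python
import math
import random

def sample_power_law(n, gamma, rng):
    """Integers k >= 1 with P(K >= k) = k^-(gamma - 1) (floored Pareto variate)."""
    samples = []
    for _ in range(n):
        u = 1.0 - rng.random()    # uniform on (0, 1], so u is never zero
        samples.append(math.floor(u ** (-1.0 / (gamma - 1.0))))
    return samples

def sample_exponential(n, alpha, rng):
    """Integers k >= 1 with P(K >= k) = exp(-alpha * (k - 1))."""
    return [1 + math.floor(rng.expovariate(alpha)) for _ in range(n)]

rng = random.Random(42)           # fixed seed so that repeated runs agree
n = 10000
power_law_degrees = sample_power_law(n, gamma=2.0, rng=rng)
exponential_degrees = sample_exponential(n, alpha=0.1, rng=rng)

print(min(power_law_degrees), max(power_law_degrees))
print(min(exponential_degrees), max(exponential_degrees))
```

The power-law sample typically contains a few very large values, which is what makes the raw frequency plot noisy in the tail.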



(II). Compute Maximum Likelihood estimates of the parameters for each function.

Figure 6: Node centrality

(III). Decide which function best describes the data.

To illustrate this approach we use real data from the networks of domain co-occurrences in Saccharomyces cerevisiae and Oryza sativa. Exponential, stretched exponential, power-law with exponential cutoff and power-law functions are used as competing functions (step (I), Figure 5).

Name                                Function                                Abb
Exponential                         $C e^{-\beta k}$                        F1
Stretched exponential               $C e^{-\beta k/\bar{k}} k^{-\gamma}$    F2
Power-law with exponential cutoff   $k^{-\gamma} e^{-k/k_{cut}}$            F3
Power-law                           $C k^{-\gamma}$                         F4

The Maximum Likelihood estimate of the parameters of each function is obtained through non-linear least squares (step (II)). The best describing function is the one with the lowest Akaike Information Criterion (AIC). AIC is a measure that rewards models for good fit but penalizes them for extra parameters:

$AIC = 2(-lk(\hat{\theta}) + d)$ (3)

where $lk(\hat{\theta})$ is the maximum value of the log-likelihood function and d is the number of adjusted parameters.
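Equation 3 and the selection rule can be written out as a short Python sketch. The function name `aic` and the dictionary layout are hypothetical; the numbers are the AIC values reported in this poster's AIC table for functions F1 to F4, and the selection simply picks the smallest value for each species.

```python
def aic(log_likelihood, n_params):
    """AIC = 2 * (-lk + d), with lk the maximized log-likelihood and d the parameter count."""
    return 2.0 * (-log_likelihood + n_params)

# AIC values as reported in the poster's AIC table.
reported_aic = {
    "S. cerevisiae": {"F1": -107.15, "F2": -131.21, "F3": -100.04, "F4": -80.18},
    "O. sativa": {"F1": -193.43, "F2": -242.07, "F3": -199.10, "F4": -183.24},
}

# Step (III): the preferred function is the one with the lowest AIC.
best = {species: min(scores, key=scores.get)
        for species, scores in reported_aic.items()}
print(best)

# Made-up log-likelihood and parameter count, only to exercise the formula.
example_aic = aic(-10.0, 2)
print(example_aic)
```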

Conclusions

• When analysing a degree distribution, always use the cumulative distribution.
• Assess statistically which model best describes the degree distribution of your network, using the Akaike Information Criterion. It is also possible to use the Bayesian Information Criterion.
• Evaluate the importance of hubs in the network. Two options are the s-metric and an index of betweenness centrality.

The domain co-occurrence networks shown here are clearly not scale-free graphs: their degree distribution is not well described by a power law, and many low-degree nodes are very important for the network topology.

Acknowledgements

This work was funded by the IZ-APT Uni-Potsdam.

References

[1] Barabási, A.-L. and Albert, R. (1999) Emergence of scaling in random networks. Science, 286, 509–512.
[2] Jeong, H., Tombor, B., Albert, R., Oltvai, Z. N., and Barabási, A.-L. (2000) The large-scale organization of metabolic networks. Nature, 407, 651–654.
[3] Li, L., Alderson, D., Tanaka, R., Doyle, J. C., and Willinger, W. (2005) Towards a theory of scale-free graphs: Definition, properties, and implications (extended version). Tech. rep., Engineering & Applied Sciences Division, California Institute of Technology, Pasadena, CA, USA.

Figure 5: Non-linear least squares ML fitting

AIC
Function   S. cerevisiae   O. sativa
F1         -107.15         -193.43
F2         -131.21         -242.07
F3         -100.04         -199.10
F4         -80.18          -183.24

For both species, the degree distribution of the domain co-occurrence network is better explained by a stretched exponential function than by a power law, in contrast to what was suggested in [8].

[4] Tanaka, R., Yi, T.-M., and Doyle, J. (2005) Some protein interaction data do not exhibit power law statistics. FEBS Lett, 579, 5140–5144.
[5] Albert, R. and Barabási, A.-L. (2002) Statistical mechanics of complex networks. Rev Mod Phys, 74, 47–97.
[6] Newman, M. (2003) The structure and function of complex networks. SIAM Review, 45, 167–256.
[7] Stumpf, M. P. H. and Ingram, P. J. (2005) Probability models for degree distributions of protein interaction networks. Europhysics Letters, 71, 152.
[8] Wuchty, S. and Almaas, E. (2005) Evolutionary cores of domain co-occurrence networks. BMC Evol Biol, 5, 24.
[9] Batagelj, V. and Mrvar, A. (2005) Pajek: Program for Analysis and Visualization of Large Networks. Reference Manual, list of commands with short explanation, version 1.06.

† Address for correspondence: Universität Potsdam, Institut für Biochemie und Biologie, Karl-Liebknecht-Str. 24-25, Haus 20, 14476 Golm, Deutschland
