

IEEE International Conference on Social Computing / IEEE International Conference on Privacy, Security, Risk and Trust

Behavior Evolution and Event-driven Growth Dynamics in Social Networks

Baojun Qiu∗, Kristinka Ivanova†, John Yen†, and Peng Liu†
∗Department of Computer Science and Engineering
†College of Information Sciences and Technology
The Pennsylvania State University, University Park, PA 16802
Email: [email protected], {kivanova, jyen, pliu}@ist.psu.edu

978-0-7695-4211-9/10 $26.00 © 2010 IEEE DOI 10.1109/SocialCom.2010.38

Abstract—In many social networks, the connections between actors are formed because they participate in the same event, such as a set of scholars coauthoring a paper, or a person making phone calls to or holding teleconferences with his friends. We therefore propose an event-driven framework for creating network growth models. We also notice that in evolving networks, both the behavior of the whole network and the behavior of individual nodes evolve over time. For example, we observe in collaborative networks that the growth rates of the communities and the average number of coauthors per paper change as the networks grow, and that researchers’ interactions with local groups and remote groups also evolve with their experience (degree). These observations motivate us to propose a behavior-evolution-aware, event-driven, locality- and attachedness-based model to capture the growth dynamics of social networks. Based on informative metrics of network structure and properties, such as the degree distribution, degree-dependent clustering coefficients, and the degree-dependent average degree of neighbors, the experiments suggest that our model characterizes the growth process and reproduces important network structures observed in real networks better than other non-event-driven and non-behavior-aware models.

I. INTRODUCTION
Social network and complex network studies have attracted increasing interest in recent decades. Many real-world complex networks have been shown to exhibit a set of properties that distinguish them from random networks [1]. These properties include a power-law degree distribution, the small-world effect, and kinetic properties exhibited during the growth of social networks [2]–[4], e.g., the shrinking-diameter phenomenon. Assortative mixing, a specific network correlation feature, has also attracted interest in recent years [5], [6]. This feature characterizes the correlations between properties of adjacent nodes and has a profound effect on the structure and functioning of networks. These properties represent a significant departure from random networks; however, there is no consensus on the cause of these non-random features. Recently, researchers from different disciplines have explored a variety of factors, including the attachedness factor (the degree of nodes) and the locality factor (the distance between nodes), to discover the growth schemes of social networks [7]–[10]. Section II provides an overview of these studies. However, the joint effect of the attachedness (degree) factor and the locality factor on network growth dynamics has not been well explored.

Social networks typically capture relationships (or connections) between actors (e.g., authors) through events (e.g., coauthoring a paper). In many real networks, connections are formed between actors because they participate in the same event; in collaborative networks, for example, an event is a group of researchers coauthoring a paper. Depending on the number of participants in an event, multiple connections can be formed at once. An event-driven model is therefore a natural way to model the growth of these networks. These networks also contain many cliques with three or more nodes (a clique $K_n$ is a simple graph with n nodes in which every pair of nodes is connected), a phenomenon that is hard to recreate with a non-event-driven model. In addition, event-driven models are more general, in the sense that any non-event-driven model is equivalent to an event-driven model restricted to exactly two participants per event. Finally, the event-driven context can carry richer information. For example, suppose three people, A, B, and C, participate in an event, and A knows B and C but B and C do not know each other; then we know that A is a “bridge” between B and C. If we further consider the properties of events and the causal relationships between them, we may gain more insights.

In an evolving social network, the sets of nodes and edges change over time as new nodes join, old nodes leave, and new connections form. The behavior of different nodes can be very different and can evolve over time [11]. For example, in scientific collaboration networks, researchers usually publish papers in collaboration with more senior researchers when they are junior, and help more junior researchers as they gain experience in the field. The behavior of a whole network can also evolve over time: a research community may grow slowly at its start, and its growth rate may then increase rapidly as it attracts more and more people. Most existing growth models have not considered behavior evolution, especially from the angle of event-driven growth. In this paper, we study the joint effect of attachedness and locality on event-driven social network growth, study behavior evolution at both the network level and the node level, and propose an evolution-aware, event-driven, locality- and attachedness-based growth model. Based on simulation results, we find that our model better characterizes the growth of the nanotechnology community in terms of properties such as the degree distribution, clustering coefficients, and assortative mixing.

The rest of this paper is organized as follows: Section II introduces the background of this study and briefly reviews related work; Section III presents observations in real social networks that motivate our work; Section IV first describes the proposed event-driven framework for network growth models, then introduces an evolution-aware, event-driven, locality- and attachedness-based model that incorporates the joint effect of locality and attachedness and considers both node-level and network-level behavior evolution; Section V presents a quantitative analysis of important aspects of the model settings, compares the proposed model with other models, and shows the model's impact on the topological and correlation properties of the networks. Finally, Section VI notes potential future work and concludes the paper.
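The event-to-clique argument can be made concrete with a small sketch. The Python below is illustrative only and assumes a graph stored as a dict mapping each node to a set of neighbors; `add_event` and `bridges` are hypothetical helper names, not code from the paper. An event with m participants contributes up to m(m−1)/2 edges, and calling `add_event` with exactly two participants reduces it to an ordinary edge-at-a-time (non-event-driven) step. `bridges` reads the "bridge" relation described above from the network as it stood before the event, so it has to be called before the event is recorded.

```python
from itertools import combinations


def add_event(adj, participants):
    """Record one event: every pair of participants becomes connected (a clique K_m).

    `adj` maps node -> set of neighbors and is updated in place.
    Returns the number of edges that did not exist before the event.
    """
    new_edges = 0
    for u, v in combinations(participants, 2):
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            new_edges += 1
    return new_edges


def bridges(adj_before, participants):
    """Return (a, b, c) triples where participant a already knew participants b and c,
    but b and c did not know each other, in the network before the event."""
    found = []
    for a in participants:
        known = adj_before.get(a, set())
        acquaintances = [p for p in participants if p != a and p in known]
        for b, c in combinations(acquaintances, 2):
            if c not in adj_before.get(b, set()):
                found.append((a, b, c))
    return found


# Usage: A knows B and C, who are strangers; a three-author paper then links everyone.
adj = {"A": {"B", "C"}, "B": {"A"}, "C": {"A"}}
print(bridges(adj, ["A", "B", "C"]))    # [('A', 'B', 'C')]
print(add_event(adj, ["A", "B", "C"]))  # 1 (only the B-C edge is new)
```

A four-participant event on an empty graph adds 6 edges over 4 nodes, i.e., an edge density of (4 − 1)/2 = 1.5 edges per node, which is why the number of participants per event matters for the density of the generated network.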

II. RELATED WORK
Much research on the dynamics of complex networks has been done in recent years, including static analysis of social network evolution and modeling of the static topological properties and dynamic patterns observed in social networks. Most related studies focus on either the attachedness factor or the locality factor.

Attachedness measures how well nodes are connected to other nodes and is therefore usually reflected by node degree. In their well-received article [7], Barabasi and Albert develop the notable preferential attachment theory, in which high-degree nodes are favored when new connections are built. In their model, new nodes are added to the network one by one, and the probability that a new node links to an existing vertex i depends on that vertex's degree $d_i$ and equals $d_i / \sum_j d_j$. This simple scheme is shown to produce a power-law degree distribution and a “rich get richer” phenomenon.

Many existing models exploit the locality factor explicitly or implicitly, assuming that the formation of a new connection between two nodes is related to their distance in the existing topology. Jost and Joy [15] describe a purely distance-based scheme in which each new node is connected to a randomly selected node and its subsequent connections depend on the distance to the destination node. Davidsen et al. [13] present a referral model in which connections are always formed between two nodes that share a common neighbor. This model emulates the real-world scenario in which one person introduces two of his acquaintances to each other, and such a simple evolution scheme is viewed as the basis of social network evolution. The authors demonstrate that it reproduces major nontrivial features of social networks, including short path lengths, high clustering, and scale-free or exponential degree distributions. The scheme is also known as the triangle-closing model [24]. The copying mechanism [9] specifies that at each time step a new node is added to the network; the new node copies a number of links from a “prototype” node selected randomly from the existing nodes, and its remaining neighbors are chosen at random. Similar graph growth mechanisms include models that implicitly or explicitly rely on locality heuristics [4], [14], [16]–[18], [22] or on specified feature similarity (correlation) between nodes [23].

Some models also explicitly or implicitly exploit the joint effect of degree and locality. Vazquez [21] designed the “walking on a network” scheme to simulate graph growth. At every time step, a new node $v_i$ is added and linked to a randomly selected node $v_j$ through a directed edge. Node $v_i$ then mimics a “random walk” on the network, following the edges that start from $v_j$ and linking to their end points with probability p. This step is repeated for the nodes to which new connections were established, until no new target node is found. More recent work on this front includes Morris and Goldstein’s team-based Yule model [19] and Zhang’s DDG model [10]. The team-based Yule model maintains teams during modeling and uses preferential attachment for within-team actor selection and random selection of actors outside the team for new collaborations. Hence, it adopts a binary locality measure (i.e., whether an author is within or outside a team). In contrast, the DDG model uses the ratio of degree to distance to select the two nodes to connect; in other words, it uses a “continuous” locality measure. Guimera et al. [14] propose a team assembly mechanism by investigating the interplay between “incumbents” and “newcomers” in collaboration networks; their model implicitly incorporates node behavior evolution. Morris and Goldstein’s Yule model [19] uses some aspects of event-based simulation.

However, their work differs from ours in several ways. First, we explicitly present an event-driven framework to facilitate modeling the growth dynamics of event-driven networks. Second, we study the behavior evolution of nodes and networks, and our proposed model is evolution-aware. Third, we study and incorporate the joint effect of degree and distance. Fourth, our model is more efficient than the Yule model because it does not need to maintain any team structure.
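The preferential attachment rule described above, where an existing vertex i is chosen with probability $d_i / \sum_j d_j$, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the original authors' code: the seed clique on m + 1 nodes, the function and parameter names, and the dict-of-sets graph are choices made here. Keeping a list in which every node appears once per incident edge makes a uniform draw from that list a degree-proportional draw. Because each new node needs m distinct targets, duplicates are redrawn, so later picks within one step are only approximately degree-proportional.

```python
import random


def preferential_attachment_graph(n_nodes, m, seed=0):
    """Grow a graph in which each new node links to m distinct existing nodes,
    each drawn with probability d_i / sum_j d_j (linear preferential attachment)."""
    rng = random.Random(seed)
    # Seed network: a clique on m + 1 nodes, so every node starts with nonzero degree.
    adj = {v: set(range(m + 1)) - {v} for v in range(m + 1)}
    # Each node appears here once per incident edge: uniform pick == degree-proportional pick.
    endpoints = [v for v in adj for _ in adj[v]]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))  # duplicates are simply redrawn
        adj[new] = set()
        for u in targets:
            adj[new].add(u)
            adj[u].add(new)
            endpoints.extend([new, u])
    return adj


if __name__ == "__main__":
    g = preferential_attachment_graph(1000, 2, seed=42)
    degrees = sorted((len(nb) for nb in g.values()), reverse=True)
    print(degrees[:5])  # the largest degrees, typically well above the minimum of 2
```

Every new node arrives with exactly m edges, so the edge count grows linearly with the node count; this is the linear-growth assumption that the paper later contrasts with the growth rates measured in NanoSCI.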

III. OBSERVATIONS AND MOTIVATIONS
In this section, we present observations of the nanotechnology collaboration network, NanoSCI, as well as the motivations for our evolution-aware, event-based hybrid model. NanoSCI is appealing for investigating social network growth dynamics for the following reasons. First, collaboration networks have been widely used in scientometrics and social network studies. Second, collaboration networks are known to possess many static and dynamic properties similar to those of other social networks [2], [20]. Third, NanoSCI offers an extensive database of 292,323 researchers and 368,511 papers indexed by the Science Citation Index (SCI) from 1980 to 2006. In this paper, we use data from 1980 to 2005, because the data were collected in August 2006 and the 2006 records are therefore incomplete. In the following subsections, we first study the degree distributions, then examine the effects of both attachedness and locality on network growth dynamics, and finally study network-level and node-level behavior evolution systematically.

Fig. 1. (a) Degree distributions for NanoSCI and NanoParticle obey a power-law-like dependence with an exponential cut-off, $P(k) \propto k^{-\tau} e^{-k/k_c}$. (b) The percentage of nodes with a given degree that form new connections. (c) The percentage of node pairs at a given distance that form new connections.
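Panels (b) and (c) of Fig. 1 plot ratios of the form M/N: among the nodes of degree k (or the node pairs at distance r) in one snapshot, the fraction that form a new edge by the next snapshot. The Python sketch below computes both ratios from two cumulative snapshots. It is a minimal illustration with hypothetical function names, assuming each snapshot is a dict of neighbor sets, that disconnected pairs get distance `math.inf`, and that already-adjacent pairs (distance 1) are skipped because an edge between them would be a repeat. The pair loop is quadratic, which is fine for small graphs; for a network the size of NanoSCI one would sample node pairs instead.

```python
import math
from collections import Counter, deque


def bfs_distances(adj, source):
    """Hop distances from source; unreachable nodes are simply absent."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def degree_ratio(adj_t, adj_next):
    """F_d(k) = M_k / N_k: share of degree-k nodes at time t that gain a new neighbor by t + 1."""
    n_k, m_k = Counter(), Counter()
    for v, nbrs in adj_t.items():
        k = len(nbrs)
        n_k[k] += 1
        if adj_next.get(v, set()) - nbrs:  # some neighbor at t + 1 was not a neighbor at t
            m_k[k] += 1
    return {k: m_k[k] / n_k[k] for k in n_k}


def distance_ratio(adj_t, adj_next):
    """F_r(r) = M_r / N_r over unordered pairs of nodes present at t (r = inf if disconnected)."""
    nodes = list(adj_t)
    n_r, m_r = Counter(), Counter()
    for i, u in enumerate(nodes):
        dist = bfs_distances(adj_t, u)
        for v in nodes[i + 1:]:
            r = dist.get(v, math.inf)
            if r == 1:
                continue  # already adjacent: an edge here would be a repeat, not a new one
            n_r[r] += 1
            if v in adj_next.get(u, set()):
                m_r[r] += 1
    return {r: m_r[r] / n_r[r] for r in n_r}
```

Called on two consecutive yearly snapshots, `degree_ratio` and `distance_ratio` return the kind of points plotted in Fig. 1(b) and Fig. 1(c), respectively.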

A. Degree Distribution
Figure 1a shows the degree distributions for NanoSCI and NanoParticle in 2005; NanoParticle is a sub-community of NanoSCI. We make the following observations. First, the distribution roughly follows a power law with an exponential cut-off, $P(k) \propto k^{-\tau} e^{-k/k_c}$; the estimated $\tau$ and $k_c$ are shown in the figure. This agrees with observations reported for many other real datasets. Second, the fit breaks down at small degrees: there are fewer low-degree nodes than the power law predicts. The reason is that many papers have more than two coauthors, and a paper with n coauthors gives each of them n − 1 edges, so a researcher usually gains several edges in the collaborative network even from a single paper. Existing non-event-driven models cannot simulate this phenomenon well.

B. Effects of Locality and Attachedness in Social Networks
Locality and attachedness have traditionally been considered principal factors in the formation of connections. Figure 1b shows the proportion $F_d(k) = M_k / N_k$, where $N_k$ is the number of nodes with degree k and $M_k$ is the number of those nodes that form new edges in the next year (2002 or 2005). Figure 1c shows the proportion $F_r(r) = M_r / N_r$, where $N_r$ is the number of node pairs at distance r and $M_r$ is the number of those pairs that form new edges in the next year (2002 or 2005). These plots indicate that nodes form new links roughly in proportion to their degrees (when degrees are not extremely large) and in inverse proportion to the topological distance. Note that in Figure 1b, for extremely large degrees (> 100), the ratios for NanoSCI(2002) and NanoParticle(2002) are not accurate because too few nodes have such large degrees, while for NanoSCI(2005) and NanoParticle(2005) the ratios are relatively small, which suggests that nodes with extremely high degrees could be less active. From Figure 1c, we can also see that the majority of the edges are formed between node pairs at distance 2.

To disentangle the joint effect of degree and distance, we need a function F(·) satisfying $Pr(u, v) \propto F(d(u), d(v), r(u, v))$ whose marginal distributions on d(u) (or d(v)) and on r(u, v) have shapes similar to those in Figure 1b and Figure 1c, respectively. Here u and v are two nodes in the network, Pr(u, v) is the probability that a connection forms between u and v, d(·) returns the degree of a node, and r(·, ·) returns the distance between a pair of nodes. Discovering this function is generally very hard. One approximate approach is to define a set of simple functions, such as exponential, logarithm, multiplication, and subtraction, then use a genetic algorithm (GA) [12] to build formulas from this predefined function set, and use maximum likelihood estimation (MLE) to choose the formula that best fits the data. Due to space limits, we leave this to another paper and use simple formulas instead. Compared with the GA-based approach, simple formulas are easy to interpret and easy to compare with existing models.

C. Network Level Behavior Evolution

In this section, we study network-level behavior evolution. We assess the growth rate in terms of the numbers of papers (events), researchers (nodes), and collaborations (edges), and we study the patterns in the number of coauthors per paper.

1) Growth Rates: Figure 2a shows, in log-log scale, edge growth versus node growth for the NanoParticle and NanoSCI communities, respectively, with duplicated edges removed. Both curves are nearly straight lines in log-log scale, which implies that the number of edges grows as a power law of the number of nodes. The regression results give growth laws of $|E(t)| = 2.3453 \cdot |V(t)|^{1.0238}$ and $|E(t)| = 2.5475 \cdot |V(t)|^{1.0409}$, respectively, where E(t) and V(t) are the edge set and the node set of the cumulative network at time t. The corresponding edge densification rates (derivatives with respect to |V(t)|) for the two communities are $\Delta E(t) = 2.4011 \cdot |V(t)|^{0.0238}$ and $\Delta E(t) = 2.6517 \cdot |V(t)|^{0.0409}$, respectively.

The edge density is important for growth models. For example, the preferential attachment (PA) model [7] assumes that the number of edges grows linearly with the number of nodes, and with different slopes in this linear relationship the model shows different behaviors on some properties, e.g., clustering coefficients. In Section V-C, we also present a variant preferential attachment model, APA, which uses the growth rate learned from NanoSCI instead of linear growth. It is observed that APA behaves differently from PA.

Because we are studying event-based models, we also study the relationship between the number of paper-writing events and the number of nodes. Figure 2a also shows node growth versus event growth for the NanoParticle and NanoSCI communities, respectively. Again the curves are nearly linear in log-log scale, which implies that the number of events grows as a power law of the number of nodes. The regression results give $|D(t)| = 1.0636 \cdot |V(t)|^{0.9850}$ and $|D(t)| = 1.2746 \cdot |V(t)|^{0.9855}$, respectively, where |D(t)| is the number of events (papers) that occurred before time t. The corresponding rates of event growth with respect to node growth for the two communities are $\Delta D(t) = 1.0476 \cdot |V(t)|^{-0.0150}$ and $\Delta D(t) = 1.2561 \cdot |V(t)|^{-0.0145}$.

2) The Number of Participants in Events: The number of new connections formed in a collaborative network depends on the number of participants in each event. To relate the number of edges to the number of events, we therefore study the number of participants per event, i.e., the distribution of the number of coauthors per paper. The number of secondary authors (authors other than the first author) has been reported to follow a Poisson distribution [19], and our observations in NanoSCI and NanoParticle are consistent with this. In Figure 2b, the distributions of the number of secondary authors in both NanoSCI and NanoParticle closely match a Poisson distribution, although they have heavier tails. The average number of coauthors (the number of secondary authors plus 1) is 4.3576 for NanoSCI and 4.3563 for NanoParticle.

We also notice that the average number of participants (coauthors) in events (papers) evolves over time. Figure 2c shows that the mean number of coauthors in both NanoSCI and NanoParticle increases nearly linearly in semi-log scale, from around 4.0 to nearly 5.0 over the latest 10 years. This suggests that researchers have become more collaborative in recent years; similar observations have been reported, for example, for the computer science collaboration network [25]. The average number of participants is very important in an event-driven model: it determines the order of the resulting cliques in the collaborative network and the edge density of the resulting subgraph, because the edge density of a clique $K_n$ is
$\frac{\#\text{edges}}{\#\text{nodes}} = \frac{n(n-1)/2}{n} = \frac{n-1}{2}$.
It also affects the average separation and the clustering coefficients of the network. It is therefore important to include the evolution of the average number of participants in the growth model.

Fig. 2. (a) The number of papers and the number of edges increase linearly with the number of nodes in log-log scale. (b) Distribution of the number of secondary authors per paper vs. a Poisson distribution. (c) The average number of coauthors per paper evolves linearly in semi-log scale with the total number of nodes in both NanoSCI and NanoParticle.

D. Node Level Behavior Evolution
In this section, we study some aspects of the behavior evolution of nodes in NanoSCI.

1) Distribution of Research Lifetime of Researchers: The research lifetime of a researcher is defined as the length in years from the researcher joining the community to leaving it. Because there is no explicit signal of whether a researcher has left, we consider a node to have left if it has been inactive for 3+ years [11]. Figure 3a shows the distribution of lifetimes of nodes in NanoSCI that had been inactive for 3+ years in 2005. The lifetime indicates how soon nodes change from active to inactive; in other words, we use a binary measure of researcher activeness. In the figure, about 80% of researchers switch from active to inactive within 1 year, and only a very small fraction stay active in the community longer than 5 years. This is plausible because many coauthors are graduate students who leave the community after they graduate, and only a few of them stay in the community as faculty or scientists.

The figure also shows the Seniority [11] distributions of all nodes and of active nodes in 2005. Seniority measures the length of time a node has been active in the network. However, time in years is hard to model in growth models. Note that the degree and the active time of nodes are highly correlated in both NanoSCI and NanoParticle: junior researchers usually have small degrees and active senior researchers usually have high degrees, and vice versa. We can therefore use the degrees of leaving nodes to approximate their lifetimes. Figure 3b shows the degree distribution of inactive nodes in NanoSCI in 2005, i.e., the lifetimes measured in degree. In the simulation, when a node joins the network, the model randomly samples the maximum degree the node is allowed to reach from the lifetime (measured in degree) distribution observed in the real data; once the node reaches this maximum degree, it becomes inactive. The maximum number of events a node can participate in is also a good measure of lifetime, and models using either maximum allowed degrees or maximum allowed events show similar results.

Fig. 3. (a) Distribution of lifetimes in years and Seniority distributions of nodes in NanoSCI in 2005. (b) Degree distributions of all nodes in NanoSCI and of inactive nodes obey a power-law-like dependence with an exponential cut-off, $P(k) \propto k^{-\tau} e^{-k/k_c}$.

2) The Span Distance of New Edges vs. the Degree of Nodes: The span distance of a new edge is the distance between the end nodes of the edge right before the edge is formed. Figure 4 shows the distributions of span distances of new edges connected to nodes in different degree ranges; repeated edges (hop distance 1) are removed. From the figure, nodes at all levels of seniority (degree) have significantly high probabilities of connecting to nodes 2 hops away. The probability of connecting to distant nodes tends to decrease for all types of nodes, and the trend is more pronounced for nodes with rich experience. For example, the most junior researchers (black star solid curve) have almost identical probabilities of forming new edges spanning 3 to 7 hops, and then less and less probability of forming edges spanning more hops. For the most senior researchers (blue square solid curve), the probability always decreases as the span distance increases. This may suggest that senior researchers usually have stable local groups to collaborate with, and possibly stable research topics as well; locality always seems to play a role when senior researchers make connections. For junior researchers, however, locality has much less effect, especially at hop distances of 3 to 7. Junior researchers also have higher probabilities than senior ones of connecting to nodes that were originally far away; in other words, junior researchers may be more likely to jump to new topics. The curves for high-degree nodes may be shorter because such nodes have, on average, smaller separation to all nodes in the network. In the figure, Distance = Inf indicates edges formed between two disconnected nodes, and Distance = −1 indicates edges connected to newly added nodes.

Fig. 4. Distributions of span distances of the new edges connected from nodes with certain degrees in NanoSCI in 2005. Distance r = Inf indicates edges connected to disconnected nodes, and r = −1 indicates edges connected to new nodes with degree 0.

IV. AN EVENT-DRIVEN FRAMEWORK AND AN EVOLUTION-AWARE HYBRID GROWTH MODEL
We have argued that an event-driven model may be a more general and natural way to model networks, and we have seen that both locality and attachedness play important roles in network dynamics. We have also studied behavior evolution. In this section, we propose an event-driven framework for modeling networks and, based on it, develop an evolution-aware, event-driven, locality- and attachedness-based growth model. To compare the networks generated by different models, it is important to guarantee that they have identical numbers of nodes and identical edge densities. Non-event-driven models can directly use the edge densities defined in Section III-C1. To compare event-driven with non-event-driven models, the event-driven models must also have the same edge densities. The following specification of the event-driven framework also shows how to achieve this:

set according to the percentage of papers including new researchers in the real data; • SetP ← SetP ∪ {u}; • W HILE d(u) == 0 AN D |SetP | < m – Randomly sample an active node u using preferential attachment schema; – SetP ← SetP ∪ {u}; • W HILE |SetP | < m – Based on d(u), sample a distance r according to P r(r) ∝ Frd (r|d). Frd (r|d) is learned from real data (more information in Section III-D2) or some approximating functions; – Randomly select nodes with distance r; In summary, this model first decides the number of participants based on a stochastic Poisson distribution, then samples a node as leading node with using preferential attachment. Whenever the sampled node is not connected to the graph, a new node is sampled as leading node and the previous node is kept as a participant. Then, based on the degree of the leading node, it decides the probabilities for how far away to make new connections, and then randomly chooses nodes. This model is proposed based on the observations and motivation introduced in previous Section III.

1) t ← 0; 2) Add an event as follows: a) Sample m (the number of participating nodes) according to P r(m) ∝ Fm . Fm is the distribution of the number of participants in events. It can be a Poisson distribution or the distribution observed in real networks, please refer Section III-C2; m+1 b) W HILE > |E(t)|−|Ec |, where |Ec | 2 is the number of edges in the current network and |Et | is the number of edges at time t estimated according to some predefined edge density; i) Repeat: add one new node and set t ← t + 1. c) Sample m nodes as participants based on some schema, for example preference attachment or hybrid schema, put all selected nodes into SetP (the set of participants in the event) d) Form connections between nodes in SetP according to some schema, for example, for a collaborative network, an edge is formed between any node pair; 3) Repeat 2).

V. S IMULATION AND E VALUATION We set up experiments to evaluate the proposed framework and EELAG model. We use experiments to compare eventdriven models with the corresponding non-event driven models, and study the effect of the average number of participants of events in event-driven models. We also compare EELAG with some other models that use attachment preference or locality preference, respectively. We set the growth rates as the same as that in N anoSCI for all models.

Based on the framework, we propose an Evolution-aware Event-driven Locality and Attachedness based Growth model (EELAG) described as follows: Replace Step 2a with: •

A. Event-driven vs. Non-Event-Driven In this subsection, we study the difference on clustering coefficients in networks generated by event-driven models and corresponding non-event driven models. To focus on the comparisons between event-driven and non-event driven models and avoid including too much side-effects of other factors, a simple purely random model (P R) and its corresponding event-driven variant (EP R) are used. In P R model, in each time step, a node is added and some connections are made between uniformly randomly selected pairs of nodes. For EP R, we only need to change Step in the event-driven framework as: uniformly randomly select m nodes as the participants for the event. An identical edge density is used for both P R and EP R. Figure 5 shows degree-dependent clustering coefficients, C(k), that is defined as the average local clustering coefficients (LCC) of all nodes with degree k. C(k) of N anoSCI can be reasonably fit by a power law C(k) ∝ k −α with α = 0.82. This kind of power-law decay of degree dependent clustering coefficients is a signature of a hierarchical structure in the network [26]. The networks generated by EP R has higher

Sample m according to P r(m) ∝ Fm (λ(t)). Fm is a stochastic Poisson distribution with its mean evolve over time or the distribution observed in real networks (See Section III-C2 for details);

Step 2(b)i is changed to: •

Repeat: add one new node n, sample the lifetime of the node nlif etime according to P (f ) ∝ Ff , set lif etimedegree (n) ← nlif etime , and set t ← t + 1. Ff is the lifetime (measured in degree) distribution predefined or learned from real data (refer to III-D1 for more information);

Step 2c is replaced with the following statements: •

With a probability Pα , set s ← the newly added node. Otherwise, randomly sample an active node u (d(u) < lif etimedegree (u)) according to a preferential attachment schema: P r(u) ∝ P d(u)+1 , and set s ← u. Pα is (d(v)+1) v

222

C. Comparisons between EELAG, AP A and ADG on Simulating N anoSCI In this section, we compare EELAG with AP A and ADG models. AP A is a variant from Barabasi and Albert’s P A model [2]. In each time step, one new node is added, and the number of new edges are decided from the growth rate and edge density learned from N anoSCI. We do the modifications for the purpose of fair comparisons that all models should follow the same growth rate and edge density. The edges in each step are formed between the newest node and other nodes selected according to their degrees. ADG also adds one node and a number of new edges decided by the N anoSCI’s growth rate at each step. It first randomly selects a start node u, and then select end nodes with a probability in proportion to the distance to the end nodes. For disconnected nodes, the distances are defined as a big enough value instead of infinity. Therefore, the disconnected nodes also have chance to form edges. For both AP A and ADG, networks are grow on a start network. The start network has 500 nodes and edges are formed by simply purely random process as that in P R. The edge density is as the same as that in N anoSCI. We choose AP A and ADG because they are derived from classical models and use the factor of the degree and the distance respectively.

Fig. 5. The average local clustering coefficients as a function of the degree of nodes

LCC than those generated by PR, and share the same trend as the C(k) observed in NanoSCI. Note that for each model, we generate 50 networks, and the figure shows the average C(k) over these networks. We follow the same procedure in all subsequent experiments.

B. The Effect of the Average Number of Participants in Event-Driven Models

In the event-driven model, the number of participants in each event is sampled from a Poisson distribution. In this subsection, we study how the average number of participants affects the clustering coefficients. Again, we use the simple purely random event-driven model, EPR, rather than a more complex model, so as to isolate the effect of the average number of participants. We use three variants of the model, with Poisson means of 2, 4, and 8, respectively. The edge density is fixed across all three variants.
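The averaged C(k) curves described above can be computed with a short sketch like the following. It assumes undirected graphs stored as adjacency sets, excludes nodes of degree 0 or 1 (for which local clustering is undefined), and averages each network's C(k) over the networks in which degree k occurs; the text does not say how degrees missing from some runs were handled, so that choice is an assumption. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return None  # undefined for degree 0 or 1
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


def clustering_by_degree(adj):
    """C(k): mean local clustering over nodes of degree k (k >= 2)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v in adj:
        c = local_clustering(adj, v)
        if c is not None:
            k = len(adj[v])
            sums[k] += c
            counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}


def average_ck(networks):
    """Average each network's C(k) over the networks in which degree k occurs."""
    per_k = defaultdict(list)
    for adj in networks:
        for k, c in clustering_by_degree(adj).items():
            per_k[k].append(c)
    return {k: sum(cs) / len(cs) for k, cs in sorted(per_k.items())}
```

In the setup described in the text, `average_ck` would receive the 50 networks generated for one model.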

1) Degree distribution: In this subsection, we study the degree distributions of the networks simulated by the different models and compare them with the distribution observed in NanoSCI. Figure 7a suggests that EELAG performs better. EELAG reproduces the proportions of low-degree nodes observed in NanoSCI, while ADG and APA fail to do so. At higher degrees, the networks generated by EELAG agree with NanoSCI and show a very similar power-law decay.

2) Degree-dependent clustering coefficients: We study C(k) of the networks simulated by the different models. Figure 7b shows that APA fails to reproduce the trend observed in NanoSCI. Both EELAG and ADG show C(k) trends similar to NanoSCI; however, EELAG is much closer to NanoSCI.

3) The average degree of the nearest neighbors: Social networks are known to be assortative, i.e., the degrees of connected nodes are positively correlated. The statistical analysis can be extended by inspecting $k_{nn}(k)$, the average degree of the neighbors of nodes with degree k. For assortative (disassortative) networks, $k_{nn}(k)$ is a monotonically increasing (decreasing) function of k. NanoSCI is an assortative network, and its $k_{nn}(k)$ can be approximated by a power law $k_{nn}(k) \propto k^{-\beta}$ with $\beta = 0.21$. Again, EELAG performs better than both APA and ADG, and its $k_{nn}(k)$ is very close to that of NanoSCI (Figure 7c). Note that the behavior of APA differs from that of Barabasi and Albert’s PA [2] reported by Newman [5], because APA uses the evolving growth rate observed in NanoSCI instead of a constant linear growth.
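The quantity $k_{nn}(k)$ and the exponent β can be estimated roughly as below. The text does not say how β was fitted; an ordinary least-squares line through the points (log k, log $k_{nn}(k)$) is an assumption made here, as are the hypothetical function names `knn_by_degree` and `fit_beta`. On a star with five leaves, the leaves (k = 1) see one neighbor of degree 5 and the hub (k = 5) sees neighbors of degree 1, so the fit gives β = 1, i.e., a $k_{nn}(k)$ that falls with k.

```python
import math
from collections import defaultdict


def knn_by_degree(adj):
    """k_nn(k): mean, over nodes of degree k, of the average degree of their neighbors."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k == 0:
            continue  # isolated nodes have no neighbors to average over
        sums[k] += sum(len(adj[w]) for w in nbrs) / k
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}


def fit_beta(knn):
    """Least-squares fit of log k_nn(k) = a - beta * log k; returns beta."""
    xs = [math.log(k) for k in knn]
    ys = [math.log(v) for v in knn.values()]
    if len(xs) < 2:
        raise ValueError("need at least two distinct degrees to fit a slope")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return -sxy / sxx
```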

Fig. 6. The average local clustering coefficients as a function of the degree of nodes

Figure 6 shows that, although the trends of C(k) are similar for different average numbers of participants, the absolute values of C(k) in the generated networks differ considerably: C(k) is higher in networks generated by models with larger average numbers of participants.
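The effect in Figure 6 can be reproduced qualitatively with a toy version of EPR. The generator below is a hypothetical sketch, not the model used in the paper: it keeps a fixed set of n nodes (no node growth), draws each event's size m from a Poisson distribution, picks the m participants uniformly at random, connects every pair of them (so each event forms a clique $K_m$), and stops once a target edge count is reached so that edge density is held fixed. Because larger events produce larger cliques, whose members have local clustering 1 unless they also join other events, the mean clustering should rise with the Poisson mean.

```python
import math
import random
from itertools import combinations


def sample_poisson(lam, rng):
    """Knuth's multiplication method for a Poisson draw."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def epr_network(n, target_edges, mean_participants, rng):
    """Hypothetical EPR sketch: each event links every pair of its participants."""
    adj = {v: set() for v in range(n)}
    edges = 0
    while edges < target_edges:
        m = min(sample_poisson(mean_participants, rng), n)
        if m < 2:
            continue  # an event with fewer than two participants forms no edge
        for a, b in combinations(rng.sample(range(n), m), 2):
            if b not in adj[a]:
                adj[a].add(b)
                adj[b].add(a)
                edges += 1
    return adj


def mean_clustering(adj):
    """Average local clustering over nodes with degree >= 2."""
    values = []
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            values.append(2 * links / (k * (k - 1)))
    return sum(values) / len(values) if values else 0.0
```

For example, calling `mean_clustering(epr_network(500, 1000, mu, random.Random(0)))` for `mu` in (2, 4, 8) gives one sample per mean; averaging over many seeds would mirror the 50-network averaging used in the experiments.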


Fig. 7. (a) Degree distributions. (b) The average local clustering coefficients as a function of the degree of nodes. (c) The average degree of neighbors as a function of degree.

VI. CONCLUSIONS

In many social networks, connections are formed between actors because they are involved in the same event. For these networks, it is natural, general, and powerful to use event-driven models to characterize the growth dynamics. We therefore propose an event-driven framework that facilitates the creation of event-driven growth models. In this paper, we have also studied the behavior evolution of nodes in social networks and investigated the effects of both locality and attachedness on the formation of new edges. These analyses led us to propose an evolution-aware hybrid model based on the event-driven framework. Using metrics that are informative in characterizing network structure, such as the degree distribution, degree-dependent clustering coefficients, and $k_{nn}(k)$, our experiments show that the networks generated by our evolution-aware event-driven hybrid model exhibit structure similar to that of real networks. Future work includes carrying out experiments on more real networks, further studying important factors of connection formation and their joint effects, modeling events with richer information, and incorporating more aspects of behavior evolution.

ACKNOWLEDGMENT

This work was supported by a grant from the Defense Threat Reduction Agency (HDTRA1-09-1-0054). We thank Dr. Frank Ritter for helpful comments.

REFERENCES

[1] B. Bollobas. Random Graphs, Second Edition. Cambridge University Press, 2001.
[2] A. L. Barabasi, H. Jeong, Z. Neda, E. Ravasz, A. Schubert, and T. Vicsek. Evolution of the social network of scientific collaborations. Physica A, 311:3, 2002.
[3] R. Kumar, J. Novak, and A. Tomkins. Structure and evolution of online social networks. KDD ’06, pages 611–617, 2006.
[4] J. Leskovec, J. Kleinberg, and C. Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. KDD ’05, pages 177–187, 2005.
[5] M. E. J. Newman. Assortative mixing in networks. Physical Review Letters, 89:208701, 2002.
[6] S. Redner. Teasing out the missing links. Nature, 453:47, 2008.
[7] A.-L. Barabasi and R. Albert. Emergence of scaling in random networks. Science, 286:509, 1999.
[8] E. M. Jin, M. Girvan, and M. E. J. Newman. The structure of growing social networks. Physical Review E, 64:046132, 2001.
[9] R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal. The web as a graph. PODS, pages 1–10, 2000.
[10] H. Zhang, B. Qiu, K. Ivanova, H. C. Foley, C. L. Giles, and J. Yen. Locality and attachedness-based temporal social network growth dynamics analysis: A case study of evolving nanotechnology scientific collaboration networks. Journal of the American Society for Information Science and Technology, 61(5):964–977, 2010.
[11] B. Qiu, K. Ivanova, J. Yen, and P. Liu. Study of effect of node seniority in social networks. In Proceedings of the IEEE International Conference on Intelligence and Security Informatics (ISI), pages 147–149, Vancouver, BC, Canada, May 23–26, 2010.
[12] M. Mitchell. An Introduction to Genetic Algorithms. MIT Press, Cambridge, MA, 1996.
[13] J. Davidsen, H. Ebel, and S. Bornholdt. Emergence of a small world from local interactions: Modeling acquaintance networks. Physical Review Letters, 88:128701, 2002.
[14] R. Guimera, B. Uzzi, J. Spiro, and L. A. N. Amaral. Team assembly mechanisms determine collaboration network structure and team performance. Science, 308(5722):697–702, 2005.
[15] J. Jost and M. P. Joy. Evolving networks with distance preferences. Physical Review E, 66, 2002.
[16] G. Kossinets and D. J. Watts. Empirical analysis of an evolving social network. Science, 311:88–90, 2006.
[17] P. L. Krapivsky and S. Redner. Organization of growing random networks. Physical Review E, 63:066123, 2001.
[18] D. Liben-Nowell and J. Kleinberg. The link prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7):1019–1031, 2007.
[19] S. A. Morris and M. L. Goldstein. Manifestation of research teams in journal literature: A growth model of papers, authors, collaboration, coauthorship, weak ties, and Lotka’s law. Journal of the American Society for Information Science and Technology, 58(12):1764–1782, 2007.
[20] M. E. J. Newman. Coauthorship networks and patterns of scientific collaboration. Proceedings of the National Academy of Sciences USA, 101(Suppl. 1):5200–5205, April 2004.
[21] A. Vazquez. Knowing a network by walking on it: emergence of scaling. Europhysics Letters, 54:430–435, 2001.
[22] D. J. Watts, P. S. Dodds, and M. E. J. Newman. Identity and search in social networks. Science, 296:1302–1305, 2002.
[23] Q. Xuan, Y. Li, and T.-J. Wu. A local-world network model based on inter-node correlation degree. Physica A: Statistical Mechanics and its Applications, 378:561–572, May 2007.
[24] J. Leskovec, L. Backstrom, R. Kumar, and A. Tomkins. Microscopic evolution of social networks. SIGKDD, pages 462–470, 2008.
[25] J. Huang, Z. Zhuang, J. Li, and C. L. Giles. Collaboration over time: Characterizing and modeling network evolution. In Proceedings of the 1st ACM International Conference on Web Search and Data Mining (WSDM), pages 107–116, 2008.
[26] A. Vazquez, R. Pastor-Satorras, and A. Vespignani. Large-scale topological and dynamical properties of the Internet. Physical Review E, 65:066130, 2002.

