Analysis of ASD protein-protein networks
Ken Aho
June 10, 2014

1 Introduction

Previous analyses with minnows identified a number of human autism spectrum disorder (ASD) genes that were enhanced (up- or down-regulated) in the presence of pharmaceuticals and personal care products (PPCPs). In the current work we describe the characteristics of these genes in the context of a human network of neurological genes. To do this, we compared the protein-protein network characteristics of the complete set of neurological genes with those of the PPCP-enhanced and non-PPCP-enhanced gene sets. A "complete" human neurological network (based on cite?) was constructed using the Java freeware package Cytoscape, and it included both primary and secondary protein neighbors. This network had 7212 nodes (proteins). The previously studied subset of the complete network comprised 999 proteins. Of these, 256 were PPCP-enhanced genes and 743 were non-enhanced genes.

1.1 Network indices

In a protein network, the importance of nodes can be summarized using indices from graph theory. Many of these indices measure nodal centrality. Central nodes have the most connections to other nodes in the network or graph (Newman 2010). Highly connected genes will have more profound effects on protein networks because, when perturbed, they will affect more downstream genes (cite?). This has been called a ripple effect (cite?), and has been demonstrated in (???). To illustrate the indices described below, consider the network shown in Fig. 1.

• Degree centrality of the ith node is the number of direct attachments of that node to other nodes. Thus node 3 has degree 5, and node 6 has degree 1.
• Closeness centrality of a node is the inverse of the sum of the geodesic distances from the node to every other node in the network:
$$C_C(n_i) = \left[ \sum_{j=1,\, j \neq i}^{g} d(n_i, n_j) \right]^{-1}$$
where $n_i$ indicates the ith node, $n_j$ indicates the jth node ($j \neq i$), $d$ indicates geodesic distance, and $g$ is the number of nodes. Node 3 has closeness 1/(1 + 1 + 1 + 1 + 1) = 1/5. Nodes 4, 5, and 6 each have closeness 1/(1 + 2 + 2 + 2 + 2) = 1/9, and nodes 1 and 2 each have closeness 1/(1 + 1 + 2 + 2 + 2) = 1/8.
• Betweenness centrality of node i sums, over all pairs of other nodes j and k ($i \neq j \neq k$), the fraction of shortest geodesic paths between node j and node k on which node i lies:
$$C_B(n_i) = \sum_{j<k} \frac{g_{jk}(n_i)}{g_{jk}}$$
where $g_{jk}$ is the number of shortest geodesic paths connecting node j and node k, and $g_{jk}(n_i)$ is the number of those paths that pass through node i. The number of node pairs that do not involve node i is $(g^2 - g)/2 - (g - 1) = (g-1)(g-2)/2$. Dividing $C_B(n_i)$ by this quantity gives a normalized betweenness between 0 and 1. In Fig. 1 there are $(6^2 - 6)/2 - 5 = 10$ such pairs for node 3. Their shortest-path lengths are (1,2) = 1, (1,4) = 2, (1,5) = 2, (1,6) = 2, (2,4) = 2, (2,5) = 2, (2,6) = 2, (4,5) = 2, (4,6) = 2, and (5,6) = 2. Each pair is joined by a single shortest path, and node 3 lies on 9 of these paths (all except (1,2)). Thus its betweenness centrality is 9, or 9/10 after normalization. All other nodes have a betweenness centrality of zero.
• The clustering coefficient (also called transitivity) of a node is the proportion of possible connections among its neighbors that are actually realized. Let k be the number of neighbors of the node of interest. Then the number of possible pairwise connections between those neighbors is $(k^2 - k)/2$. In Fig. 1, node 3 has 5 neighbors, so there are $(5^2 - 5)/2 = 10$ possible connections between those neighbors. However, only one pair of neighbors (nodes 1 and 2) is actually connected. Therefore the clustering coefficient of node 3 is 1/10 = 0.1. Nodes 1 and 2 each have 2 neighbors, giving $(2^2 - 2)/2 = 1$ possible connection. In both cases this connection is realized, so the coefficient for nodes 1 and 2 is 1. The clustering coefficient of nodes 4, 5, and 6 is undefined. Each of these nodes has only one neighbor, so there are no neighbor pairs and the denominator $(k^2 - k)/2$ is zero.
• The topological coefficient of a node measures the extent to which the node shares neighbors with other nodes. To calculate it, first find every other node that shares at least one neighbor with the node of interest. For each of these nodes, count the neighbors the two nodes share, adding one if they are directly linked. Average these counts, and divide by the number of neighbors of the node of interest. Nodes with one or no neighbors are typically assigned a topological coefficient of 0. In Fig. 1, nodes 1 and 2 are the only nodes that share a neighbor with node 3. Each shares one neighbor with node 3 and is also linked to it, giving a count of 2 for each. Therefore the topological coefficient for node 3 is 2/5 = 0.4.
• The average shortest path for a node is simply the average of the shortest geodesic distances to every other node in the network. Node 3 has an average shortest path of (1 + 1 + 1 + 1 + 1)/5 = 1. Nodes 4, 5, and 6 have an average shortest path of (1 + 2 + 2 + 2 + 2)/5 = 1.8, and nodes 1 and 2 have an average shortest path of (1 + 1 + 2 + 2 + 2)/5 = 1.6.
• The eccentricity of a node is the largest geodesic distance from that node to any other node in the network. Node 3 has an eccentricity of 1, and all other nodes have an eccentricity of 2.
• The stress of a node is the number of shortest paths passing through it. Thus, a node will have high stress if it is traversed by a large number of shortest paths. For node 3 in Fig. 1, there are again 10 shortest paths between pairs of other nodes. Node 3 is on 9 of these, so it has a stress of 9.
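The hand calculations above can be checked mechanically. The Python sketch below is an illustration and is not part of the original R workflow. It hard-codes the Fig. 1 edge list, read off the igraph formula used to draw the figure. It then finds geodesic distances and shortest-path counts by breadth-first search and applies the definitions as stated in this section. Closeness is therefore the reciprocal of the summed distances; some software instead reports the reciprocal of the average distance. Normalized betweenness divides by (g - 1)(g - 2)/2. The sketch assumes a connected, undirected, unweighted graph. The names `EDGES`, `adjacency`, `bfs`, and `node_indices` are illustrative.

```python
from collections import deque
from itertools import combinations

# Edge list for Fig. 1, read off the igraph formula 1-2-3-1, 4-3-5, 3-6
EDGES = [(1, 2), (2, 3), (3, 1), (4, 3), (3, 5), (3, 6)]


def adjacency(edges):
    """Undirected adjacency sets keyed by node."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs(adj, source):
    """Geodesic distance from source to every node, plus the number of
    distinct shortest paths from source to every node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:  # first visit: w is one step further out
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:  # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def node_indices(adj):
    """Distance-based indices for every node of a connected, undirected graph."""
    nodes = sorted(adj)
    g = len(nodes)
    paths = {s: bfs(adj, s) for s in nodes}
    result = {}
    for i in nodes:
        dist_i, sigma_i = paths[i]
        others = [j for j in nodes if j != i]
        total = sum(dist_i[j] for j in others)
        stress, between = 0, 0.0
        for j, k in combinations(others, 2):
            dist_j, sigma_j = paths[j]
            # i lies on a shortest j-k path exactly when d(j,i) + d(i,k) == d(j,k)
            if dist_j[i] + dist_i[k] == dist_j[k]:
                via_i = sigma_j[i] * sigma_i[k]  # shortest j-k paths through i
                stress += via_i
                between += via_i / sigma_j[k]
        result[i] = {
            "degree": len(adj[i]),
            "closeness": 1 / total,
            "avg_shortest_path": total / (g - 1),
            "eccentricity": max(dist_i[j] for j in others),
            "stress": stress,
            "betweenness": between,
            "betweenness_normalized": between / ((g - 1) * (g - 2) / 2),
        }
    return result


if __name__ == "__main__":
    for node, values in node_indices(adjacency(EDGES)).items():
        print(node, values)
```

For node 3, the output matches the worked values: degree 5, closeness 1/5, average shortest path 1, eccentricity 1, stress 9, and betweenness 9 (0.9 normalized).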


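library(igraph)
# Build the example network in Fig. 1 (edges 1-2, 2-3, 3-1, 4-3, 3-5, 3-6)
g <- graph_from_literal(1-2-3-1, 4-3-5, 3-6)  # graph.formula() in older igraph versions
plot(g)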

Figure 1: Example network with 6 nodes.
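The neighbor-based indices for the network in Fig. 1 can be checked in the same way. This Python sketch makes two assumptions. First, the clustering coefficient is returned as `None` when a node has fewer than two neighbors. Second, the topological coefficient follows an assumed NetworkAnalyzer-style definition: for every node m that shares at least one neighbor with node n, count the shared neighbors, adding one if m and n are directly linked; then average those counts and divide by the degree of n. The helper names are hypothetical.

```python
# Edge list for Fig. 1 (same network as the igraph example)
EDGES = [(1, 2), (2, 3), (3, 1), (4, 3), (3, 5), (3, 6)]


def adjacency(edges):
    """Undirected adjacency sets keyed by node."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def clustering_coefficient(adj, n):
    """Realized links among n's neighbors divided by (k^2 - k) / 2.
    Returns None when n has fewer than two neighbors (undefined)."""
    nbrs = adj[n]
    k = len(nbrs)
    if k < 2:
        return None
    links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
    return links / ((k * k - k) / 2)


def topological_coefficient(adj, n):
    """Assumed definition: mean of J(n, m) over nodes m sharing at least one
    neighbor with n, divided by n's degree; J counts shared neighbors plus
    one if n and m are directly linked."""
    k = len(adj[n])
    if k < 2:
        return 0.0  # convention for nodes with one or no neighbors
    shared = []
    for m in adj:
        if m == n:
            continue
        common = len(adj[n] & adj[m])
        if common > 0:
            shared.append(common + (1 if m in adj[n] else 0))
    if not shared:
        return 0.0
    return sum(shared) / len(shared) / k


if __name__ == "__main__":
    adj = adjacency(EDGES)
    for node in sorted(adj):
        print(node, clustering_coefficient(adj, node), topological_coefficient(adj, node))
```

For node 3, the sketch gives a clustering coefficient of 0.1 and a topological coefficient of 0.4, which agree with the hand calculations for Fig. 1.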

1.2 R analyses

Sophisticated and flexible network analyses are possible by combining the R package RCytoscape with Cytoscape. Users should be aware of several issues when using RCytoscape.
• The documentation is somewhat rudimentary, and little advice is given on importing information from Cytoscape into R. The documentation focuses mainly on the opposite scenario (sending information from R to Cytoscape).
• Nodal information imported from Cytoscape will have zeroes substituted for NAs. This may lead to a situation in which some zeroes are real and some are not.

library(RCytoscape)
# Before running the code below:
# 1) Open Cytoscape
# 2) Open the network in Cytoscape (e.g., open a .cys file)
# 3) Access and add the CytoscapeRPC plugin
# 4) In the CytoscapeRPC settings, use port 9000 and enable XML-RPC
# 5) Make sure that CytoscapeRPC is enabled

# Transfer the network called network1 from Cytoscape to R
net <- existing.CytoscapeWindow('network1', copy.graph.from.cytoscape.to.R = TRUE)
complete <- getAllNodeAttributes(net)  # summaries of nodal characteristics

#----------- Examples ----------#
# getNodeAttribute(net, "2", "Degree")  # returns degree for node 2
# noa(net@graph, "Degree")              # returns degrees for all nodes
#-------------------------------#

source("C:\\Users\\Ken Aho\\Documents\\Gaurav\\genelist.r")  # corrects for NA/0 issues
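One way to act on the NA-to-zero caveat is to re-flag values that cannot be real. The Python sketch below is hypothetical and does not describe what genelist.r does, since that file is not shown. It assumes node records that carry the `Degree` and `ClusteringCoefficient` attributes used later in this document. It then sets the clustering coefficient to missing wherever a node has fewer than two neighbors. At those nodes the index is undefined, so an imported 0 cannot be distinguished from a genuine zero.

```python
def flag_undefined_clustering(nodes):
    """Hypothetical clean-up: return copies of the node records in which the
    clustering coefficient is None for nodes with fewer than two neighbors,
    where the index is undefined and an imported 0 is not a real value."""
    cleaned = []
    for record in nodes:
        record = dict(record)  # leave the caller's records untouched
        if record["Degree"] < 2:
            record["ClusteringCoefficient"] = None
        cleaned.append(record)
    return cleaned


if __name__ == "__main__":
    # Hypothetical records for illustration only
    nodes = [
        {"Gene.name": "A", "Degree": 1, "ClusteringCoefficient": 0.0},  # undefined
        {"Gene.name": "B", "Degree": 4, "ClusteringCoefficient": 0.0},  # a real zero
    ]
    print(flag_undefined_clustering(nodes))
```

Only indices that are structurally undefined can be repaired this way. Zeroes substituted for genuinely missing measurements need information from Cytoscape itself.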

# File distinguishing PPCP-enhanced and non-PPCP-enhanced genes
dvd <- read.csv("C:/Users/Ken Aho/Documents/Gaurav/DrugvNonDrug.csv")

# Split into PPCP-enhanced and PPCP-non-enhanced groups (column 3 holds the label)
drug <- dvd[dvd[, 3] == "Drug", ]
ndrug <- dvd[dvd[, 3] == "Non-Drug", ]

# Match the PPCP-enhanced group to the complete network
m <- match(drug$Gene, complete$Gene.name)
m.no.na <- m[!is.na(m)]
drug.sub <- complete[m.no.na, ]

# Match the PPCP-non-enhanced group to the complete network
mnd <- match(ndrug$Gene, complete$Gene.name)
mn.no.na <- mnd[!is.na(mnd)]
ndrug.sub <- complete[mn.no.na, ]

# Identify nodes that are in neither the PPCP-enhanced nor the PPCP-non-enhanced group
o <- match(complete$Gene.name, c(as.character(ndrug$Gene), as.character(drug$Gene)))
on.na <- which(is.na(o))
o.sub <- complete[on.na, ]
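The three `match()` steps above split the complete network into PPCP-enhanced, PPCP-non-enhanced, and other rows. A Python sketch of the same partition follows, assuming plain lists of gene names. As with R's `match()`, each listed gene maps to the first row that has its name, and genes absent from the complete network are dropped. Row indices are 0-based here, whereas R's are 1-based.

```python
def partition(complete_genes, drug_genes, nondrug_genes):
    """Row indices of the complete network for the PPCP-enhanced,
    PPCP-non-enhanced, and other groups, mirroring the match() calls."""
    first_row = {}
    for row, name in enumerate(complete_genes):
        first_row.setdefault(name, row)  # match() returns the first occurrence
    drug_rows = [first_row[g] for g in drug_genes if g in first_row]
    nondrug_rows = [first_row[g] for g in nondrug_genes if g in first_row]
    listed = set(drug_genes) | set(nondrug_genes)
    other_rows = [row for row, name in enumerate(complete_genes)
                  if name not in listed]
    return drug_rows, nondrug_rows, other_rows


if __name__ == "__main__":
    # Hypothetical gene names, for illustration only
    print(partition(["A", "B", "C", "D", "E"], ["B", "Z"], ["D", "A"]))
```

With these toy names, the function returns ([1], [3, 0], [2, 4]). Gene Z is not in the network and is silently dropped, just as the `!is.na()` filter drops it in R.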

2 Univariate distributional comparisons of network indices

Empirical distributions of nodal characteristics were defined for the complete network and for subsets of the complete network. Summaries can be found in Tables 1-4. The distributions of many network indices follow a power law. That is, after log transformation of both the index values and the frequencies of those values, the relationship between the two will be approximately linear. A number of the plots in this section use this transformation. We define four groups:
1. PPCP-enhanced: genes that were enhanced in the presence of PPCPs.
2. PPCP-non-enhanced: genes that were not enhanced in the presence of PPCPs.
3. Complete: genes in the complete neurological network. This group includes the PPCP-enhanced, PPCP-non-enhanced, and other groups.
4. Other: genes in the complete network that are neither PPCP-enhanced nor PPCP-non-enhanced.
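The log-log check described above can also be made numerically, by regressing log frequency on log value. This Python sketch is an illustration and is not part of the original analysis. It tabulates the frequency of each distinct positive value and returns the least-squares slope, which is roughly constant for a power law. Zero and negative values are skipped because they cannot be log-transformed. A least-squares fit to a log-log histogram is only a rough diagnostic; more careful power-law fits use maximum-likelihood estimators.

```python
import math
from collections import Counter


def loglog_slope(values):
    """Least-squares slope of log(frequency) on log(value).

    Only positive values are used, since zero and negative values cannot be
    log-transformed; values that never occur have no frequency to plot.
    Needs at least two distinct positive values.
    """
    counts = Counter(v for v in values if v > 0)
    xs = [math.log(v) for v in counts]
    ys = [math.log(counts[v]) for v in counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Toy degrees whose frequencies fall as degree**-2: 4096, 1024, 256, 64
    toy = [1] * 4096 + [2] * 1024 + [4] * 256 + [8] * 64
    print(loglog_slope(toy))
```

The toy data follow an exact power law with exponent -2, so the slope is -2 up to rounding error.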

library(asbio)  # assumed source of HL.mean(), kurt(), and skew(); not loaded in the original code

# Ten summary statistics for one network index, ignoring missing values
stats <- function(x){
  x <- na.omit(x)
  ds <- data.frame(statistics = c(length(x), min(x), max(x), mean(x), median(x),
    HL.mean(x), sd(x), IQR(x), kurt(x), skew(x)))
  rownames(ds) <- c("$n$", "min", "max", "$\\bar{x}$", "median", "HL-loc.",
    "$s$", "IQR", "kurtosis", "skew")
  return(ds)
}

# Columns of the node-attribute table that hold the ten indices, in the order of nom
idx <- c(6, 4, 2, 5, 29, 1, 27, 7, 28, 24)
nom <- c("Degree", "Close.", "Between.", "Clust coef.", "Top. coef",
  "Avg. Short. path", "Radiality", "Eccentricity", "Stress", "Connectivity")

# Apply stats() to every index column of a group and return a labelled matrix
summarize <- function(d){
  out <- data.frame(apply(d[, idx], 2, stats))
  names(out) <- nom
  as.matrix(out)
}

d.all <- summarize(drug.sub)    # PPCP-enhanced (Table 1)
nd.all <- summarize(ndrug.sub)  # PPCP-non-enhanced (Table 2)
o.all <- summarize(o.sub)       # other (Table 3)
c.all <- summarize(complete)    # complete network (Table 4)


Table 1: Network summary statistics for the PPCP-enhanced gene set

| Index | n | min | max | x̄ | median | HL-loc. | s | IQR | kurtosis | skew |
|---|---|---|---|---|---|---|---|---|---|---|
| Degree | 257 | 1.0 | 135.0 | 13.5 | 7.0 | 9.5 | 18.8 | 13.0 | 15.4 | 3.5 |
| Close. | 110 | 0.212 | 0.408 | 0.299 | 0.300 | 0.299 | 0.038 | 0.049 | 0.010 | 0.028 |
| Between. | 110 | 0.00000 | 0.04557 | 0.00150 | 0.00046 | 0.00066 | 0.00466 | 0.00120 | 75.10266 | 8.14019 |
| Clust coef. | 110 | 0.000 | 1.000 | 0.148 | 0.095 | 0.108 | 0.198 | 0.177 | 8.077 | 2.604 |
| Top. coef | 110 | 0.000 | 0.800 | 0.201 | 0.162 | 0.174 | 0.156 | 0.160 | 2.695 | 1.553 |
| Avg. Short. path | 110 | 2.450 | 4.714 | 3.402 | 3.334 | 3.372 | 0.446 | 0.562 | 0.479 | 0.702 |
| Radiality | 110 | 0.6 | 0.8 | 0.7 | 0.7 | 0.7 | 0.0 | 0.1 | 0.5 | -0.7 |
| Eccentricity | 110 | 5 | 8 | 6 | 6 | 6 | 1 | 1 | 1 | 1 |
| Stress | 110 | 0.0 | 1745700.0 | 64417.8 | 20528.0 | 26735.0 | 184545.8 | 49582.0 | 64.5 | 7.4 |
| Connectivity | 110 | 5.800 | 102.500 | 29.739 | 24.889 | 27.231 | 18.622 | 20.786 | 2.743 | 1.529 |

Table 2: Network summary statistics for the non-PPCP-enhanced gene set

| Index | n | min | max | x̄ | median | HL-loc. | s | IQR | kurtosis | skew |
|---|---|---|---|---|---|---|---|---|---|---|
| Degree | 744 | 1.0 | 248.0 | 15.1 | 6.0 | 8.5 | 26.5 | 12.0 | 23.5 | 4.3 |
| Close. | 310 | 0.180 | 1.000 | 0.312 | 0.307 | 0.307 | 0.080 | 0.049 | 52.001 | 6.149 |
| Between. | 310 | 0.00000 | 0.06557 | 0.00220 | 0.00040 | 0.00074 | 0.00587 | 0.00141 | 51.55124 | 6.17455 |
| Clust coef. | 310 | 0.000 | 1.000 | 0.171 | 0.106 | 0.127 | 0.215 | 0.185 | 6.063 | 2.358 |
| Top. coef | 310 | 0.000 | 0.833 | 0.179 | 0.125 | 0.151 | 0.159 | 0.159 | 2.415 | 1.577 |
| Avg. Short. path | 310 | 1.000 | 5.548 | 3.323 | 3.261 | 3.274 | 0.564 | 0.516 | 4.246 | 0.518 |
| Radiality | 310 | 0.5 | 1.0 | 0.7 | 0.7 | 0.7 | 0.1 | 0.1 | 4.2 | -0.5 |
| Eccentricity | 310 | 1 | 9 | 6 | 6 | 6 | 1 | 0 | 19 | -2 |
| Stress | 310 | 0.0 | 2903942.0 | 97396.5 | 19321.0 | 32446.0 | 263919.7 | 61565.5 | 50.0 | 6.1 |
| Connectivity | 310 | 1.000 | 212.000 | 35.841 | 28.106 | 30.714 | 30.477 | 25.450 | 11.446 | 2.888 |

Table 3: Network summary statistics for the portion of the complete neurological network that was neither PPCP-enhanced nor PPCP-non-enhanced

| Index | n | min | max | x̄ | median | HL-loc. | s | IQR | kurtosis | skew |
|---|---|---|---|---|---|---|---|---|---|---|
| Degree | 6211 | 1.0 | 244.0 | 8.4 | 4.0 | 5.5 | 13.7 | 7.0 | 57.2 | 6.1 |
| Close. | 1528 | 0.166 | 1.000 | 0.301 | 0.298 | 0.296 | 0.079 | 0.050 | 56.825 | 6.659 |
| Between. | 1528 | 0.00000 | 0.06729 | 0.00104 | 0.00018 | 0.00038 | 0.00352 | 0.00081 | 184.56027 | 11.64161 |
| Clust coef. | 1528 | 0.000 | 1.000 | 0.203 | 0.124 | 0.154 | 0.255 | 0.286 | 3.027 | 1.850 |
| Top. coef | 1528 | 0.000 | 1.000 | 0.214 | 0.168 | 0.193 | 0.173 | 0.209 | 1.352 | 1.152 |
| Avg. Short. path | 1528 | 1.000 | 6.035 | 3.435 | 3.353 | 3.401 | 0.545 | 0.573 | 4.617 | 0.104 |
| Radiality | 1528 | 0.4 | 1.0 | 0.7 | 0.7 | 0.7 | 0.1 | 0.1 | 4.6 | -0.1 |
| Eccentricity | 1528 | 1 | 9 | 6 | 6 | 6 | 1 | 1 | 18 | -2 |
| Stress | 1528 | 0.0 | 2941778.0 | 45330.7 | 8301.0 | 16227.0 | 151337.9 | 32597.0 | 167.4 | 11.1 |
| Connectivity | 1528 | 1.000 | 212.000 | 37.914 | 30.000 | 32.806 | 31.994 | 30.350 | 8.130 | 2.376 |

Table 4: Network summary statistics for the complete neurological gene set

| Index | n | min | max | x̄ | median | HL-loc. | s | IQR | kurtosis | skew |
|---|---|---|---|---|---|---|---|---|---|---|
| Degree | 7212 | 1.0 | 248.0 | 9.3 | 5.0 | 6.0 | 15.9 | 8.0 | 51.1 | 5.9 |
| Close. | 1948 | 0.166 | 1.000 | 0.303 | 0.299 | 0.298 | 0.078 | 0.048 | 57.489 | 6.610 |
| Between. | 1948 | 0.00000 | 0.06729 | 0.00125 | 0.00022 | 0.00044 | 0.00407 | 0.00093 | 124.75551 | 9.64041 |
| Clust coef. | 1948 | 0.000 | 1.000 | 0.195 | 0.111 | 0.146 | 0.246 | 0.267 | 3.553 | 1.953 |
| Top. coef | 1948 | 0.000 | 1.000 | 0.208 | 0.162 | 0.185 | 0.170 | 0.197 | 1.514 | 1.228 |
| Avg. Short. path | 1948 | 1.000 | 6.035 | 3.416 | 3.341 | 3.379 | 0.545 | 0.548 | 4.379 | 0.189 |
| Radiality | 1948 | 0.4 | 1.0 | 0.7 | 0.7 | 0.7 | 0.1 | 0.1 | 4.4 | -0.2 |
| Eccentricity | 1948 | 1 | 9 | 6 | 6 | 6 | 1 | 1 | 18 | -2 |
| Stress | 1948 | 0.0 | 2941778.0 | 54694.1 | 9953.0 | 18405.0 | 176884.5 | 36448.5 | 116.0 | 9.3 |
| Connectivity | 1948 | 1.000 | 212.000 | 37.122 | 29.250 | 32.038 | 31.201 | 29.082 | 8.870 | 2.481 |

2.1 Degree centrality

For the PPCP-enhanced group, 56% of the degree distribution was greater than or equal to the group's median degree of 7; the proportion did not equal 0.5 because of ties. By comparison, this percentage was only 49% for the PPCP-non-enhanced group, 38% for the complete network, and 36% for the "other" group. Similarly, 40% of the PPCP-enhanced distribution had degrees greater than or equal to its Hodges-Lehmann (HL) location measure. This percentage was only 37% for the PPCP-non-enhanced group, 26% for the complete network, and 25% for the "other" group. The HL location measure is used here because, for many network indices, the mean will be strongly affected by outliers and asymmetry in the data, while the median will be influenced by a preponderance of zeroes, preventing comparisons. The HL location measure is robust to outliers. It is the location estimator associated with the Wilcoxon signed-rank test, and its two-sample analogue estimates the shift in the Wilcoxon rank-sum test. In univariate summaries it is simply the median of all possible pairwise means. Distributional differences are qualitatively apparent in Fig. 2. The distribution of degrees for the PPCP-enhanced group is much less leptokurtic and less concave than the distributions for either the entire network or the PPCP-non-enhanced group. Its kurtosis is 15.4, versus 23.5 for the PPCP-non-enhanced group and 51.1 for the complete network (Tables 1, 2, and 4).

# Proportion of each group at or above the PPCP-enhanced median degree
med <- median(drug.sub$Degree)
length(drug.sub$Degree[drug.sub$Degree >= med])/length(drug.sub$Degree)
## [1] 0.5642
length(ndrug.sub$Degree[ndrug.sub$Degree >= med])/length(ndrug.sub$Degree)
## [1] 0.4866
length(o.sub$Degree[o.sub$Degree >= med])/length(o.sub$Degree)
## [1] 0.3618
length(complete$Degree[complete$Degree >= med])/length(complete$Degree)
## [1] 0.3819

# Proportion of each group at or above the PPCP-enhanced HL location
p.med <- HL.mean(drug.sub$Degree)
length(drug.sub$Degree[drug.sub$Degree >= p.med])/length(drug.sub$Degree)
## [1] 0.4047
length(ndrug.sub$Degree[ndrug.sub$Degree >= p.med])/length(ndrug.sub$Degree)
## [1] 0.3683
length(o.sub$Degree[o.sub$Degree >= p.med])/length(o.sub$Degree)
## [1] 0.2463
length(complete$Degree[complete$Degree >= p.med])/length(complete$Degree)
## [1] 0.2646

# Degree frequencies in unit-width bins for the log-log plot in Fig. 2
mx <- max(complete$Degree)
hc <- hist(complete$Degree, 0:mx, plot = FALSE); comp.cnt <- hc$counts
ho <- hist(o.sub$Degree, 0:mx, plot = FALSE); o.cnt <- ho$counts
hd <- hist(drug.sub$Degree, 0:mx, plot = FALSE); drug.cnt <- hd$counts
hnd <- hist(ndrug.sub$Degree, 0:mx, plot = FALSE); ndrug.cnt <- hnd$counts
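Two summaries are used throughout Section 2: the Hodges-Lehmann location and the proportion of values at or above a cut-off. Both are short to write out. In this Python sketch, the HL estimate is the median of the Walsh averages (x_i + x_j)/2 over all i <= j. Implementations differ on whether the i = j terms are included, so the sketch may not reproduce the `HL.mean()` values above exactly. `prop_at_or_above` mirrors the `length(x[x >= med])/length(x)` expressions.

```python
from itertools import combinations_with_replacement
from statistics import median


def hodges_lehmann(x):
    """One-sample HL location: median of Walsh averages (x_i + x_j)/2, i <= j."""
    walsh = [(a + b) / 2 for a, b in combinations_with_replacement(x, 2)]
    return median(walsh)


def prop_at_or_above(x, cut):
    """Share of values >= cut, like length(x[x >= cut])/length(x) in R."""
    return sum(v >= cut for v in x) / len(x)


if __name__ == "__main__":
    # Toy data with a preponderance of zeroes and one outlier
    degrees = [0, 0, 0, 10]
    print(hodges_lehmann(degrees), sum(degrees) / len(degrees))
```

For the toy data 0, 0, 0, 10, the HL estimate is 0 while the mean is 2.5, which shows how little the outlier moves the HL estimate. The Walsh averages grow quadratically with sample size, so for the 7212-node network a dedicated routine would be more economical.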

[Plot: Frequency (log scale) against Degree centrality (log scale) for the Complete, PPCP-non-enhanced, PPCP-enhanced, and Other groups.]
Figure 2: Distributions of degree centrality for nodes in the complete neurological network, along with the PPCP-enhanced and non-PPCP-enhanced subsets of the complete network.

In permutation analyses of the complete network using 100,000 permutations, the PPCP-enhanced group had significantly higher median and pseudomedian (HL location) degrees than the complete network (P = 0.00001).
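The permutation design is not spelled out above. One plausible reading, sketched here in Python as an assumption, is a one-sided test. It draws random subsets of the same size as the PPCP-enhanced group from the complete network and counts how often their median (or any other statistic passed as `stat`) is at least as large as the observed one. The add-one correction keeps the P-value from being exactly zero. With 100,000 permutations and no exceedances it gives about 0.00001, which would be consistent with the reported value.

```python
import random
import statistics


def permutation_p_value(group, population, stat=statistics.median,
                        n_perm=10_000, seed=1):
    """One-sided permutation P-value for stat(group) being unusually large.

    Draws n_perm random subsets, each the size of group, without replacement
    from population and counts how often stat(subset) >= stat(group).
    """
    rng = random.Random(seed)
    observed = stat(group)
    size = len(group)
    hits = sum(stat(rng.sample(population, size)) >= observed
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)  # add-one correction: never exactly 0


if __name__ == "__main__":
    complete = list(range(100))      # toy stand-in for complete-network degrees
    enhanced = [95, 96, 97, 98, 99]  # toy group drawn from the upper tail
    print(permutation_p_value(enhanced, complete, n_perm=1000))
```

Sampling from the complete network, rather than from the nodes outside the group, follows the wording of the text.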

2.2 Closeness centrality

The PPCP-enhanced group had closeness centrality similar to the rest of the network, but lower than that of the PPCP-non-enhanced group. While 50% of the PPCP-enhanced distribution was greater than or equal to its median closeness centrality, 0.3, the percentages for the PPCP-non-enhanced, complete, and "other" groups were 60%, 49%, and 47%, respectively. Distributions of closeness centrality are shown for the complete network and subsets of the network in Fig. 3.

drug.subc <- na.omit(drug.sub$ClosenessCentrality)
ndrug.subc <- na.omit(ndrug.sub$ClosenessCentrality)
complete.c <- na.omit(complete$ClosenessCentrality)
o.subc <- na.omit(o.sub$ClosenessCentrality)
med <- median(drug.subc)
length(drug.subc[drug.subc >= med])/length(drug.subc)
## [1] 0.5
length(ndrug.subc[ndrug.subc >= med])/length(ndrug.subc)
## [1] 0.5968
length(complete.c[complete.c >= med])/length(complete.c)
## [1] 0.4913
length(o.subc[o.subc >= med])/length(o.subc)
## [1] 0.4692

# Frequencies in bins of width 0.01 for the log-log plot in Fig. 3
mx <- max(complete.c)
hc <- hist(complete.c, seq(0, mx, 0.01), plot = FALSE); comp.cnt <- hc$counts
ho <- hist(o.subc, seq(0, mx, 0.01), plot = FALSE); o.cnt <- ho$counts
hd <- hist(drug.subc, seq(0, mx, 0.01), plot = FALSE); drug.cnt <- hd$counts
hnd <- hist(ndrug.subc, seq(0, mx, 0.01), plot = FALSE); ndrug.cnt <- hnd$counts

[Plot: Frequency (log scale) against Closeness centrality (log scale) for the Complete, PPCP-non-enhanced, PPCP-enhanced, and Other groups.]
Figure 3: Distributions of closeness centrality for nodes in the complete neurological network, along with the PPCP-enhanced and non-PPCP-enhanced subsets of the complete network.

2.3 Betweenness centrality

The PPCP-enhanced group had a slightly higher betweenness centrality than the PPCP-non-enhanced group, and a much higher betweenness centrality than the complete and "other" groups. While 50% of the PPCP-enhanced distribution was greater than or equal to its median betweenness centrality, 0.0005, the percentages for the PPCP-non-enhanced, complete, and "other" groups were 48%, 38%, and 35%, respectively. Distributions of betweenness centrality are shown for the complete network and subsets of the network in Fig. 4.

drug.subb <- na.omit(drug.sub$BetweennessCentrality)
ndrug.subb <- na.omit(ndrug.sub$BetweennessCentrality)
complete.b <- na.omit(complete$BetweennessCentrality)
o.subb <- na.omit(o.sub$BetweennessCentrality)
med <- median(drug.subb)
length(drug.subb[drug.subb >= med])/length(drug.subb)
## [1] 0.5
length(ndrug.subb[ndrug.subb >= med])/length(ndrug.subb)
## [1] 0.4806
length(complete.b[complete.b >= med])/length(complete.b)
## [1] 0.3763
length(o.subb[o.subb >= med])/length(o.subb)
## [1] 0.3462

# Common bins of width 0.001 that span every group's range
mx <- max(complete.b)
brks <- seq(0, mx + 0.001, 0.001)
hc <- hist(complete.b, brks, plot = FALSE); comp.cnt <- hc$counts
ho <- hist(o.subb, brks, plot = FALSE); o.cnt <- ho$counts
hd <- hist(drug.subb, brks, plot = FALSE); drug.cnt <- hd$counts
hnd <- hist(ndrug.subb, brks, plot = FALSE); ndrug.cnt <- hnd$counts

[Plot: Frequency (log scale) against Betweenness centrality (log scale) for the Complete, PPCP-non-enhanced, PPCP-enhanced, and Other groups.]
Figure 4: Distributions of betweenness centrality for nodes in the complete neurological network, along with the PPCP-enhanced and non-PPCP-enhanced subsets of the complete network.

2.4 Clustering coefficient

The PPCP-enhanced group and the PPCP-non-enhanced group had lower clustering coefficients than the complete network. While 50% of the PPCP-enhanced distribution was greater than or equal to its median clustering coefficient, 0.095, the percentages for the PPCP-non-enhanced, complete, and "other" groups were 55%, 56%, and 57%, respectively. Distributions of clustering coefficients are shown for the complete network and subsets of the network in Fig. 5.

drug.subl <- na.omit(drug.sub$ClusteringCoefficient)
ndrug.subl <- na.omit(ndrug.sub$ClusteringCoefficient)
complete.l <- na.omit(complete$ClusteringCoefficient)
o.subl <- na.omit(o.sub$ClusteringCoefficient)
med <- median(drug.subl)
length(drug.subl[drug.subl >= med])/length(drug.subl)
## [1] 0.5
length(ndrug.subl[ndrug.subl >= med])/length(ndrug.subl)
## [1] 0.5548
length(complete.l[complete.l >= med])/length(complete.l)
## [1] 0.5637
length(o.subl[o.subl >= med])/length(o.subl)
## [1] 0.57

# Common bins of width 0.001 that span every group's range
mx <- max(complete.l)
brks <- seq(0, mx + 0.001, 0.001)
hc <- hist(complete.l, brks, plot = FALSE); comp.cnt <- hc$counts
hd <- hist(drug.subl, brks, plot = FALSE); drug.cnt <- hd$counts
hnd <- hist(ndrug.subl, brks, plot = FALSE); ndrug.cnt <- hnd$counts
ho <- hist(o.subl, brks, plot = FALSE); o.cnt <- ho$counts

[Plot: Frequency (log scale) against Clustering Coefficient for the Complete, PPCP-non-enhanced, PPCP-enhanced, and Other groups.]
Figure 5: Distributions of clustering coefficients for nodes in the complete neurological network, along with the PPCP-enhanced and non-PPCP-enhanced subsets of the complete network.

2.5 Topological coefficient

The PPCP-enhanced group had topological coefficients similar to those of the rest of the network, but much higher than those of the PPCP-non-enhanced group. While 50% of the PPCP-enhanced distribution was greater than or equal to its median topological coefficient, 0.16, the percentages for the PPCP-non-enhanced, complete, and "other" groups were 40%, 50%, and 52%, respectively. Distributions of topological coefficients are shown for the complete network and subsets of the network in Fig. 6.

drug.subt <- na.omit(drug.sub$TopologicalCoefficient)
ndrug.subt <- na.omit(ndrug.sub$TopologicalCoefficient)
complete.t <- na.omit(complete$TopologicalCoefficient)
o.subt <- na.omit(o.sub$TopologicalCoefficient)
med <- median(drug.subt)
length(drug.subt[drug.subt >= med])/length(drug.subt)
## [1] 0.5
length(ndrug.subt[ndrug.subt >= med])/length(ndrug.subt)
## [1] 0.4
length(complete.t[complete.t >= med])/length(complete.t)
## [1] 0.501
length(o.subt[o.subt >= med])/length(o.subt)
## [1] 0.5216

# Common bins of width 0.001 that span every group's range
mx <- max(complete.t)
brks <- seq(0, mx + 0.001, 0.001)
hc <- hist(complete.t, brks, plot = FALSE); comp.cnt <- hc$counts
hd <- hist(drug.subt, brks, plot = FALSE); drug.cnt <- hd$counts
hnd <- hist(ndrug.subt, brks, plot = FALSE); ndrug.cnt <- hnd$counts
ho <- hist(o.subt, brks, plot = FALSE); o.cnt <- ho$counts

[Plot: Frequency (log scale) against Topological Coefficient (log scale) for the Complete, PPCP-non-enhanced, PPCP-enhanced, and Other groups.]
Figure 6: Distributions of topological coefficients for nodes in the complete neurological network, along with the PPCP-enhanced and non-PPCP-enhanced subsets of the complete network.

2.6 Average shortest path

The PPCP-enhanced group had average shortest path lengths similar to those of the rest of the network, but higher than those of the PPCP-non-enhanced group. While 50% of the PPCP-enhanced distribution was greater than or equal to its median average shortest path length, 3.33, the percentages for the PPCP-non-enhanced, complete, and "other" groups were 40%, 51%, and 53%, respectively. Distributions of average shortest path are shown for the complete network and subsets of the network in Fig. 7.

drug.subp <- na.omit(drug.sub$AverageShortestPathLength)
ndrug.subp <- na.omit(ndrug.sub$AverageShortestPathLength)
complete.p <- na.omit(complete$AverageShortestPathLength)
o.subp <- na.omit(o.sub$AverageShortestPathLength)
med <- median(drug.subp)
length(drug.subp[drug.subp >= med])/length(drug.subp)
## [1] 0.5
length(ndrug.subp[ndrug.subp >= med])/length(ndrug.subp)
## [1] 0.4032
length(complete.p[complete.p >= med])/length(complete.p)
## [1] 0.5087
length(o.subp[o.subp >= med])/length(o.subp)
## [1] 0.5308

# Common bins of width 0.001 that span every group's range
mx <- max(complete.p)
brks <- seq(0, mx + 0.001, 0.001)
hc <- hist(complete.p, brks, plot = FALSE); comp.cnt <- hc$counts
hd <- hist(drug.subp, brks, plot = FALSE); drug.cnt <- hd$counts
hnd <- hist(ndrug.subp, brks, plot = FALSE); ndrug.cnt <- hnd$counts
ho <- hist(o.subp, brks, plot = FALSE); o.cnt <- ho$counts

[Plot: Frequency (log scale) against Average shortest path (log scale) for the Complete, PPCP-non-enhanced, PPCP-enhanced, and Other groups.]
Figure 7: Distributions of average shortest path for nodes in the complete neurological network, along with the PPCP-enhanced and non-PPCP-enhanced subsets of the complete network.

2.7 Radiality

The PPCP-enhanced group had radiality similar to that of the rest of the network, but lower than that of the PPCP-non-enhanced group. While 50% of the PPCP-enhanced distribution was greater than or equal to its median radiality, 0.74, the percentages for the PPCP-non-enhanced, complete, and "other" groups were 60%, 49%, and 47%, respectively. Distributions of radiality are shown for the complete network and subsets of the network in Fig. 8.

drug.subr <- na.omit(drug.sub$Radiality)
ndrug.subr <- na.omit(ndrug.sub$Radiality)
complete.r <- na.omit(complete$Radiality)
o.subr <- na.omit(o.sub$Radiality)
med <- median(drug.subr)
length(drug.subr[drug.subr >= med])/length(drug.subr)
## [1] 0.5
length(ndrug.subr[ndrug.subr >= med])/length(ndrug.subr)
## [1] 0.5968
length(complete.r[complete.r >= med])/length(complete.r)
## [1] 0.4923
length(o.subr[o.subr >= med])/length(o.subr)
## [1] 0.4705

# Common bins of width 0.001 that span every group's range
mx <- max(complete.r)
brks <- seq(0, mx + 0.001, 0.001)
hc <- hist(complete.r, brks, plot = FALSE); comp.cnt <- hc$counts
hd <- hist(drug.subr, brks, plot = FALSE); drug.cnt <- hd$counts
hnd <- hist(ndrug.subr, brks, plot = FALSE); ndrug.cnt <- hnd$counts
ho <- hist(o.subr, brks, plot = FALSE); o.cnt <- ho$counts
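Radiality is taken from the Cytoscape node attributes but is not defined in Section 1.1. The Python sketch below evaluates it for the Fig. 1 network under an assumed NetworkAnalyzer-style definition: (diameter + 1 - average shortest path) / diameter, computed within a connected component. This formula is an assumption and should be checked against the Cytoscape documentation before being relied on.

```python
from collections import deque

# Edge list for Fig. 1 (same network as the igraph example)
EDGES = [(1, 2), (2, 3), (3, 1), (4, 3), (3, 5), (3, 6)]


def adjacency(edges):
    """Undirected adjacency sets keyed by node."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def distances(adj, source):
    """Breadth-first geodesic distances from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def radiality(adj):
    """Assumed definition for a connected graph:
    (diameter + 1 - average shortest path) / diameter."""
    dist = {n: distances(adj, n) for n in adj}
    diameter = max(max(d.values()) for d in dist.values())
    g = len(adj)
    # d[n] == 0, so summing every distance gives the total to the g - 1 others
    return {n: (diameter + 1 - sum(d.values()) / (g - 1)) / diameter
            for n, d in dist.items()}


if __name__ == "__main__":
    print(radiality(adjacency(EDGES)))
```

The diameter of the Fig. 1 network is 2, so node 3 gets a radiality of 1, nodes 1 and 2 get 0.7, and nodes 4, 5, and 6 get 0.6. Under this definition, higher radiality means a node is, on average, closer to the rest of its component.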

[Plot: Frequency (log scale) against Radiality for the Complete, PPCP-non-enhanced, PPCP-enhanced, and Other groups.]
Figure 8: Distributions of radiality for nodes in the complete neurological network, along with the PPCP-enhanced and non-PPCP-enhanced subsets of the complete network.

2.8 Eccentricity

Distributions of eccentricity are shown for the complete network and subsets of the network in Fig. 9.

drug.sube <- na.omit(drug.sub$Eccentricity)
ndrug.sube <- na.omit(ndrug.sub$Eccentricity)
complete.e <- na.omit(complete$Eccentricity)
o.sube <- na.omit(o.sub$Eccentricity)
med <- median(drug.sube)
length(drug.sube[drug.sube >= med])/length(drug.sube)
## [1] 0.9636
length(ndrug.sube[ndrug.sube >= med])/length(ndrug.sube)
## [1] 0.929
length(complete.e[complete.e >= med])/length(complete.e)
## [1] 0.9579
length(o.sube[o.sube >= med])/length(o.sube)
## [1] 0.9634

# Eccentricities are integers: centre one bin on each value 1, ..., mx
# (breaks starting at 1 would put eccentricities 1 and 2 in the same bin)
mx <- max(complete.e)
brks <- seq(0.5, mx + 0.5, 1)
hc <- hist(complete.e, brks, plot = FALSE); comp.cnt <- hc$counts
hd <- hist(drug.sube, brks, plot = FALSE); drug.cnt <- hd$counts
hnd <- hist(ndrug.sube, brks, plot = FALSE); ndrug.cnt <- hnd$counts
ho <- hist(o.sube, brks, plot = FALSE); o.cnt <- ho$counts

[Plot: Frequency (log scale) against Eccentricity for the Complete, PPCP-non-enhanced, PPCP-enhanced, and Other groups.]
Figure 9: Distributions of eccentricity for nodes in the complete neurological network, along with the PPCP-enhanced and non-PPCP-enhanced subsets of the complete network.

2.9 Stress

The PPCP-enhanced group had stress similar to that of the PPCP-non-enhanced group, but both had much higher stress than the complete network. While 50% of the PPCP-enhanced distribution was greater than or equal to its median stress, 20528, the percentages for the PPCP-non-enhanced, complete, and "other" groups were 49%, 38%, and 35%, respectively. Distributions of stress are shown for the complete network and subsets of the network in Fig. 10.

drug.subs <- na.omit(drug.sub$Stress)
ndrug.subs <- na.omit(ndrug.sub$Stress)
complete.s <- na.omit(complete$Stress)
o.subs <- na.omit(o.sub$Stress)
med <- median(drug.subs)
length(drug.subs[drug.subs >= med])/length(drug.subs)
## [1] 0.5
length(ndrug.subs[ndrug.subs >= med])/length(ndrug.subs)
## [1] 0.4903
length(complete.s[complete.s >= med])/length(complete.s)
## [1] 0.3783
length(o.subs[o.subs >= med])/length(o.subs)
## [1] 0.3469

# Unit-width bins shared by all four groups
mx <- max(complete.s)
brks <- seq(0, mx, 1)
hc <- hist(complete.s, brks, plot = FALSE); comp.cnt <- hc$counts
hd <- hist(drug.subs, brks, plot = FALSE); drug.cnt <- hd$counts
hnd <- hist(ndrug.subs, brks, plot = FALSE); ndrug.cnt <- hnd$counts
ho <- hist(o.subs, brks, plot = FALSE); o.cnt <- ho$counts

[Plot: Frequency (log scale) against Stress (log scale) for the Complete, PPCP-non-enhanced, PPCP-enhanced, and Other groups.]
Figure 10: Distributions of stress for nodes in the complete neurological network, along with the PPCP-enhanced and non-PPCP-enhanced subsets of the complete network.

2.10 Connectivity

The PPCP-enhanced group had much lower connectivity than the PPCP-non-enhanced, complete, and "other" groups. While 50% of the PPCP-enhanced distribution was greater than or equal to its median connectivity, 24.9, the percentages for the PPCP-non-enhanced, complete, and "other" groups were 60%, 59%, and 59%, respectively. Distributions of connectivity are shown for the complete network and subsets of the network in Fig. 11.

drug.subn <- na.omit(drug.sub$NeighborhoodConnectivity)
ndrug.subn <- na.omit(ndrug.sub$NeighborhoodConnectivity)
complete.n <- na.omit(complete$NeighborhoodConnectivity)
o.subn <- na.omit(o.sub$NeighborhoodConnectivity)
med <- median(drug.subn)
length(drug.subn[drug.subn >= med])/length(drug.subn)
## [1] 0.5
length(ndrug.subn[ndrug.subn >= med])/length(ndrug.subn)
## [1] 0.6
length(complete.n[complete.n >= med])/length(complete.n)
## [1] 0.5888
length(o.subn[o.subn >= med])/length(o.subn)
## [1] 0.5929

# Common bins of width 0.001 that span every group's range
mx <- max(complete.n)
brks <- seq(0, mx + 0.001, 0.001)
hc <- hist(complete.n, brks, plot = FALSE); comp.cnt <- hc$counts
hd <- hist(drug.subn, brks, plot = FALSE); drug.cnt <- hd$counts
hnd <- hist(ndrug.subn, brks, plot = FALSE); ndrug.cnt <- hnd$counts
ho <- hist(o.subn, brks, plot = FALSE); o.cnt <- ho$counts
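The Connectivity index is read from the `NeighborhoodConnectivity` attribute but is not defined in Section 1.1. The Python sketch below evaluates it for the Fig. 1 network, assuming the usual definition: the average number of neighbors of a node's neighbors. The helper names are illustrative.

```python
# Edge list for Fig. 1 (same network as the igraph example)
EDGES = [(1, 2), (2, 3), (3, 1), (4, 3), (3, 5), (3, 6)]


def adjacency(edges):
    """Undirected adjacency sets keyed by node."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def neighborhood_connectivity(adj, n):
    """Mean degree of n's neighbors; None for a node with no neighbors."""
    nbrs = adj[n]
    if not nbrs:
        return None
    return sum(len(adj[m]) for m in nbrs) / len(nbrs)


if __name__ == "__main__":
    adj = adjacency(EDGES)
    for node in sorted(adj):
        print(node, neighborhood_connectivity(adj, node))
```

Node 3's neighbors have degrees 2, 2, 1, 1, and 1, giving 1.4. Each leaf node 4, 5, and 6 takes node 3's degree of 5. Under this definition, a peripheral node attached to a hub scores higher than the hub itself.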

[Plot: Frequency (log scale) against Connectivity for the Complete, PPCP-non-enhanced, PPCP-enhanced, and Other groups.]
Figure 11: Distributions of connectivity for nodes in the complete neurological network, along with the PPCP-enhanced and non-PPCP-enhanced subsets of the complete network.

3 Multivariate comparisons of network indices

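Nonparametric MANOVA (NP-MANOVA, computed with adonis() in the R package vegan) found significant differences in network structure among the three groups (F = 5.75, P = 0.001). However, the model was poor: group membership explained less than 1% of the variation (R² = 0.006).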


library(vegan)
# Stack the three groups and label their rows (257, 744, and 6211 rows)
allm <- rbind(drug.sub, ndrug.sub, o.sub)
trt <- c(rep("PPCP-enhanced", nrow(drug.sub)), rep("non-PPCP-enhanced", nrow(ndrug.sub)),
  rep("other", nrow(o.sub)))
# Keep the ten network indices and drop nodes with any missing value
all1 <- allm[, c(6, 4, 2, 5, 29, 1, 27, 7, 28, 24)]
comp <- complete.cases(all1)
all2 <- all1[comp, ]
trt1 <- trt[comp]
# adonis() uses Bray-Curtis dissimilarities by default (method = "bray")
adonis(all2 ~ trt1)
## Call:
## adonis(formula = all2 ~ trt1)
##
## Terms added sequentially (first to last)
##
##             Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)
## trt1         2      3.50 1.75152  5.7539 0.00588  0.001 ***
## Residuals 1945    592.07 0.30441         0.99412
## Total     1947    595.57                 1.00000
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

No collective multivariate differences in network space existed between the PPCP-enhanced and non-PPCP-enhanced groups (F = 1.06, P = 0.368).

all3 <- all2[trt1 == "PPCP-enhanced" | trt1 == "non-PPCP-enhanced", ]
trt3 <- trt1[trt1 == "PPCP-enhanced" | trt1 == "non-PPCP-enhanced"]
adonis(all3 ~ trt3)
## Call:
## adonis(formula = all3 ~ trt3)
##
## Terms added sequentially (first to last)
##
##             Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)
## trt3         1     0.307 0.30668  1.0598 0.00253  0.368
## Residuals  418   120.961 0.28938         0.99747
## Total      419   121.267                 1.00000

Significant differences existed between the PPCP-enhanced group and the "other" group (F = 4.15, P = 0.006).

all4 <- all2[trt1 == "PPCP-enhanced" | trt1 == "other", ]
trt4 <- trt1[trt1 == "PPCP-enhanced" | trt1 == "other"]
adonis(all4 ~ trt4)
## Call:
## adonis(formula = all4 ~ trt4)
##
## Terms added sequentially (first to last)
##
##             Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)
## trt          1      1.27 1.26911   4.148 0.00253  0.006 **
## Residuals 1636    500.54 0.30595         0.99747
## Total     1637    501.81                 1.00000
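`adonis()` partitions sums of squared dissimilarities among groups and assesses the resulting pseudo-F by permuting group labels. The Python sketch below shows that calculation for a one-way design. It uses Bray-Curtis dissimilarity (the `adonis()` default) and the standard one-way PERMANOVA formulas: SS_total = (1/N) sum of d_ij^2 over all pairs, SS_within = sum over groups of (1/n_g) times the sum of d_ij^2 within the group, and F = (SS_among/(a - 1)) / (SS_within/(N - a)). It is an illustration rather than a reimplementation of vegan. vegan offers more options, and recent versions deprecate `adonis()` in favor of `adonis2()`. The function names and toy data are hypothetical.

```python
import random


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two non-negative vectors."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den else 0.0


def pseudo_f(dist, labels):
    """One-way PERMANOVA pseudo-F and R^2 from a square distance matrix."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for grp in groups:
        idx = [i for i in range(n) if labels[i] == grp]
        ss_grp = sum(dist[i][j] ** 2 for p, i in enumerate(idx) for j in idx[p + 1:])
        ss_within += ss_grp / len(idx)
    ss_among = ss_total - ss_within
    f = (ss_among / (a - 1)) / (ss_within / (n - a))
    return f, ss_among / ss_total


def permanova(rows, labels, n_perm=999, seed=1):
    """Pseudo-F, R^2, and a permutation P-value from shuffled group labels."""
    dist = [[bray_curtis(r, s) for s in rows] for r in rows]
    f_obs, r2 = pseudo_f(dist, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled)[0] >= f_obs:
            hits += 1
    return f_obs, r2, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Two hypothetical, well-separated groups of two-index "nodes"
    rows = [[1.0 + 0.1 * i, 1.0] for i in range(6)] + [[5.0 + 0.1 * i, 5.0] for i in range(6)]
    labels = ["A"] * 6 + ["B"] * 6
    print(permanova(rows, labels))
```

With the toy data, the two groups are well separated, so F is large and the P-value is small. In the analyses above the grouping explains under 1% of the variation, which is why significant P-values accompany very small R² values.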


References
Newman, M. Networks: An Introduction. Oxford University Press, Oxford, UK, 2010.
