The Structure of Package Dependency Network of a Modern Multiprogramming OS
Jaderick P. Pabico and Mae Cel R. Ayson
Institute of Computer Science, College of Arts and Sciences, University of the Philippines Los Baños, College 4031, Laguna, Philippines
63-49-536-2313

[email protected]

ABSTRACT

A package dependency network in an operating system (OS) is the network formed by the core software packages that the OS requires to function properly and by the dependencies among them. The packages work in a modular fashion, wherein one package $P_1$ requires the presence of another package $P_0$. Some packages $P_i$, $i \in \{0, 1, \dots, n\}$, provide modular functions to other packages $P_j$, $j \in \{n+1, n+2, \dots, m\}$, in a giant dependency network. Empirical study of this aspect has been lacking despite the perceived need for it, because understanding the structure of the package dependency network is important in modeling the dynamics of an OS. In this paper, we present the results of our study of the OS-wide package dependency network of Fedora Gnu/Linux 10, comprising 1,712 vertices corresponding to core packages and 2,949 edges corresponding to dependency links between pairs of packages, using recent statistical methods. Our empirical analysis shows scale-free degree distributions for both the in-degree and out-degree links.

Keywords

package dependency network, Linux, multiprogramming operating system

1. INTRODUCTION

The scientific community has recently witnessed considerable development of statistical methods for quantifying large networks, such as biological networks [8], social networks [15, 14, 1], and information networks [2]. These developments enable us to quantify statistical features that describe the topological structure of large networks whose sizes are not amenable even to large-scale information and data visualization techniques. They also give us a tool for quantifying the structure of the package dependency network, which can be used for designing better and more secure information systems, such as the OS.

Contributed scientific paper to the 2009 Philippine Computing Science Congress, Silliman University, Dumaguete City, 2–3 March 2009.

The package dependency network in a modern, modular, multiprogramming operating system (OS), such as the many distributions of the Gnu/Linux OS, refers to the lines of packages in which a package $P_n$ requires some modular functions from upstream packages $P_0, P_1, \dots, P_{n-1}$ and at the same time provides some modular functions to downstream packages $P_{n+1}, P_{n+2}, \dots, P_m$. For each package $P_i$ in an OS, we can easily extract this dependency line, and the combination of these lines over all packages forms a giant dependency network. The structure of such a dependency network gives us insight into which packages need focus in order to better design and secure the OS.

Of all the studies done on artificial networks, only two have analyzed the package dependencies of a software system [9, 5]. Both studies performed network analysis on the inter-dependencies among packages of different releases of two open-source OSs. In the work of De Souza et al. [5], a network was built comprising 18,000 nodes corresponding to packages of three current releases of Debian Gnu/Linux [18]. They found that the network exhibits a scale-free nature and that the betweenness centrality suggests that the releases are stable against random bugs brought about by errors emanating from packages. They also found that the existence of dependencies among packages is due to the same teams of developers working on the same packages. LaBelle and Wallingford [9], on the other hand, created two networks out of the Debian Gnu/Linux and FreeBSD [20] package repositories. Their results also suggest that the inter-package dependencies of the two systems are scale-free and robust to package failure brought about by software bugs.

In this effort, we created the package dependency network of an essential installation of the recently released Fedora Gnu/Linux 10 [6].
Our work differs from the two studies mentioned above in that we used the essential packages from the stable release of the distribution, not just the packages from a repository, as used by LaBelle and Wallingford [9], or from three different releases, as used by De Souza et al. [5]. Essential packages are the core packages that the developers require users to install so that the system runs at its most efficient performance. A stable release, on the other hand, is the set of packages that the developers released after rigorous testing and debugging. Packages extracted from repositories might be neither essential nor stable.

In this paper, we present the methodology we used to extract the package dependency network of Fedora Gnu/Linux 10. We also report the following network metrics: degree distribution, in-degree distribution, out-degree distribution, network betweenness centralization, number of unreachable pairs, network diameter, and most distant vertices. We also discuss the implications of these metrics with respect to OS security.

2. CONSTRUCTING AND MEASURING THE DEPENDENCY NETWORK

2.1 Creating the Package Dependency Network

The packages installed in Fedora Gnu/Linux are automatically managed by a package manager called RPM Package Manager, or simply RPM [17]. Originally started and widely used by the RedHat Gnu/Linux distribution, and more recently by RedHat Enterprise Linux (RHEL) [16], RPM was originally named RedHat Package Manager. However, because of its wide use as the primary package manager of many Gnu/Linux distributions that were spin-offs of RedHat, such as Mandriva [10], CentOS [4], openSUSE [13], Yellow Dog [19], and many others, RPM was renamed as a recursive acronym; hence, RPM now stands for RPM Package Manager. In Fedora Gnu/Linux releases, RPM is coupled with a package called Yellow Dog Updater, Modified (YUM), which automatically updates packages and resolves package dependencies [22].

In this study, we used the combination of RPM and YUM in a Perl [23] script to infer the package dependency network of the version 10 release of Fedora with only the core packages installed. Using RPM, we listed all the essential packages installed in the latest release of Fedora Gnu/Linux. From this list, we wrote a Perl script that extracted the dependency requirements of each package using RPM. We saved the dependency requirements in a $|V| \times |V|$ adjacency matrix $M$, wherein each matrix element $m_{i,j} = 1$ if the $i$th package $v_i$ is required by the $j$th package $v_j$, denoted by $v_i \leftarrow v_j$. If $v_j$ does not directly depend on $v_i$, then $m_{i,j} = 0$. The quantity $|V|$ is the total number of packages $v_i$ in the network.
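The construction of $M$ in Section 2.1 can be illustrated with a minimal Python sketch. The paper does not reproduce its Perl script, so the input here is a hand-written dictionary of hypothetical requirement lists rather than real RPM output; the package names, the `requires` variable, and the `build_adjacency` helper are all assumptions made for illustration.

```python
# Hypothetical requirement lists (NOT real RPM output): each key is a package,
# each value lists the packages that it directly requires.
requires = {
    "glibc": [],
    "bash": ["glibc"],
    "coreutils": ["glibc"],
    "rpm": ["bash", "glibc"],
}


def build_adjacency(requires):
    """Return (packages, M) with M[i][j] == 1 iff package i is required by package j."""
    packages = sorted(requires)                      # fixed vertex order
    index = {name: i for i, name in enumerate(packages)}
    n = len(packages)
    M = [[0] * n for _ in range(n)]
    for dependent, providers in requires.items():
        j = index[dependent]                         # column: the package that depends
        for provider in providers:
            i = index[provider]                      # row: the package that is required
            if i != j:                               # the text requires i != j
                M[i][j] = 1
    return packages, M


packages, M = build_adjacency(requires)
for name, row in zip(packages, M):
    print(f"{name:10s} {row}")
```

In this toy input, the row for `glibc` contains three ones because three other packages require it, which mirrors the convention $m_{i,j} = 1$ when $v_i \leftarrow v_j$.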

2.2 The Dependency Network as a Graph

Generally, a network is an unweighted graph $G = (V, E)$ composed of a set of vertices $V$ and a set of edges $E$. The vertex $v_i \in V$ is the $i$th entity in a system, while each edge $e_{i,j} \in E$ represents an interaction between the vertices $v_i$ and $v_j$, $i \neq j$. A vertex may be a social actor in a social network, an economic agent in a production network, a molecule in a biochemical network, or a computer program in a software network. An edge may be a friendship relation between two social actors in a social network, a producer-consumer relation between economic agents in a production network, a molecular bond between molecules in a biochemical network, or a package dependency between computer programs in a software network. In a software network, if software functions are represented by vertices, edges can connect these functions to represent some meaningful interaction between them, such as object inheritance in object-oriented languages, or simply procedure calls in procedural languages. In our study, the vertex $v_i$ represents the $i$th package, while the edge $e_{i,j}$ represents the dependency relation between $v_i$ and $v_j$. With respect to $M$, $e_{i,j}$ means $m_{i,j} = 1$, which specifically means that $v_i \leftarrow v_j$.

2.3 Vertex Degree

The degree of a vertex $v$, denoted as $v^{d}$, is the number of vertices adjacent to $v$. When $G$ is directed, two statistics can be derived: the in-degree of $v$, denoted as $v^{in}$, and the out-degree of $v$, denoted as $v^{out}$. The metric $v^{in}$ is defined as the number of incoming edges to $v$, while $v^{out}$ is the number of outgoing edges from $v$. From the point of view of package dependency, the $v^{in}$ of a package $v$ is the number of packages required by $v$, while $v^{out}$ is the number of packages that require $v$. When the edge directions of $G$ are ignored, it is easy to note that $v^{d} = v^{in} + v^{out}$. The number of packages $v_i^{in}$ required by the $i$th package is given by Equation 1, while the number of packages $v_i^{out}$ that the $i$th package provides for is given by Equation 2. Note that $v_i^{in}$ and $v_i^{out}$ are the in-degree and out-degree counts, respectively, of $v_i$.

$$v_i^{in} = \sum_{k=1}^{|V|} m_{k,i} \qquad (1)$$

$$v_i^{out} = \sum_{k=1}^{|V|} m_{i,k} \qquad (2)$$
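Equations 1 and 2 amount to column sums and row sums of $M$. The following Python sketch shows this on a small hand-made matrix; the 4-by-4 matrix and the function name `in_out_degrees` are illustrative assumptions, not data or code from the study.

```python
def in_out_degrees(M):
    """Return (in_deg, out_deg) lists computed from adjacency matrix M."""
    n = len(M)
    # Equation 1: in-degree of vertex i is the sum of column i
    in_deg = [sum(M[k][i] for k in range(n)) for i in range(n)]
    # Equation 2: out-degree of vertex i is the sum of row i
    out_deg = [sum(M[i][k] for k in range(n)) for i in range(n)]
    return in_deg, out_deg


# Toy matrix (illustrative only): M[i][j] == 1 means package i is required by package j
M = [
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]

in_deg, out_deg = in_out_degrees(M)
degree = [a + b for a, b in zip(in_deg, out_deg)]  # degree with directions ignored
print(in_deg)    # [1, 1, 0, 2]
print(out_deg)   # [1, 0, 3, 0]
print(degree)    # [2, 1, 3, 2]
```

Because every edge contributes exactly one unit to some in-degree and one unit to some out-degree, the two lists always have the same total, which equals $|E|$.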

2.4 Degree Distribution and Scale-Free Networks

According to various empirical results [11, 12, 7], the distribution of edges in real-world networks, such as those mentioned above, roughly follows a power law: $P(k) \sim k^{-\alpha}$. This means that the probability $P(k)$ of a vertex having $k$ edges decays as a power of $k$ with some constant exponent $\alpha \in \mathbb{R}^{+}$. Usually, $\alpha \in [2, 3]$, which suggests that $G$ is scale-free [3]. The power-law feature of a real network is significant because it shows deviation from a randomly constructed graph $G_R$. The degree distribution of a $G_R$ was proven to approach a Poisson distribution as the cardinality of $V$ is increased indefinitely (i.e., $|V| \to \infty$) [24]. The power law of a $G$ implies that many vertices will not be highly connected (i.e., low $v^{d}$), while a few vertices will be highly connected (i.e., high $v^{d}$). The highly connected vertices act as the network hubs.
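Taking logarithms makes a power law easier to recognize. In the short derivation below, the prefactor $c$ is an assumed normalization symbol that does not appear in the text above.

```latex
P(k) = c\,k^{-\alpha}
\quad\Longrightarrow\quad
\log P(k) = \log c - \alpha \log k ,
\qquad
\frac{P(2k)}{P(k)} = 2^{-\alpha}
```

On log-log axes, a power law therefore appears as a straight line with slope $-\alpha$. As a worked case, with $\alpha = 2$ a doubling of the degree $k$ reduces $P(k)$ by a factor of four, regardless of the starting value of $k$.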

Table 1: Minimum, average, maximum, and standard deviation of the in-degree, out-degree, and degree counts of $G_{dep}$.

                     In-Degree   Out-Degree   Degree
Minimum              0           0            0
Average              1.72        1.72         3.45
Maximum              90          58           90
Standard Deviation   4.74        3.15         5.78

2.5 Average Path Length, Clustering, and Small-World

Real-world networks have two characteristic metrics that are not found together in $G_R$ [24]: (1) a low characteristic average path length $L$; and (2) a high degree of clustering $C$. When these two characteristic metrics are found together in a $G$, the $G$ is known as a small world, denoted as $G_{SW}$, popularly known in the social sciences as the Six Degrees of Separation [21]. Let $L$ denote the average shortest path length between any two vertices $v_i \in G$ and $v_j \in G$, and let $C$ denote the average clustering, which measures the tendency of vertices to form local cliques. The $C$ is much higher in $G_{SW}$ than in $G_R$ [24]. This means that if a vertex $v_j$ is adjacent to vertices $v_i$ and $v_k$, with $i$, $j$, and $k$ all distinct, then $v_i$ and $v_k$ are more likely to be adjacent to each other in $G_{SW}$ than in $G_R$. The $C$ measures this probability, which is literally the likelihood that the neighbors of a vertex are also neighbors of one another. A $G_{SW}$ exists if $L_R \approx L_{SW}$ and $C_R \ll C_{SW}$, where $C_R$ and $L_R$ denote the clustering and average geodesic path length, respectively, of a given $G_R = (V_R, E_R)$, and $C_{SW}$ and $L_{SW}$ denote the clustering and average geodesic path length, respectively, of a given $G_{SW} = (V_{SW}, E_{SW})$. The graphs $G_R$ and $G_{SW}$ must be equivalently sized such that $|V_R| = |V_{SW}|$ and $|E_R| = |E_{SW}|$. The characteristic $L_R$ and $C_R$ of a $G_R$ can be easily derived given $|V_R|$ and $|E_R|$ because their closed forms are already widely known [24].
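The random-graph baseline used in the small-world comparison can be sketched as follows. The sketch assumes the approximations $L_R \approx \ln|V| / \ln \bar{k}$ and $C_R \approx \bar{k} / |V|$ for sparse random graphs, with $\bar{k} = 2|E|/|V|$ the mean degree of the undirected graph; the function names and the two numeric thresholds in `looks_small_world` are hypothetical choices, since the text only states $L_R \approx L_{SW}$ and $C_R \ll C_{SW}$ qualitatively.

```python
import math


def random_graph_baseline(num_vertices, num_edges):
    """Approximate (L_R, C_R) of a random graph with the same |V| and |E|."""
    mean_degree = 2.0 * num_edges / num_vertices   # undirected mean degree
    if mean_degree <= 1.0:
        raise ValueError("approximation needs a mean degree above 1")
    L_R = math.log(num_vertices) / math.log(mean_degree)
    C_R = mean_degree / num_vertices
    return L_R, C_R


def looks_small_world(L, C, L_R, C_R, length_tol=2.0, clustering_factor=10.0):
    """Heuristic check; both thresholds are assumptions, not values from the paper."""
    return L <= length_tol * L_R and C >= clustering_factor * C_R


# Toy sizes (illustrative only): 1,000 vertices and 5,000 edges, so mean degree is 10
L_R, C_R = random_graph_baseline(1000, 5000)
print(round(L_R, 3), C_R)                       # 3.0 0.01
print(looks_small_world(3.6, 0.3, L_R, C_R))    # True: short paths, high clustering
print(looks_small_world(3.2, 0.012, L_R, C_R))  # False: clustering close to random
```

A measured $L$ and $C$ would come from the network under study; the values passed in the usage lines are made up to exercise both outcomes.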

2.6 Connected Component

A connected component is a subgraph of $G$ such that for all pairs of vertices $v_i$ and $v_j$ in the subgraph, there exists a path from $v_i$ to $v_j$. The size of the connected component is the number of vertices in this set, that is, the vertices for which an edge traversal path exists between each pair. Empirically, 90% of the vertices in real-world networks were found to be in the connected component.
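Connected components and their sizes can be found with a breadth-first search. The sketch below treats edges as undirected, which is an assumption (the text does not say whether direction is respected when components are computed), and uses a made-up six-vertex graph.

```python
from collections import deque


def connected_components(num_vertices, edges):
    """Return a list of components, each a list of vertex ids (edges taken as undirected)."""
    adj = [[] for _ in range(num_vertices)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = [False] * num_vertices
    components = []
    for start in range(num_vertices):
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        members = []
        while queue:                      # breadth-first traversal from `start`
            u = queue.popleft()
            members.append(u)
            for w in adj[u]:
                if not seen[w]:
                    seen[w] = True
                    queue.append(w)
        components.append(members)
    return components


# Toy graph (illustrative only): a path 0-1-2, a pair 3-4, and an isolated vertex 5
edges = [(0, 1), (1, 2), (3, 4)]
components = connected_components(6, edges)
largest = max(components, key=len)
fraction = len(largest) / 6
print(sorted(len(c) for c in components))  # [1, 2, 3]
print(fraction)                            # 0.5
```

The `fraction` value is the share of vertices in the largest component, which is the quantity that the 90% figure above refers to.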

3. RESULTS AND DISCUSSION

There are 2,221 packages included in the Fedora 10 DVD release for the i386 family of processors, totalling about 3.1 GB. However, running the OS with the bare-bones essential packages requires the installation of only 1,712 packages. The remaining 509 packages are considered optional. Upon creating the package dependency network $G_{dep}$ from $M$, we found that $|V| = 1{,}712$ and $|E| = 2{,}949$. Figure 1 shows the visualization of $G_{dep}$.

Table 1 shows the summary of the minimum, average, and maximum in-degree, out-degree, and degree counts of $G_{dep}$. The in-degree count of each vertex $v_i \in G_{dep}$ ranges from 0 to 90 with an average count of 1.72. This means that, on average, 1.72 packages are required by any given package, and that some packages require no package while others depend on as many as 90 packages. The out-degree count of each vertex $v_i \in G_{dep}$ ranges from 0 to 58 with an average of 1.72. This means that, on average, a package provides for about 1.72 packages, and that some packages do not provide for any while others provide for as many as 58 packages.

Figure 2 shows the distributions of the in-degree, out-degree, and degree of $G_{dep}$, all plotted on a log-log scale. All distributions follow a power law, where $P(k_{in})$, $P(k_{out})$, and $P(k)$ denote the in-degree, out-degree, and degree distributions, respectively (Equations 3 through 5). These power-law equations were determined using a linear regression fit on the respective log-log transformation of the data.

$$P(k_{in}) \sim 245.5 \times k_{in}^{-1.52} \qquad (3)$$

$$P(k_{out}) \sim 616.6 \times k_{out}^{-1.91} \qquad (4)$$

$$P(k) \sim 812.8 \times k^{-1.73} \qquad (5)$$
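A linear regression on log-log transformed data, as used for Equations 3 through 5, can be sketched in Python as follows. The sketch fits raw vertex counts per degree and drops vertices of degree zero because their logarithm is undefined; whether the authors fitted counts or normalized frequencies, and how they treated zero-degree vertices, is not stated, so both choices are assumptions. The function name `fit_power_law` and the synthetic degree list are illustrative only.

```python
import math


def fit_power_law(degrees):
    """Least-squares line through (log10 k, log10 count); returns (c, alpha)
    for the model count(k) ~ c * k**(-alpha)."""
    counts = {}
    for k in degrees:
        if k > 0:                          # log10(0) is undefined, so skip k = 0
            counts[k] = counts.get(k, 0) + 1
    if len(counts) < 2:
        raise ValueError("need at least two distinct positive degrees")
    xs = [math.log10(k) for k in counts]
    ys = [math.log10(c) for c in counts.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return 10 ** intercept, -slope


# Synthetic degrees (illustrative only) built so that count(k) = 64 / k**2
degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
c, alpha = fit_power_law(degrees)
print(round(c, 3), round(alpha, 3))        # 64.0 2.0
```

On real data the points scatter around the line, and the fitted exponent depends on choices such as binning and the range of $k$ included, so the recovered values should be read as estimates.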

The power-law nature of the in-degree, out-degree, and degree distributions of $G_{dep}$ suggests that the package dependency network is scale-free. This means that many packages are not highly important while only a few are highly important. These highly important packages act as hubs of the network. Figure 1 visually confirms these statistics on $G_{dep}$. It can be seen from the figure that only a few packages have relatively bigger vertex sizes than most of the vertices. These vertices have many edges going to and coming from them, making them appear as hubs in the visualization.

The hubs in the network tell us which packages are more important than the others. From the point of view of resource utilization, developers must design these packages so as not to waste CPU time, memory, or disk space, and eventually to improve the packages' run-time efficiency. In most multiprogramming OSs, disk space acts as virtual memory or as a memory swap area during context switching. From the point of view of security, these packages must be carefully designed so as not to be the source of failure of the system, such as being the source of bugs or the target of malware or poisoning attacks. Due to the relative importance of these packages to the OS, when these packages become a source of failure, the OS will easily become vulnerable to various losses. Thus, the identification of these hubs is important for securing and improving the OS.

Other metrics found in our analysis are as follows:
1. The network betweenness centrality value was found to be 0.00085;
2. The number of unreachable pairs is 2,911,140;
3. The average distance among reachable pairs is 3.4;
4. The network diameter is 13.
Due to space limitations, we will discuss the implications of these metrics in the complete version of this paper.
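Three of the metrics listed above (unreachable pairs, average distance among reachable pairs, and diameter) can be obtained from one breadth-first search per vertex. The sketch below counts ordered pairs in a directed graph. That reading is an assumption, although it is consistent with the reported figure: 1,712 vertices give 1,712 × 1,711 = 2,929,232 ordered pairs but only 1,464,616 unordered pairs, and the reported 2,911,140 exceeds the latter. The function name `path_metrics` and the four-vertex chain are illustrative only.

```python
from collections import deque


def path_metrics(num_vertices, edges):
    """Return (unreachable_pairs, average_distance, diameter) over ordered pairs
    of a directed graph, following edge directions."""
    adj = [[] for _ in range(num_vertices)]
    for u, v in edges:
        adj[u].append(v)                  # directed edge u -> v
    reachable = 0
    total_distance = 0
    diameter = 0
    for source in range(num_vertices):
        dist = {source: 0}
        queue = deque([source])
        while queue:                      # BFS gives shortest hop counts
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for target, d in dist.items():
            if target != source:
                reachable += 1
                total_distance += d
                diameter = max(diameter, d)
    ordered_pairs = num_vertices * (num_vertices - 1)
    unreachable = ordered_pairs - reachable
    average = total_distance / reachable if reachable else 0.0
    return unreachable, average, diameter


# Toy directed chain (illustrative only): 0 -> 1 -> 2 -> 3
edges = [(0, 1), (1, 2), (2, 3)]
unreachable, average, diameter = path_metrics(4, edges)
print(unreachable, round(average, 3), diameter)   # 6 1.667 3
```

In a dependency network most ordered pairs are unreachable because dependencies point one way, which is why the diameter and the average distance are taken over reachable pairs only.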

Figure 1: (Colored in digital copies) The package dependency network of Fedora Gnu/Linux 10. Circles represent packages while arrows represent dependency relations. The area of a particular vertex $v_i$ is a relative visualization of $v_i^{out}$.

Figure 2: The distribution of (a) in-degree, (b) out-degree, and (c) degree of $G_{dep}$ in log-log scale. The line on each plot is the power-law fit obtained using regression analysis.

4. CONCLUSION

This study presents the structure of the package dependency network of Fedora Gnu/Linux 10. Fedora is a modern, multiprogramming OS that depends on the functions of various software packages. These software packages were modularly designed such that general functions needed by a package $v_j$, which were already defined by another package $v_i$, need not be defined again within $v_j$. Instead, the installation of $v_j$ in the system must depend on the availability of $v_i$. This relationship between $v_i$ and $v_j$ is what comprises the package dependency network. In this scenario, we say that $v_j$ depends on $v_i$, while $v_i$ provides for $v_j$. Several network measures were used in this study to identify the structure of the dependency network. The in-degree, out-degree, and degree distributions suggest that the network is scale-free. This information is significant to package designers and developers because identifying which packages are the most important in the system allows them to target those packages for run-time efficiency improvement and security purposes.

5. ACKNOWLEDGMENTS

The authors thank the Institute of Computer Science and the College of Arts and Sciences, University of the Philippines Los Baños for its financial support of this work through CAS-TF #8217300 and ICS-GF #2326103, respectively.

6. REFERENCES

[1] Camille Chezka P. Arevalo and Jaderick P. Pabico. Automatic characterization of a Friendster network using a data mining WebBot. In Proceedings (CDROM) of the 4th Network of CALABARZON Educational Institutions, Inc. (NOCEI) Research Forum, Batangas State University, Batangas City, 2008.
[2] Albert-Laszlo Barabasi and Reka Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
[3] Albert-Laszlo Barabasi and Eric Bonabeau. Scale-free networks. Scientific American, 288:50–59, 2003.
[4] Lance Davis. CentOS, 2003. http://www.centos.org.
[5] O. Felicio de Souza, M.A. de Menezes, and T.J.P. Penna. Analysis of the package dependency on Debian Gnu/Linux, 2008. www.complex.if.uff.br.
[6] Fedora Project. Fedora Gnu/Linux, 2003. http://fedoraproject.org.
[7] Yoshi Fujiwara and Hideaki Aoyama. Large-scale structure of a nation-wide production network, 2008. http://arxiv.org/abs/0806.4280.
[8] Arend Hintze and Christoph Adami. Evolution of complex modular biological networks. PLoS Computational Biology, 2(4):e23, 2008.
[9] Nathan LaBelle and Eugene Wallingford. Inter-package dependency networks in open-source software. In Proceedings of the International Conference on Complex Systems (ICCS2006), 2006.
[10] Mandriva, S.A. Mandriva Linux, 1998. http://www.mandriva.com.
[11] M.E.J. Newman. Scientific collaboration networks. I. Network construction and fundamental results. Physical Review E, 64(1):016131 (8pp), 2001.
[12] M.E.J. Newman. Scientific collaboration networks. II. Shortest paths, weighted networks, and centrality. Physical Review E, 64(1):016132 (7pp), 2001.
[13] OpenSUSE Project. openSUSE, 1994. http://www.opensuse.org.
[14] Jaderick P. Pabico. Inferences in a virtual community: Demography, user preferences and network topology. Philippine Information Technology Journal, 1(2):2–8, 2008.
[15] Jaderick P. Pabico and Camille Chezka P. Arevalo. Patterns of internet-based friendship among residents of Los Baños Laguna: The Friendster case. Transactions of the National Academy of Science and Technology of the Philippines, 30(1):220, 2008.
[16] Red Hat, Inc. RHEL: Red Hat Enterprise Linux, 1995. http://www.redhat.com/.
[17] rpm5.org. RPM Package Manager, 2008. http://rpm5.org/.
[18] Software in the Public Interest, Inc. Debian Gnu/Linux, 1993. http://www.debian.org.
[19] Terra Soft Solutions. YDL: Yellow Dog Linux, 1999. http://www.yellowdoglinux.com/.
[20] The FreeBSD Foundation. The FreeBSD Project, 2008. http://www.freebsd.org.
[21] Jeffrey Travers and Stanley Milgram. An experimental study of the small world problem. Sociometry, 32(4):425–443, 1969.
[22] Seth Vidal. YUM: Yellow Dog Updater, Modified, 2009. http://yum.baseurl.org/.
[23] Larry Wall. Perl, 1987. http://www.perl.org/.
[24] Duncan Watts and Steven Strogatz. Collective dynamics of 'small world' networks. Nature, 393(6684):440–442, 1998.
