A Case Study on Software Branching: Mozilla and Firefox

Jun Chen and Jialin Song
{j2chen,j8song}@cs.uwaterloo.ca

CS846 Course Project Report
Prof. Michael W. Godfrey

School of Computer Science
University of Waterloo

Abstract

Many studies have examined the evolution of individual software products. To the best of our knowledge, however, few documented studies address the co-evolution of branched software products. Software branching is common in the software industry when a product is adapted to different environments or targeted at different customers. Because branched products share a common origin, their evolution patterns are related. In this paper, we examine the co-evolution of branched software through a case study of two well-known branched web browsers, Mozilla and Firefox. Since Mozilla and Firefox are successful and mature, we expect the results to be generally applicable and representative. In particular, we are interested in the evolution of the shared code and in the relationship between the evolution patterns of the two products. We also present a systematic analysis methodology for studying software branching.

1 Introduction

During software evolution, branching may happen. We define software branching as a software evolution phenomenon in which one software product is split into multiple products that are developed in parallel. We call such branched products sibling software products. Software branching may result from implementing different design goals, satisfying conflicting customer requirements, adapting an existing product to various environments, or experimenting with new ideas. In many cases, it is impractical to implement all of the above in one software product because of software complexity. Here, software complexity refers to the growth in source code size and the addition of conditional switches. In the extreme case, one software product may be branched into two or more products developed in parallel. Although branched software products are generally built from the same initial body of source code and normally share some source code throughout their lifetime, the difference between the two products may change over time.

Software branching is closely related to source code file branching, which is provided by many version control systems such as Microsoft Visual SourceSafe [6] and CVS (Concurrent Versions System) [12]. During the development of branched software products, file branching can be an easy way to implement new designs or concepts in only one or some of the sibling products. File branching effectively avoids an unnecessary increase of software complexity in the other sibling products, and it causes the shared source code of the branched products to diverge. From the maintenance cost point of view, however, file branching may be undesirable, since it increases the amount of source code that must be maintained in parallel products. The maintenance cost may double later on, when common updates must be propagated to branched files that reside in multiple sibling products.
Therefore, maintenance cost considerations may force branched software products to converge and share as much code as possible. Software complexity and maintenance cost thus act as different driving forces behind the evolution of branched software and may affect software development and maintainability differently.

In this paper, we investigate software branching and branched software evolution through a case study of Firefox [10] and Mozilla [11], two well-known branched web browsers. Since Mozilla and Firefox are successful, well-managed open source software, we hope our study is generally applicable and representative. We are interested in the evolution patterns of the shared code and the branched code of the two products, and in the relationship between these patterns.

This paper is organized as follows. Section 2 gives brief background on the Mozilla-Firefox branch and reviews related work on software branching analysis. Section 3 introduces the methodology for the branching analysis. Section 4 presents the results of our case study on Mozilla and Firefox. Section 5 discusses the results. Finally, Section 6 presents our conclusions and possible future work.

2 Background and Related Works

Mozilla is a free Internet software suite that contains a web browser, an email client, an HTML editor and an IRC client. The name Mozilla was originally an internal name for Netscape Navigator. Mozilla as an independent product was developed partially from the Netscape Communicator code by the Mozilla Foundation after Netscape Navigator lost its war with Microsoft Internet Explorer. Mozilla aims to provide a free cross-platform web browsing tool. The first version, Mozilla 1.0, was released in June 2002 and was praised for providing several new features that IE did not have. The most current version, 1.8 alpha, was released in January 2005.

Firefox was split from Mozilla in 2002. It was intended to be an experimental product and a lightweight alternative to the heavy Mozilla suite. It was initially called the Phoenix project. The name was changed to Firebird in version 0.6 [8], and then to Firefox in version 0.8 [9] in 2004. For simplicity, we refer to the evolution branch from Phoenix to Firefox as the Firefox branch. By Firefox version 0.8, over 90,000 lines of code had been changed since the split [9]. However, Firefox still shares certain internal components, such as Gecko (the HTML rendering engine), with the Mozilla browser [10]. Both the similarities and the differences between Mozilla and Firefox make them attractive candidates for our study of software branching.

Much work has been done on understanding software evolution. Paper [3] reports an evolution analysis of the open source Linux kernel. Paper [7] presents different patterns of evolution for open source software products. Paper [1] investigates the differences between continuous and discontinuous evolution in software products, and defines discontinuous evolution as a kind of evolution that produces a new family of products from an original source. We believe the branching of a software product fits this criterion quite well. However, papers [1] and [2] concentrate on how the core architecture of a product is preserved through discontinuous evolution. In our research, we are more interested in understanding how code is shared among branches.

3 Analysis Methodology

The analysis consists of two major components: the facts of interest and the methodology used to find these facts. We refer to these two components as the configuration of an analysis. In this section, we present the configuration of our analysis of software branching.

3.1 The Facts of Interest

First of all, we would like to identify the type of branching: one-sided branching or full branching. One-sided branching refers to the scenario in which, after branching, one of the branches slows down or even stops evolving while the other branch evolves actively and continually. In such a scenario, a new product based on the old product emerges. There are legitimate reasons for doing this. A developer may want to re-implement a system with a newly emerged technology that delivers better performance or other benefits. For example, a developer may want to re-implement a Perl-based online application using J2EE or .NET to improve its performance. In this case, we may observe that the old branch stops evolving and enters the maintenance stage, while the other branch develops on its own. One-sided branching is very similar to the discontinuous evolution discussed in [1].

Full branching refers to the scenario in which two branches co-evolve in parallel after branching. For instance, an operating system designed for enterprise users may be derived from its desktop implementation for private users. While certain subsystems are changed to fit specific needs, some code may be shared and maintained by both versions. Both systems evolve in parallel. We believe that the main difference between one-sided branching and full branching is their motivation: one-sided branching aims at improving the performance of existing features, while full branching aims at refitting software products for different requirements.

In the case of full branching, the evolution of the shared code is an interesting aspect to investigate. We believe the amount of shared code is a good indicator of branching evolution. We use the term evolution distance for a measure of how closely two split software products are related to each other: the higher the percentage of shared code in both systems, the more closely related the two systems are. We believe that software products might experience both widening and shortening of the evolution distance as they evolve. For instance, two software products may first be branched partially from a set of shared code. Each branch specializes in customizing the original product to fit different requirements, so many branch-specific changes are made to the branched code. These modifications reduce the percentage of shared code in both systems. We refer to this behavior as the divergence of the two branched systems. As the branches evolve, the developers might discover that certain portions of the branched code can either be replaced with code from one branch or be merged together. Merging branched code increases the size of the shared code. We refer to this behavior as the convergence of the branched software.

We speculate that divergence and convergence may alternate during the evolution of the two branches. Moreover, we would like to see what the dominant forces behind divergence and convergence are. We pay special attention to how concerns about maintenance and complexity influence the evolution.
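To make the notion of evolution distance concrete, the following minimal Python sketch computes the percentage of shared code for a release and labels each step between consecutive release pairs as converging or diverging. The function names and the sample percentages are hypothetical and only illustrate the idea; they are not taken from our tooling or data.

```python
def shared_percentage(shared_uloc, total_uloc):
    """Percentage of a release's ULOC that is shared with its sibling release."""
    if total_uloc <= 0:
        raise ValueError("total_uloc must be positive")
    return 100.0 * shared_uloc / total_uloc


def label_trend(percentages):
    """Label each step between consecutive release pairs.

    A rising shared percentage shortens the evolution distance (converge);
    a falling one widens it (diverge).
    """
    labels = []
    for prev, curr in zip(percentages, percentages[1:]):
        if curr > prev:
            labels.append("converge")
        elif curr < prev:
            labels.append("diverge")
        else:
            labels.append("unchanged")
    return labels


# Hypothetical shared-code percentages for four consecutive release pairs.
history = [30.0, 92.0, 88.0, 85.0]
print(label_trend(history))
```

A sequence of labels that alternates between "converge" and "diverge" would correspond to the interleaved behavior we speculate about above.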


Another fact we would like to know about the shared code is how stable it is. Shared code is stable if few changes have been made to it since the day it was first shared. In other words, stable shared code is not prone to change [5]. This information is important to software maintainers, because they might choose to package a piece of stable, shared code into a library to improve the reusability and maintainability of the source code. We believe that frequently changed shared code is more likely to experience a high number of divergences and convergences.

In summary, we have identified several important facts of interest for an analysis of branched software products. First, we would like to identify the type of branching. Then, we investigate how the branched software products have evolved, using the shared code as an indicator. We are most interested in finding out how the shared and unshared code changed over time and what the main driving forces behind these changes are.

3.2 The Analysis Methodology

In this section, we discuss our analysis methodology in detail. We present the types of data collected at the analysis stage and the algorithms used to process the data. We also present an implementation of our analysis.

3.2.1 Forming Release Pairs

The analysis starts with forming sibling release pairs. A sibling release pair is a pair of released products that come from different branches but were released at close points in time. If sibling release pairs can be formed for most of the major versions in both branches, this is strong evidence of full branching. Otherwise, if from a certain date onward many releases in one branch have no corresponding release in the other branch, the branching is very likely one-sided. The further analysis of the shared code is carried out horizontally and vertically on the products in the release pairs.
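The pairing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure we actually followed: the function name, the greedy closest-date strategy and the `max_gap_days` cut-off are all hypothetical, and the release names and dates are placeholders.

```python
from datetime import date


def form_release_pairs(branch_a, branch_b, max_gap_days=60):
    """Pair each release of branch_a with the closest unused release of branch_b.

    branch_a, branch_b: lists of (version, release_date) tuples.
    max_gap_days is an assumed cut-off for "released at close points in time".
    Returns (pairs, unpaired_a).
    """
    pairs, unpaired, used = [], [], set()
    for version_a, date_a in sorted(branch_a, key=lambda r: r[1]):
        candidates = [
            (abs((date_b - date_a).days), version_b)
            for version_b, date_b in branch_b
            if version_b not in used
        ]
        if candidates and min(candidates)[0] <= max_gap_days:
            _, version_b = min(candidates)
            used.add(version_b)
            pairs.append((version_a, version_b))
        else:
            unpaired.append(version_a)
    return pairs, unpaired


# Placeholder releases; names and dates are hypothetical.
branch_a = [("A1", date(2003, 5, 10)), ("A2", date(2003, 10, 15)), ("A3", date(2004, 6, 1))]
branch_b = [("B1", date(2003, 5, 17)), ("B2", date(2003, 10, 15))]
pairs, unpaired = form_release_pairs(branch_a, branch_b)
print(pairs, unpaired)
```

Many unpaired releases in one branch after a certain date would hint at one-sided branching, while mostly complete pairing would hint at full branching.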


3.2.2 Horizontal Analysis

The horizontal analysis aims to find the shared code in each sibling release pair. Shared code is identified by comparing the sibling releases of the two products. In this work, the comparison is done at two granularities: the file level and the subsystem level. At the file level, two files with the same name and relative path in sibling releases are considered shared if they are identical. At the subsystem level, two subsystems are considered shared or not based on their similarity in terms of ULOC (uncommented lines of code) and/or NOF (number of files). The information about each file (e.g., total LOC and ULOC) and about each subsystem (e.g., NOF and total ULOC) is collected and stored in a database for further analysis.

Figure 1 shows the algorithm used to identify shared code at the subsystem level. Every subsystem in one version of a sibling release pair is checked against its counterpart in the sibling version. If the two are determined to be shared, they are added to the shared subsystem list of that sibling release pair. The main component of this algorithm is the test that determines whether two subsystems are shared. We propose two sharing criteria, shown in Figure 2, which measure similarity at different granularities. The coarse-grained function (“matchingCriteria_ULOC” in Figure 2) measures subsystem similarity by comparing the summed ULOC of all files in each sibling version of the subsystem. Two sibling releases of a subsystem are identified as shared if they have roughly the same ULOC and roughly the same NOF. The finer-grained function (“matchingCriteria_NOF” in Figure 2) measures sibling subsystem similarity by the percentage of identical files shared between the sibling releases. Two sibling subsystem releases are considered shared if more than 95% of their files are identical. For simplicity, we define sibling files as identical if they have the same name and exactly the same ULOC.
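The two sharing criteria can be illustrated with a short Python sketch. It follows the prose description above (shared only if both the ULOC and the NOF are roughly the same, or if at most 5% of the files are unmatched); the `Subsystem` class and the function names are hypothetical, and the default tolerances are the values that appear in Figure 2.

```python
from dataclasses import dataclass, field


@dataclass
class Subsystem:
    name: str
    files: dict = field(default_factory=dict)  # file name -> ULOC

    @property
    def uloc(self):
        return sum(self.files.values())

    @property
    def nof(self):
        return len(self.files)


def relative_diff(a, b):
    """Relative difference |a - b| / max(a, b), or 0.0 when both are zero."""
    largest = max(a, b)
    return abs(a - b) / largest if largest else 0.0


def shared_by_uloc(s1, s2, size_tol=0.5, nof_tol=0.5):
    """Coarse criterion: summed ULOC and NOF must both be roughly equal."""
    return (relative_diff(s1.uloc, s2.uloc) <= size_tol
            and relative_diff(s1.nof, s2.nof) <= nof_tol)


def shared_by_nof(s1, s2, threshold=0.05):
    """Finer criterion: at most `threshold` of each side's files may be unmatched."""
    identical = sum(1 for name, uloc in s1.files.items()
                    if s2.files.get(name) == uloc)
    for sub in (s1, s2):
        # An empty subsystem is treated as not shared (an assumption of this sketch).
        if sub.nof == 0 or (sub.nof - identical) / sub.nof > threshold:
            return False
    return True


# Hypothetical sibling subsystems whose per-file changes cancel out in the sum.
left = Subsystem("demo", {"a.c": 100, "b.c": 200})
right = Subsystem("demo", {"a.c": 150, "b.c": 150})
print(shared_by_uloc(left, right), shared_by_nof(left, right))
```

In this example the summed ULOC and the NOF are equal on both sides, so the coarse criterion reports the pair as shared, although no single file is identical and the finer criterion reports it as unshared.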


Procedure SubSystemAnalyzer
begin
  Pair up products from different branches.
  for each pair P
  begin
    # start with an empty list for every release pair
    matchedsubsystem = new empty list;
    for each pair of {(sub1, sub2) | sub1 belongs to branch_1 and
                      sub2 belongs to branch_2 and sub1.name = sub2.name}
    begin
      if (matchingCriteria(sub1, sub2))
        matchedsubsystem.add(new MatchedSubs(sub1, sub2));
    end
    P.shared = matchedsubsystem;
  end
end

Figure 1: Algorithm for Finding Shared Code

3.2.3 Vertical Analysis

The vertical analysis identifies the evolution patterns of the branched software products based on the file and subsystem information obtained from the horizontal analysis. At the subsystem level, we study the history of the sharing status and, based on this history, place each subsystem into one of the following seven categories.

Unique Subsystems: subsystems that exist in only one branch and are therefore unique to that branch.

Split Subsystems: subsystems that are originally shared by both branches but split later.

Union Subsystems: subsystems that are originally branched but later merged together.

Disappeared Subsystems: subsystems that are initially shared but later disappear from both branches.


# for coarse grained analysis
Boolean Proc matchingCriteria_ULOC(v1_sub1, v2_sub1)
begin
  double sizediff = 0.5;
  double nfilediff = 0.5;
  # not shared if either relative difference exceeds its threshold
  if (abs(v1_sub1.loc - v2_sub1.loc) / max(v1_sub1.loc, v2_sub1.loc) > sizediff
      OR abs(v1_sub1.nfile - v2_sub1.nfile) / max(v1_sub1.nfile, v2_sub1.nfile) > nfilediff)
    return false;
  else
    return true;
end

# for finer grained analysis
Boolean Proc matchingCriteria_NOF(v1_sub1, v2_sub1)
begin
  threshold = 0.05;
  v1_sub1.matched = 0;
  v2_sub1.matched = 0;
  v1_sub1.unmatched = v1_sub1.numFiles;
  v2_sub1.unmatched = v2_sub1.numFiles;
  for (every file v1_f1 in v1_sub1)
  begin
    for (every file v2_f1 in v2_sub1)
    begin
      # files are identical if they have the same name and the same ULOC
      if (v1_f1.name == v2_f1.name)
      begin
        if (v1_f1.loc == v2_f1.loc)
        begin
          v1_sub1.matched ++;
          v2_sub1.matched ++;
          v1_sub1.unmatched --;
          v2_sub1.unmatched --;
        end
      end
    end
  end
  if (v1_sub1.unmatched / v1_sub1.numFiles > threshold ||
      v2_sub1.unmatched / v2_sub1.numFiles > threshold)
    return false;
  return true;
end

Figure 2: Algorithms for Evaluating the Subsystems for Sharing Status

Shared Code Base Subsystems: subsystems that are constantly shared between the two branches throughout the whole evolution period.

Interleaving Subsystems: subsystems that alternate several times between converging and diverging. For example, a subsystem might be shared, then unshared, and then shared again.

Branched Subsystems: subsystems that always exist in both versions of a sibling release pair but are constantly unshared between the two branches.

Secondly, we would like to measure the stability of the subsystems in the different categories, especially those that are constantly shared during the evolution. Figure 3 details the algorithm for calculating the stability of a subsystem. The changes between successive versions of each subsystem in each branch are collected, both in absolute ULOC and as a percentage. We then calculate the standard deviation (std) and the mean of the changes to a particular subsystem over the evolution. The mean indicates the typical size of the changes made to a subsystem over time. The std measures how widely the sizes of the changes vary. A stable subsystem, by our definition, should have a low mean and a low std. We believe that the std and mean pair is a better choice than the change ratio value [1], because together they cover both the spread of the changes and their mean.

3.2.4 Current Implementation

Figure 4 shows the current implementation of our analysis process. The colored polygons represent the processes that are currently automated by our system. The uncolored polygons represent the manual processes in our implementation. The rounded rectangles represent the artifacts of each process. A dashed line stands for a “use” relationship, while a solid line stands for a “produce” relationship.

We set up our analysis by pairing up different versions of the branched products to form sibling release pairs. For each version of the branched products, we first compile and build the application. We configure each build with a common set of configuration settings, so that each build contains the same set of features. As a result, a set of object files is generated. We extract the paths of these object files and store them in a text file. Then, we feed this file list into the “.o to .c mapping script” to generate a list of the C files that are actually compiled under the current configuration. The paths of these C files are stored in a file called “.c file list”.

Next, we feed the “.c file list” into a program called “LC counter”, which we customized from a third-party open source C/C++ line counting tool [4]. It produces a set of descriptive data for the entities in the source code. For every file in the input set, it produces line count information, including the total LOC with comments and the ULOC without comments. “LC counter” also produces overall information about each subsystem, including the total number of files and the total ULOC. Note that “LC counter” derives the subsystem hierarchy from the file system directory hierarchy: a top-level directory is considered a subsystem if it contains compiled C files. For every version, an overall count of the number of files and the number of lines is produced as well. We store this information in a file called “subsystem file info txt”.

The “subsystem file info txt” file is then processed by our database table generation script to produce two tables in the database: “subsys stat table” and “file stat table”. The “subsys stat table” contains the line count and file count information for each subsystem. The primary key of this table consists of the subsystem name and the version id. The overall information for each version is stored under a fake subsystem name, “ALL”. The “file stat table” contains the line count information associated with each file. The “file stat table” links to the “subsys stat table” using the subsystem name and version id as foreign keys.

We implement the “SubsystemAnalyzer” and the “StabilityEvaluator” using the algorithms defined in Section 3.2. Both programs are implemented in Java and connect to the database using JDBC (Java Database Connectivity). The “SubsystemAnalyzer” extracts the line count data from the tables and produces an output describing how the subsystems are shared in each sibling release pair. It also reports the set of subsystems that are constantly shared over the evolution. Moreover, the “SubsystemAnalyzer” produces a table containing file-level sharing information for each subsystem. For a pair of subsystems, the stored information is: the number of files shared by both subsystems, the number of files unique to each subsystem, and the number of files that have the same name but are not shared according to our criterion. This information is manually examined to categorize the subsystems according to their sharing status. The constantly shared subsystems can be passed to the “StabilityEvaluator” to generate the stds and means of the changes to those subsystems. These data can be used to measure the stability of the shared subsystems.

# subs is an array (indexed from 0) containing the successive versions of a subsystem
Procedure StabilityEvaluator(subs)
begin
  for i := 1 to subs.length - 1 do
  begin
    absdiff[i] := subs[i].loc - subs[i-1].loc;
    reldiff[i] := absdiff[i] / subs[i-1].loc;
  end
  mean(absdiff);  # calculate the mean value of the absolute changes
  mean(reldiff);  # calculate the mean value of the relative changes
  std(absdiff);   # calculate the standard deviation of the absolute changes
  std(reldiff);   # calculate the standard deviation of the relative changes
end

Figure 3: Algorithms for Evaluating the Stability of Subsystems
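The stability measure can be sketched in Python as follows. This is a minimal illustration rather than our Java implementation: the function name is hypothetical, the population standard deviation (`statistics.pstdev`) is an assumption because the text does not say which variant is used, and every version is assumed to have a positive size.

```python
from statistics import mean, pstdev


def stability(uloc_history):
    """Mean and standard deviation of the changes between successive versions.

    uloc_history: ULOC of one subsystem in successive releases of one branch.
    Relative changes are measured against the previous version's size.
    """
    if len(uloc_history) < 2:
        raise ValueError("need at least two versions")
    abs_diff = [curr - prev for prev, curr in zip(uloc_history, uloc_history[1:])]
    rel_diff = [d / prev for d, prev in zip(abs_diff, uloc_history)]
    return {
        "abs_mean": mean(abs_diff),
        "abs_std": pstdev(abs_diff),  # population std: an assumption of this sketch
        "rel_mean": mean(rel_diff),
        "rel_std": pstdev(rel_diff),
    }


# Hypothetical sizes of one subsystem over three releases.
print(stability([1000, 1000, 1000]))
```

A subsystem that never changes yields zero for all four values, which corresponds to the most stable case under the definition above.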

4 Case Study

We use the name Firefox to represent both Firebird and Firefox releases. Identical configuration settings are used to compile all versions of Mozilla and Firefox to ensure that they include the same set of features. Some features unique to Mozilla, such as the mail/news client, the composer and the chat client, are disabled and not compiled, since they are not provided in Firefox. Only the files compiled during the build process are counted in the following case study. These files are identified by checking for the existence of the corresponding object files.
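The idea of identifying compiled files through their object files can be sketched as below. This is not our actual mapping script: the function name is hypothetical, and the sketch assumes that each object file sits at the same relative path as its source file, which need not hold for a real build tree. The file names in the demonstration are placeholders.

```python
import tempfile
from pathlib import Path

# Source extensions counted in the case study.
SOURCE_EXTENSIONS = (".c", ".cp", ".cpp")


def map_objects_to_sources(object_paths, source_root):
    """Return the source files under source_root that match the given object files.

    Assumes an object file and its source share the same relative directory
    and base name (an assumption of this sketch).
    """
    root = Path(source_root)
    compiled = []
    for obj in object_paths:
        obj_path = Path(obj)
        for ext in SOURCE_EXTENSIONS:
            candidate = root / obj_path.parent / (obj_path.stem + ext)
            if candidate.is_file():
                compiled.append(candidate.relative_to(root).as_posix())
                break
    return sorted(compiled)


# Demonstration on a throwaway directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("subsys_a/foo.cpp", "subsys_b/bar.c", "subsys_c/unbuilt.c"):
        path = Path(tmp) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("/* placeholder */\n")
    demo_result = map_objects_to_sources(
        ["subsys_a/foo.o", "subsys_b/bar.o", "subsys_d/missing.o"], tmp)
print(demo_result)
```

Source files without a matching object file, such as `subsys_c/unbuilt.c` here, are left out, which mirrors how disabled features are excluded from the counts.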

[Figure 4 diagram. Stages: Data Collection, Database, Analysis. Nodes: .o to .c matching script, .c_files_list, LC Counter, subsystem-file_loc_txt, table creation script, Subsys_Stat_Table, File_Stat_Table, Subsys_Diff_Table, Subsys_Changes_txt, SubsystemAnalyzer, Subsys_sharing_txt, StabilityEvaluator, Manual_Categorization, Subsys_Category_txt. Legend: process implemented, manual process, artifact of process, uses, produces.]

Figure 4: Flowchart of software branching analysis

4.1 System overview

Mozilla and Firefox each consist of about two thousand C and C++ source files (“.c”, “.cp” and “.cpp” files) and about 0.8 to 0.9 million uncommented lines of code per release. Figure 5(a) shows the size of the Mozilla and Firefox releases in ULOC, while Figure 5(b) shows it in NOF, the number of files. A file is considered shared if and only if the two versions of the file in the sibling releases have identical ULOC. The overall size of the Mozilla and Firefox releases is relatively stable in terms of both NOF and ULOC over the one and a half year period. However, the percentage of shared code changes significantly across some of the releases. In Mozilla 1.3.1 and Firebird 0.6, shared code makes up only about 30% of the total ULOC and 50% of the total NOF. In the subsequent releases, however, the percentage of shared code increases to over 90% in both measures. We conjecture that the increase in the percentage of shared code in Mozilla 1.5 and Firebird 0.7 is the result of a major refactoring driven by maintenance cost considerations, although we have not yet found written evidence in the corresponding change logs and online documents to support this. It is easy to imagine that having 50% of the files branched would significantly affect maintenance and further parallel development. In such a situation, many changes have to be propagated to both the Mozilla and Firefox branches, which may double the maintenance and development cost of the affected files. Another interesting observation is that, after the sharp increase in shared code in Oct. 2003, the percentage of shared code, in terms of both ULOC and NOF, gradually decreases again over time.

Figure 6(a) shows similar results from another angle. In the May 2003 release pair (Mozilla 1.3.1 and Firebird 0.6), 800 files are identified as branched, which means that 800 out of about 2000 files exist in both sibling releases but differ from each other. If we also count the files that are unique to one release of the pair, Mozilla and Firebird have to maintain branched copies of over half of their source files. Surprisingly, in the next release pair in Oct. 2003, the number of branched files suddenly drops to zero. The number of files unique to Firebird is also reduced by more than half. This finding further supports our belief that a major refactoring took place in Oct. 2003. We further notice that, after the major refactoring, the number of branched files gradually increases again.
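The file-level counts discussed above (shared, branched and unique files) can be derived from two per-release maps of file path to ULOC. The following Python sketch shows one way to do this; the function name and the sample data are hypothetical, and files are compared by ULOC only, as in our simplified definition of identical files.

```python
def partition_files(release_a, release_b):
    """Split the files of a sibling release pair into four groups.

    release_a, release_b: dicts mapping relative file path -> ULOC.
    A file is shared if it exists in both releases with identical ULOC,
    and branched if it exists in both releases with different ULOC.
    """
    common = release_a.keys() & release_b.keys()
    return {
        "shared": sorted(p for p in common if release_a[p] == release_b[p]),
        "branched": sorted(p for p in common if release_a[p] != release_b[p]),
        "only_a": sorted(release_a.keys() - release_b.keys()),
        "only_b": sorted(release_b.keys() - release_a.keys()),
    }


# Hypothetical sibling releases with placeholder paths and sizes.
release_a = {"x/a.c": 120, "x/b.c": 80, "y/c.cpp": 300}
release_b = {"x/a.c": 120, "x/b.c": 95, "z/d.cpp": 40}
groups = partition_files(release_a, release_b)
print({name: len(paths) for name, paths in groups.items()})
```

Summing the ULOC of the files in the "shared" group and dividing by the total ULOC of a release gives the percentage of shared code plotted per release pair.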

[Figure 5 plots. Panel (a): System evolution by ULOC; y axis: Uncommented LOC; series: firefox, mozilla, shared. Panel (b): System evolution by NOF; y axis: Number of files; series: firefox, mozilla, shared-code-base. x axis of both panels: Date, Jan. 02 to Mar. 05.]

Figure 5: Branched software system evolution

[Figure 6 plots. Panel (a): Branched files; y axis: Number of files; series: firefox only files, mozilla only files, branched files. Panel (b): Shared subsystems by ULOC; y axis: Uncommented LOC; series: db, dbm, expat and jpeg, each for Firefox (- f) and Mozilla (- m). x axis of both panels: Jan. 02 to Mar. 05.]

Figure 6: Branched files and shared subsystems

4.2 Subsystem classification

To ease the study of the evolution patterns of shared code and of the relationship between shared code evolution and subsystem characteristics, the subsystems are classified into different groups according to their similarity and historical sharing information. The similarity and sharing information is obtained from the horizontal analysis between sibling releases of the parallel products.

Shared-code-base

Unique Subsystem

Splitting

Union

rdf

Interleave

Branched

Disappeared string

caps

accessible

docshell

dom

editor

content

browser

layout

extensions

gfx

db

chrome

htmlparser

intl

js

dbm

ipc

uriloader

profile

modules

embedding

toolkit

view

widget

xpfe

expat

xpinstall

jpeg netwerk nsprpub security xpcom

Table 1: Classification of subsystems by subsystem ULOC

The historical information is collected from the vertical analysis. In the horizontal analysis, two measures at the subsystem level are used to determine the sharing status: ULOC and the number of shared files. The first measure, whose results are shown in Table 1, is based on counting and comparing the ULOC of all files in each subsystem. The second measure, whose results are shown in Table 2, counts the number of identical files. A subsystem is classified as shared if and only if over 95% of its code or files, in terms of ULOC or NOF respectively, is shared between the sibling releases. Based on the sharing status identified for each subsystem along the release history, the subsystems are classified into seven categories, as shown in Tables 1 and 2.

We notice that the two measures give quite different results. We feel that the sum of ULOC at the subsystem level is too coarse to be a good indicator of sharing, since changes that increase and decrease ULOC may cancel each other out when summed over a subsystem. The coarseness of the ULOC measure at the subsystem level affects the classification of shared subsystems more than the other categories. In contrast, the measure by shared NOF reflects the actual file sharing status of each subsystem more accurately. From the maintenance point of view, it is commonly believed that it is more efficient and desirable for branched software products to share as much code as possible, in order to reduce the cost of update propagation.
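The step from a sharing history to a category can be illustrated with a small Python sketch. In our study this categorization was done manually, so the rules below are only one possible formalization, and the status labels and function name are hypothetical. Each history entry describes one sibling release pair, in chronological order.

```python
def classify(history):
    """Classify a subsystem from its per-release-pair sharing status.

    history: list of 'shared', 'unshared', 'one_branch' or 'absent'.
    The rules are an assumed formalization of the seven categories.
    """
    if not history:
        raise ValueError("history must not be empty")
    if all(status == "one_branch" for status in history):
        return "unique"
    if history[0] == "shared" and history[-1] == "absent":
        return "disappeared"
    if any(status not in ("shared", "unshared") for status in history):
        return "unclassified"  # mixed cases are left for manual inspection
    # Collapse consecutive repeats, e.g. shared, shared, unshared -> shared, unshared.
    runs = [s for i, s in enumerate(history) if i == 0 or s != history[i - 1]]
    if runs == ["shared"]:
        return "shared code base"
    if runs == ["unshared"]:
        return "branched"
    if runs == ["shared", "unshared"]:
        return "split"
    if runs == ["unshared", "shared"]:
        return "union"
    return "interleaving"


print(classify(["shared", "unshared", "shared"]))
```

Because the coarse ULOC measure and the NOF measure can assign different statuses to the same release pair, the same subsystem may end up in different categories under the two measures.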

Shared-code-base

Unique Subsystem

Splitting

Union

Interleave

Branched

Disappeared string

db

accessible

content

caps

editor

dbm

browser

docshell

dom

js

expat

chrome

gfx

embedding

modules

ipc

htmlparser

extensions

rdf

intl

xpcom

nsprpub

layout

xpinstall

security

profile

view

uriloader

jpeg

toolkit netwerk

xpfe

widget

Table 2: Classification of subsystems by NOF

              Mozilla                       Firefox
Name          Standard Deviation   Mean     Standard Deviation   Mean
xpcom         0.070                0.056    0.060                0.056
content       0.020                0.024    0.010                0.018
embedding     0.017                0.013    0.006                0.007
expat         0.006                0.003    0.006                0.003
nsprpub       0.003                0.003    0.002                0.001
db            0.001                0.001    0.001                0.001
jpeg          0.000                0.000    0.000                0.000
dbm           0.000                0.000    0.000                0.000

Table 3: Stability of constantly shared subsystems


[Figure 7 plots. Panel (a): Subsystem classification by ULOC; y axis: Uncommented LOC. Panel (b): Subsystem classification by NOF; y axis: Number of files. Series: shared, unique, split (in one panel only), united, interleaved, branched. x axis of both panels: Jan. 02 to Mar. 05.]

Figure 7: Branched software system evolution

Our study shows that over half of all files in Mozilla and Firefox are constantly shared throughout the evolution period. By shared files, we mean files that exist in both sibling releases and are identical to their sibling versions. Figure 6(b) shows that the shared subsystems are relatively stable in size. Table 3 shows the standard deviation and the mean of the changes to the shared subsystems classified using the ULOC measure. Both values are relatively low, which implies high stability of these subsystems.

Figure 8(a) shows some united (union) subsystems. Because subsystems vary significantly in size, only three united subsystems are shown in the figure for reasons of graphical scaling. The same consideration applies to some other figures.

Some subsystems are initially shared and remain identical across many versions. However, from a certain point on, the shared code base is divided into two branches in the source tree, and the two branches then evolve more independently. This may be evidence that, at a certain point in the evolution, some subsystems of the shared code base can no longer be shared efficiently, e.g., for technical reasons or because of different goals. Figure 8(b) shows a split subsystem, “rdf”, which is the only one we found. The two versions of “rdf” were initially identical, but later differ by about 30% in size throughout the remaining evolution period.

[Figure 8 plots. Panel (a): Some united subsystems; series: htmlparser, docshell and view, each for Firefox (-f) and Mozilla (-m). Panel (b): Split subsystem; series: rdf - firefox, rdf - mozilla. y axis of both panels: Uncommented LOC; x axis: Jan. 02 to Mar. 05.]

Figure 8: Union and Split

Figure 9 shows the interleaved subsystems measured in terms of NOF. Each curve whose label ends with “- avg” indicates the average number of files of the Mozilla and Firefox releases for the corresponding subsystem. Each curve labelled “- shared” represents the number of files shared between the Mozilla and Firefox releases in the corresponding subsystem. Figure 9(b) shows the percentage of unshared files over the total number of files in each interleaved subsystem. It reveals some interesting phenomena. First, as mentioned before, all interleaved subsystems in the two sibling releases clearly converged in Oct. 2003, possibly due to a major refactoring. We also observe that these subsystems later diverge from their sibling versions again, and that the degree of divergence is large for many of them. However, in certain versions, these diverged subsystems converge again. We conjecture that this is due to minor refactorings applied to individual subsystems. Such minor refactorings seem to be applied frequently, in every sibling release pair except the first two. Although we could not find documented evidence to support our conjecture of refactoring, maximizing the shared source code appears to be desirable to the developers.

Branched subsystems are those that differ significantly in size from their siblings in the release pair throughout the evolution period; see Figure 10. Many of the branched subsystems tend to diverge from their siblings during the software evolution.

Some subsystems exist in only one branch throughout their evolution history. Some of these subsystems may be the implementation of some unique features of the product it

belongs to. We omit such subsystems from the plots.

Figure 9: Interleaved subsystems. (a) Some interleaved subsystems by NOF (number of files; “- avg” and “- shared” curves for widget, intl, extensions, and layout); (b) Interleaved subsystems by percentage (unshared NOF / total NOF) for widget, intl, extensions, layout, caps, dom, profile, uriloader, and embedding. Both panels span Jan. 02 to Mar. 05.

Figure 10: Branched subsystems. (a) Branched subsystems by NOF (number of files; “- avg” and “- shared” curves for widget, xpcom, xpfe, editor, js, and modules); (b) Branched subsystems by percentage (branched NOF / total NOF) for editor, js, modules, rdf, xpcom, xpfe, and xpinstall. Both panels span Jan. 02 to Mar. 05.
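The shared-file definition and the unshared-file percentage used in this section can be illustrated with a minimal sketch. It is not the tooling used in the study: the function names `file_digests` and `unshared_ratio_by_subsystem` are hypothetical, a subsystem is assumed to be a top-level directory of the source tree, total NOF is assumed to be the number of distinct file paths across both releases, and identity is checked by comparing content hashes, whereas the study appears to rely on ULOC counts.

```python
import hashlib
from pathlib import Path


def file_digests(root):
    """Map each file's path (relative to root) to a SHA-1 digest of its bytes."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha1(p.read_bytes()).hexdigest()
        for p in root.rglob("*")
        if p.is_file()
    }


def unshared_ratio_by_subsystem(tree_a, tree_b):
    """Return {subsystem: unshared NOF / total NOF} for two release trees.

    Assumptions (not from the paper): a subsystem is a top-level directory,
    and total NOF counts the distinct file paths found in either tree.
    """
    a, b = file_digests(tree_a), file_digests(tree_b)
    totals, shared = {}, {}
    for path in a.keys() | b.keys():
        if "/" not in path:
            continue  # skip files that sit outside any subsystem directory
        subsystem = path.split("/", 1)[0]
        totals[subsystem] = totals.get(subsystem, 0) + 1
        # Shared: present in both trees with identical content.
        if path in a and path in b and a[path] == b[path]:
            shared[subsystem] = shared.get(subsystem, 0) + 1
    return {s: (n - shared.get(s, 0)) / n for s, n in totals.items()}
```

Calling `unshared_ratio_by_subsystem` on the unpacked source trees of one sibling release pair would give one point per subsystem on a curve such as those in Figure 9(b); repeating it for each release pair would give the whole curve.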

5

Discussion

As this project shows, the evolution of shared code in branched software products is a repeated process of converging and diverging. Complexity considerations seem to be a natural force behind divergence, while refactoring for better maintainability serves as a strong force that brings diverged subsystems back together. It is understandable that, under the pressure of release deadlines, developers may take the easier path of source code branching when implementing new features or concepts.

We know that before May 2003, Phoenix (the earlier name of the Firefox product) resided in the same CVS repository as Mozilla. Phoenix was built from the Mozilla source tree by adding a MOZ_PHOENIX option. Mozilla 1.3.1 and Firebird 0.6 form the first officially branched release pair. This release pair is painful from a maintenance point of view: over half of the files are branched, which doubles the cost of propagating source code updates. As a consequence, in the next release pair, in Oct. 2003, a huge effort was devoted to solving this problem by merging the branched code.

Interestingly, this painful experience was still not enough to stop the shared source code from diverging. Complexity considerations alone do not explain this persuasively, since the developers must already have known that branching hurts future maintenance. Some other considerations seem to force the shared source code to diverge. One possibility is that new designs or concepts are tried in one product first and merged into the other later if successful. Again, we have no documented evidence for this, and it is a subject for our future investigation.
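The converge/diverge pattern discussed here can be made concrete with a small sketch that labels each step between consecutive release pairs. It is an illustration only: the function name `classify_transitions` and the `tolerance` threshold are hypothetical, and the input values are made up rather than taken from our measurements.

```python
def classify_transitions(unshared_ratios, tolerance=0.01):
    """Label each step between consecutive release pairs.

    unshared_ratios: unshared-file fraction per release pair, in time order.
    tolerance: hypothetical threshold below which a change counts as stable.
    """
    labels = []
    for previous, current in zip(unshared_ratios, unshared_ratios[1:]):
        delta = current - previous
        if delta > tolerance:
            labels.append("diverge")
        elif delta < -tolerance:
            labels.append("converge")
        else:
            labels.append("stable")
    return labels


# Made-up values for illustration only; they are not measurements from the study.
example = [0.05, 0.55, 0.10, 0.30, 0.20]
print(classify_transitions(example))
# prints ['diverge', 'converge', 'diverge', 'converge']
```

Applied to the per-subsystem percentages behind Figures 9(b) and 10(b), such a labelling would make it easier to see which release pairs a suspected refactoring falls between.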

6

Conclusions and future works

In this project, we introduce a systematic and automated methodology for software branching analysis, and we apply it to two well-known open source products, Mozilla and Firefox. According to our observations from the case study, maintenance and software complexity considerations seem to be the major driving forces behind branched software evolution. These two factors affect the development and maintenance of branched software products in different ways. Source code branching effectively reduces software complexity when conflicting new designs or concepts are implemented in one of the branched software products. However, because maintenance cost doubles for branched code, software complexity might become a secondary consideration. There might be other considerations that force shared code to diverge during software evolution.

During software evolution, the amount of shared code tends to decrease gradually. Maximizing shared code nevertheless seems to be desirable, and refactoring is frequently applied to increase the shared code for maintenance purposes. The evolution pattern, in terms of the amount of shared code between sibling products, seems to be a series of diverge and converge processes. Rather than evolving entirely separately, we believe that branched software systems co-evolve throughout their life cycle.

As future work, we will customize the “diff” tool to measure file sharing more precisely than counting ULOC does. We will also study other branched software systems to test our observations. Finally, we want to integrate our analysis components systematically into a pipeline, fronted by a GUI.
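As a rough illustration of what a diff-based sharing measure could look like, the sketch below compares two versions of a file line by line using Python's standard `difflib` module. It is not the customized “diff” tool planned above: the function names `line_similarity` and `is_shared` and the `threshold` parameter are hypothetical choices made for this example.

```python
import difflib


def line_similarity(text_a, text_b):
    """Return a similarity ratio in [0, 1] between two file versions, by line."""
    lines_a = text_a.splitlines()
    lines_b = text_b.splitlines()
    return difflib.SequenceMatcher(None, lines_a, lines_b).ratio()


def is_shared(text_a, text_b, threshold=1.0):
    """Treat two versions as shared when their similarity reaches the threshold.

    threshold is a hypothetical parameter: 1.0 demands identical lines,
    while lower values tolerate small edits.
    """
    return line_similarity(text_a, text_b) >= threshold
```

A graded ratio of this kind would distinguish a file that differs by one line from a file that was rewritten, which a count of ULOC alone cannot do.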

7

Acknowledgments

We gratefully acknowledge Prof. Michael W. Godfrey for his advice and support on this project. We also thank Lijue Xu and Jingwei Wu for their support with the Beagle origin analysis tool in the early stage of this project.


References

[1] Mikio Aoyama. Continuous and discontinuous software evolution: aspects of software evolution across multiple product lines. In IWPSE ’01: Proceedings of the 4th International Workshop on Principles of Software Evolution, pages 87–90, New York, NY, USA, 2001. ACM Press.
[2] Mikio Aoyama. Metrics and analysis of software architecture evolution with discontinuity. In IWPSE ’02: Proceedings of the International Workshop on Principles of Software Evolution, pages 103–107, New York, NY, USA, 2002. ACM Press.
[3] Michael Godfrey and Qiang Tu. Growth, evolution, and structural change in open source software. In IWPSE ’01: Proceedings of the 4th International Workshop on Principles of Software Evolution, pages 103–106, New York, NY, USA, 2001. ACM Press.
[4] B. Marick and R. Rizzuto. lc and lc2 source code line counter.
[5] Tom Mens and Serge Demeyer. Future trends in software evolution metrics. In IWPSE ’01: Proceedings of the 4th International Workshop on Principles of Software Evolution, pages 83–86, New York, NY, USA, 2001. ACM Press.
[6] Microsoft. Microsoft Visual SourceSafe.
[7] Kumiyo Nakakoji, Yasuhiro Yamamoto, Yoshiyuki Nishinaka, Kouichi Kishida, and Yunwen Ye. Evolution patterns of open-source software systems and communities. In IWPSE ’02: Proceedings of the International Workshop on Principles of Software Evolution, pages 76–85, New York, NY, USA, 2002. ACM Press.
[8] The Mozilla Organization. Firebird 0.6 release notes and FAQ.
[9] The Mozilla Organization. Firebird 0.8 release notes and FAQ.
[10] The Mozilla Organization. Mozilla Firefox project.
[11] The Mozilla Organization. Mozilla suite.
[12] CVS Project. Concurrent Versions System.
