
Maturity of source code management – another approach to code quality
Bogusz JELINSKI

Abstract. Source code quality is often viewed only through the prism of static analysis and good programming practices. But there are many other code characteristics which, if treated carelessly, can entail the loss of the entire code or an increased number of incidents in the production environment. The purpose of this work was to depict the neglected traits of source code management (SCM) and to propose a tool which addresses some process vulnerabilities – encapsulation, in which the contamination of deployed code is prevented by limiting the input points and the dependency on humans is minimized. In addition to the tool, a metric is proposed to measure the extent to which the process is made hermetic (SCM maturity), preventing code contamination. Finally, statistical data is provided for the code integration pattern – one of the neglected traits of SCM.
Keywords. Source code, quality, process maturity, software process improvement, SPI, SCM.

1. Introduction

Source code is a vital company asset – it is the core of the digital business. It determines the behavior of computer systems but is often not visible to the business staff and is not managed explicitly. When “good code quality” is required, “compliance with good programming practices” is meant. It is overlooked that even perfect code can be outdated, or that licenses used unknowingly by developers can oblige the code's owner to share it with the whole Internet. The code can even be completely useless (any further change or maintenance blocked) as there might be no knowledge or tools to compile it. Missing know-how can effectively limit the competition between potential suppliers and can strengthen the negotiating position of the current one. One should also know that it is technically feasible to deliver a fully functional IT solution without having delivered the source code of its compiled part, with all the negative consequences for future development and maintenance.
In less scary scenarios the way you use source code affects the efficiency of IT departments and the user experience. If you construct 90% of the code during integration or user acceptance tests (see Section 7), either the tests are costly or the production environment is exposed to a serious threat. All three project dimensions – cost, time, scope/quality – are badly affected.
This article is based on the author's experience in implementing the source code management (SCM) process in a large telecommunication company using hundreds of computer systems. The goal of the effort was to depict both the threats arising from the mismanagement of source code and the ways to avoid the negative consequences when these risks materialize. The following key questions were formulated at the beginning of the research:
1. What configuration items should be managed to protect the source code?
2. What are the aspects of code quality?
3. What are the risks of mismanagement?
4. How to design the SCM process in order to address these risks?
5. Can a performance indicator be proposed for the process?
6. Are there any other vital process metrics? For example – how much of the code is constructed during tests?
As a result of the research a tool has been proposed which addresses some process vulnerabilities – encapsulation, in which the contamination of deployed code is prevented by
limiting the code delivery points, and the dependency on humans is minimized by enclosing their knowledge in the process (e.g. compilation and deployment scripts and tools). The improving performance of the process can be measured using a commonly understood scale – maturity – which may also be a part of a customized software maintainability measure, as ISO 25010 [1] sees it but does not operationalize it. The term “SCM maturity” was used in a similar context e.g. by Forrester Consulting [2], and there were many other attempts to create a software maturity model [3], not to mention ISO/IEC 33004 [4]. The maturity scale is proposed in Section 6, where two snapshots of its progress measured during a vast IT transformation are provided. It is worth mentioning that the scale was used not only for the measurement itself but also as an improvement guide – a checklist of good practices, a target to be achieved by managers. The goal of the SCM transformation was to provide complete and up-to-date source code for a vendor consolidation programme, so the idea behind it was to concentrate on asset management, not on agility or efficiency.

2. Elements of source code
Let us briefly summarize what “source code” actually is. It is made up of files containing one or more instructions executed in a specified order by a computer. These instructions determine the functionalities of a system, regardless of the programming language – C/C++, Java, PL/SQL, PHP or HTML. The source code also includes SQL scripts, e.g. scripts performing a rollback to the last error-free software version. When transferred to the purchaser, the source code should also be accompanied by:
• a description of the compilation sequence and all tools necessary to complete the compilation,
• resources not considered as source code but required to compile and run the system, e.g. properties and configuration files, Ant/Maven scripts, and libraries,
• shell scripts used to compile, configure, run, manage, and monitor the system,
• unit tests – numerous albeit small fragments of code whose goal is to verify the correct operation of the code itself.

3. How does the quality of code manifest itself?
The table below presents some operational characteristics linked to the quality of code. This is more a subjective business approach than an attempt to provide a complete categorization of software characteristics as proposed in ISO-25010 [1].

| Trait | Description | If neglected |
|---|---|---|
| up to date | coherent with the production environment | some functionalities missing |
| complete | containing all the elements indicated in Section 2 | a blocker for development and maintenance |
| compilable | the software process (dynamic) perspective – should compile if complete | a blocker like above. If we do not know how to compile the code or do not have tools to do it we are unable to conduct any change in a computer system. Its source code has then only a sentimental value for us. |
| fully merged | no change/functionality forgotten for a release (ISO functional suitability/completeness) | some functionalities missing when the release goes live |
| well tested | Had all expected functionalities been delivered before tests started? What does the code integration pattern [5] look like? | More issues on production likely to appear (see Section 7) |
| coding standards obeyed | ISO-9126-3 [6] (replaced by ISO-25010) or other good programming practices conformed with | Security or performance issues, code hard to maintain |
| open-source licenses obeyed | One has to be sure that software complies with open-source (BSD, ASL, …) and free software (GPL, LGPL) licenses, as these licenses are unconsciously used by developers | It can restrict the use of software. In the best case you shall be forced to reveal your source code. |
| low IT debt | This metaphor of the coding standards has gained momentum lately. It is an attempt to estimate how much it would cost to remove from the source code all violations of good practices. The methodology for the calculation of this measure is still in its infancy. | The interest on the debt is the cost of bug-fixing |
| eco friendly (green index) | Yet another metaphor – how greedy for computing power software is. It might combine the static analysis with some run-time characteristics – memory and processor usage | More energy consumed, more hardware to be purchased |
| code reused | Avoid duplication of functionalities within one system (less copy-pastes) and across the organization – see ISO/IEC 25010 reusability. It can help reduce the cost of system development. | Software harder to maintain |
| code covered | What proportion of code is being tested by tests such as unit tests? | Higher risk of bugs |

Table 1 – Traits of the good quality code.

Three traits from the table will be discussed in more detail later on. Any discourse about code quality should (but very often fails to) start with its completeness (or integrity, if ISO terminology is to be followed) and being up to date. In order to protect these valuable characteristics it is proposed to meet a few requirements, regardless of the extent to which the IT services have been outsourced:
• take care of legal aspects, including the right to independently modify the source code if it is reasonable.
• store the code in your own repository (revision/version control system, in other words) or at least in one which is maintained by a trusted third party (escrow).
• oversee your manufacturing process to be sure that your source code repository is consistent at least with your production environment. You can do it by encapsulation of the code management process (recommended, see below) or by a periodic comparison of the repository with runtime environments.
• automate as many steps of the process as feasible. Get rid of the human factor.
• review (inspect) your code manually or automatically and measure other process characteristics (the code integration pattern – see below).
All source code should be preserved, not only that of the biggest or critical systems. It happens that the loss of the source code of a seemingly irrelevant module, which is integrated (sometimes in an undocumented way) with crucial systems, blocks an important change in these crucial systems. That was exactly the case that gave the author of this article the stimulus to take up the matter and to spend a few years on it.
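
The periodic comparison of a repository with a runtime environment, mentioned among the requirements above, could be sketched roughly as follows. This is only an illustration, not the tooling used by the author: the function names and the report keys are hypothetical, and a byte-for-byte comparison is assumed to be meaningful, which holds for interpreted code, scripts and configuration files but not necessarily for compiled executables.

```python
import hashlib
from pathlib import Path


def file_digests(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 digest of its content."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative_name = path.relative_to(root).as_posix()
            digests[relative_name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_trees(repo_export: Path, deployed: Path) -> dict[str, list[str]]:
    """Compare an export of the repository with a deployed directory tree.

    The three report keys are hypothetical names chosen for this sketch.
    """
    repo = file_digests(repo_export)
    live = file_digests(deployed)
    common = repo.keys() & live.keys()
    return {
        # files kept in the repository that never reached production
        "missing_in_production": sorted(repo.keys() - live.keys()),
        # files running in production with no counterpart in the repository
        "unknown_in_production": sorted(live.keys() - repo.keys()),
        # files present on both sides but with different content
        "modified": sorted(name for name in common if repo[name] != live[name]),
    }
```

A non-empty "unknown_in_production" or "modified" list would indicate that something went live outside the repository, which is exactly the kind of contamination the requirements above are meant to prevent.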

4. Encapsulation of the SCM process
Based on the author's intuition rather than on extended research, a technique is proposed to ensure the completeness and timeliness of the source code: the self-deployment of production environments by the acquirer from its own repository, or having this task carried out by a trusted third party, thus preventing any unauthorized, undocumented or just unknown changes from going live. If applied also to test environments and accompanied by some supervision of the repositories' content, it helps avoid the deployment of untested changes. Therefore, to avoid “contamination” of changes, the following preventive encapsulation is proposed:
1. The software supplier should be formally obliged to provide the complete source code (as described in Section 2) with unequivocal documentation of the compilation sequence and a deployment procedure, most preferably a script, e.g. a Jenkins job.
2. After the construction phase the supplier should submit the code to the acquirer's repository, by hand or by automated replication of the supplier's own repository. The decision at which stages of the project the code is to be delivered, and whether the test environment should also be subject to the regime of this process, depends on the particular organization and situation.
3. The supplier provides the address (location) of the source code (for Subversion in the form of URL@REVISION). The location of each version should be stored in a Configuration Management Data Base (CMDB). This detail is important as there can be hundreds of branches in the repository, and withdrawing changes after unforeseen negative events requires knowing the location of the previous version. A good practice is to appoint a person (a product owner) responsible for keeping the CMDB up to date, even if the release history is supported by tools.
4. The code submitted by the supplier, whose location is stored, is checked out from the repository under the supervision of a person who is loyal to the acquirer (if the process is not automated).
5. The code gets compiled as described in the compilation procedure under the same supervision; executables are produced.
6. The interpreted code and executables are deployed to runtime environments under
supervision or in an automated way. Let us repeat – the only executable allowed to be deployed is the one produced in step 5 (and the only interpreted code is the one checked out in step 4).
7. In further phases stabilization and maintenance fixes are delivered (as well as new change requests and new functionalities) – the process iteratively returns to step 2.
Encapsulation does not mean here creating a black box or hiding implementation details. The idea is to restrict human interference, which would mean adding some undocumented knowledge and complicating the repeatability of the process. If you follow that process at least for production environments, then you gain confidence of having up-to-date and complete source code. With one crucial exception – this process does not guarantee the delivery of source code for libraries other than open-source software and commercial third-party products. We mean libraries implementing ordered functionalities for which no source code is provided, deliberately or mistakenly. Human intervention is required, periodically or every time – the verification of changes in the list of libraries required for compilation. Unfortunately, the lack of code for a library is sometimes discovered many years after the change was implemented, when the functionality delivered by the library needs to be changed, often when its author is not available or the agreement frees him from liability. Such a library, or a part of it, has to be implemented from scratch. As stated before, this is not always due to malicious intent of the software provider; they may also have competency gaps, as may their sub-contractors.

5. An alternative for encapsulation
While studying various SCM process implementations the author noticed two alternatives to the above-described approach to source code management (as a method of achieving completeness of the code):
- performing at certain intervals a comparison of the code archive with the content of production environments.
- deploying the production environment from the repository periodically or at the end of an outsourcing contract, with extensive regression tests.
These methods make it possible to dispense with keeping excessive human resources on the acquirer's side. However:
• such ad-hoc actions must be conducted or supervised by a competent and loyal third party, usually a highly paid consultant.
• due to the nature of the technology it is not always possible to determine the conformity of the object code (executables), even if they are built from the same source code.
• the restoration of the production runtime environment from the repository is generally not accepted by the business side because of the risk of downtime, and it requires costly regression testing of the whole system.
• an efficient, repeatable process, subjected to self-optimization, in which people act in a quasi-automatic, learned way, is usually much less expensive (and often even unnoticed) than single audits, which require escalations and interfere with the course of other business processes in the company.
• the techniques and tools postulated in Section 4 do not imply the creation of any new process or process instance; they only shift already functioning activities, in terms of supervision, roles and tools, towards the acquirer. The resulting costs are already incurred and included in the price; the acquirer pays for them (e.g. for the repository storage for developers) as the supplier is not a charity organization. So the postulated change is
mainly about the loyalty of supervisors, the ownership of tools and procedures, and their location.

6. The SCM process maturity
It is suggested that you can't control what you can't measure [7]. For the needs of a transformation of source code management within a vast collection of systems (197 systems, some of them counting 1M+ lines of code) the author developed an assessment model [8] with a process maturity measure (see Fig. 1) which described the degree of encapsulation. In this way it was possible to report the progress of the expected organizational changes transparently (e.g. with suggestive colors) and numerically, and to report aggregated values per department or functional area. The idea was to assign a number to qualitative attributes that described the achievement of an important milestone, as follows:
• level 0 (black): lack of source code, unknown location of code.
• level 1 (red): source code in a drawer or on a server, not in a repository under version control.
• level 2 (orange): code under version control, known location stored in CMDB but unable to compile it – no knowledge, no tools or no hardware. A case where there is the knowledge and tools but they are not used for deployment is also classified as level 2. Here we have no certainty that the code is coherent with the production environment (e.g. environments are deployed from the supplier's repository). That is probably the most common case in businesses with outsourced IT – a semblance of having complete source code.
• level 3 (yellow): the production environment is deployed from the code owner's repository (the process is supervised) but compilers are owned or managed by the supplier, or the compilation is an arduous process – no single build script. In other words – we have much more trust in the content of the repository but some knowledge of how to use it might be missing.
This situation might be very risky too – old versions of tools might be hard to get, and threats arising from a migration to new compilers might be hard for the business owner to accept. For interpreted code this level could be the first to be labeled “code under control”, as levels 4 and 5 are compilation oriented.
• level 4 (green): the compilation/build of a system might still be conducted “by hand” but it is carried out in the business owner's environments (not on the supplier's premises) and a single build script exists – the build knowledge is stored in the form of a computer algorithm/sequence, not in someone's head or in an ambiguous text description. Executables produced by this script are deployed to the production environment. In other words – we have the code, the compilers and most of the knowledge, and we use them, but there is little automation.
• level 5 (light blue): automated build; compilation is carried out e.g. by a continuous integration tool – there is a front-end with a “build” button and a build history. It is an important step because it gets rid of the human factor and makes the process repeatable.
• level 6 (navy blue): the production environment is deployed automatically with no human intervention; the executable is distributed by a script/robot. That removes the last place for a human to monopolize the knowledge and is a vital step towards increased operational efficiency.

Fig. 1. Levels of maturity of source code management

When the transformation started, more than half of the code was not in the repository. After fifteen months nearly all compiled systems had their own compilation automation; many compilation scripts were written from scratch, as suppliers did not want to provide them or there was no contractual relationship with any supplier at that time. About one third of the systems (71 of 197 in total) stayed at level 3 as they had no compiled components.

Fig. 2. The progress of code maturity within an SCM transformation carried out by the author – the share of maturity levels within 197 systems.
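
To make the scale more tangible, the assignment of a level and the aggregation reported in Fig. 2 can be sketched as below. This is a hedged reading of the level descriptions, not the author's assessment model: the `SystemState` fields are hypothetical yes/no answers invented for this sketch, and capping systems without compiled components at level 3 is an assumption that mirrors the 71 systems reported above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SystemState:
    """Hypothetical yes/no answers describing one system (names are illustrative)."""
    code_located: bool             # we know where the source code is
    under_version_control: bool    # code in a repository, location stored in CMDB
    deployed_from_own_repo: bool   # production deployed from the owner's repository
    single_build_script: bool      # built in the owner's environment by one script
    automated_build: bool          # e.g. a continuous integration job
    automated_deployment: bool     # deployed by a script/robot, no human involved
    has_compiled_components: bool = True


def maturity_level(state: SystemState) -> int:
    """Return the maturity level 0..6 as the first milestone not yet achieved."""
    if not state.code_located:
        return 0
    if not state.under_version_control:
        return 1
    if not state.deployed_from_own_repo:
        return 2
    # Assumption: purely interpreted systems stay at level 3,
    # because levels 4 and 5 are compilation oriented.
    if not state.has_compiled_components or not state.single_build_script:
        return 3
    if not state.automated_build:
        return 4
    if not state.automated_deployment:
        return 5
    return 6


def level_shares(states: list[SystemState]) -> dict[int, float]:
    """Share of systems at each level, e.g. per department or functional area."""
    if not states:
        return {level: 0.0 for level in range(7)}
    counts = Counter(maturity_level(state) for state in states)
    return {level: counts.get(level, 0) / len(states) for level in range(7)}
```

Treating the levels as a strict ladder keeps the measure monotonic: a system cannot be credited with automated deployment while the build knowledge still sits in someone's head.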

7. The code integration pattern – have the changes been well tested?
There is a trait of code that requires special attention and which characterizes both the trust in the code and the quality of the whole SCM process. The following questions may be raised during a software project:
- were all changes planned for the next release ready to be merged to the integration branch before the tests started?
- in other words – did the changes have the opportunity to be tested?
- were all functionalities expected by the business owner developed? At least developed before the code affected a test case?
This might resemble the “fully merged” trait (see Table 1) but it is not about forgetting to conduct a merge; it is about intentional behavior – a decision to start integration or acceptance tests without having closed the development. Unfortunately the practice of constructing source code
during tests is very common and may be caused by poor requirements management or an inadequate number of people (developers, testers). This is a pathological phenomenon – a disease which should be fought as it ruins quality. With fixed release deadlines it implies running untested functionalities. The source code repository lets you easily, quickly and cheaply measure the percentage of source code delivered at each project phase. If you use Subversion then 'svn diff' does the trick. If a reasonable threshold is exceeded you can analyze the causes and risks. This is feasible provided that the test environments are also made hermetic (encapsulated; see Section 4), not only the production ones. This metric is like a clinical thermometer: it aggregates all the pathologies which pollute today's software processes.
The concept of paying attention to the extent of code being constructed during particular project phases is not new. The code integration pattern was mentioned by Stephen Kan [5] as a “simple and useful project management and quality management tool” (see also [9]). The author measured the percentage of code constructed during integration and user acceptance tests in eighty-one projects, which changed five systems. The value (see Table 2) is underestimated by at least ten percentage points, as in most cases there was a delay in the code delivery to the repository or the indicated code revisions were late. On average 43% of the code was constructed during tests.

| System | Number of projects | Average [%] |
|---|---|---|
| System1 | 5 | 56.3 |
| System2 | 12 | 40.5 |
| System3 | 14 | 33.9 |
| System4 | 10 | 50.7 |
| System5 | 40 | 43.8 |

Table 2. Average percentage of code constructed during tests

Moreover, the number of projects with a value exceeding 80% was nearly the same as the number of those below 20%.
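
As a quick arithmetic check, the overall figure of 43% agrees with the per-system averages of Table 2 when each system is weighted by its number of projects. The snippet below is only such a check; the constant and function names are introduced here for illustration.

```python
# (system, number of projects, average % of code constructed during tests), from Table 2
TABLE_2 = [
    ("System1", 5, 56.3),
    ("System2", 12, 40.5),
    ("System3", 14, 33.9),
    ("System4", 10, 50.7),
    ("System5", 40, 43.8),
]


def project_weighted_average(rows):
    """Average of the per-system values, weighted by the number of projects."""
    total_projects = sum(projects for _, projects, _ in rows)
    weighted_sum = sum(projects * average for _, projects, average in rows)
    return weighted_sum / total_projects


def plain_average(rows):
    """Unweighted mean of the per-system values, for comparison."""
    return sum(average for _, _, average in rows) / len(rows)


print(sum(projects for _, projects, _ in TABLE_2))   # 81 projects in total
print(round(project_weighted_average(TABLE_2), 1))   # 43.2
print(round(plain_average(TABLE_2), 1))              # 45.0
```

The weighted value (3501.1 / 81 ≈ 43.2) matches the reported 43%, whereas the plain mean of the five rows would give about 45%.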

Fig. 3. Frequency of particular code integration patterns (the percentage of code constructed during tests)

This data is presented to visualize the scale of the problem within modern companies, which aspire to become “digital” ones.
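
The measurement described in this section can be approximated from the output of 'svn diff'. The article does not state the exact formula, so the sketch below rests on an assumption: the share is taken as the number of lines changed between the revision at which tests started and the released revision, divided by the number of lines changed between the previous release and the released revision. The function names and the choice of revisions are hypothetical; only the `svn diff -r OLD:NEW TARGET` invocation is standard Subversion usage.

```python
import re
import subprocess

# Hunk header of a unified diff, e.g. "@@ -10,3 +10,5 @@"; a missing length means 1.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def changed_lines(diff_text: str) -> int:
    """Count added and removed lines inside the hunks of a unified diff."""
    changed = 0
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in diff_text.splitlines():
        if old_left > 0 or new_left > 0:
            if line.startswith("+"):
                new_left -= 1
                changed += 1
            elif line.startswith("-"):
                old_left -= 1
                changed += 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            else:
                old_left -= 1  # context line belongs to both versions
                new_left -= 1
            continue
        match = HUNK_HEADER.match(line)
        if match:
            old_left = int(match.group(1)) if match.group(1) is not None else 1
            new_left = int(match.group(2)) if match.group(2) is not None else 1
    return changed


def svn_diff(url: str, old_revision: int, new_revision: int) -> str:
    """Return the output of 'svn diff -r OLD:NEW URL' (requires the svn client)."""
    result = subprocess.run(
        ["svn", "diff", "-r", f"{old_revision}:{new_revision}", url],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def share_constructed_during_tests(diff_whole_project: str, diff_during_tests: str) -> float:
    """Percentage of the project's changed lines that appeared after tests started."""
    total = changed_lines(diff_whole_project)
    if total == 0:
        return 0.0
    # Lines edited repeatedly can inflate the numerator, hence the cap at 100%.
    return min(100.0, 100.0 * changed_lines(diff_during_tests) / total)
```

Counting lines hunk by hunk, rather than by their first character alone, avoids mistaking a removed line such as an SQL comment starting with "--" for a file header. For a branch URL, `share_constructed_during_tests(svn_diff(url, previous_release, release), svn_diff(url, tests_start, release))` would then give the value for one project, with the three revision numbers taken, for instance, from the locations recorded in the CMDB.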

8. Some other interesting topics
Many other interesting aspects of source code management, which are a source of potential risks and opportunities, have been omitted here, for example:
• organization of work in the repository, including issues as detailed as the permitted directions of promoting bug fixes between project branches, the regime of code merges, and the naming of branches.
• measurement of changes in the size of software (backfired function points).
• monitoring of code delivery and its impact on work efficiency (Hawthorne effect [10]).
• which programming language to choose to reduce lock-in to suppliers or to meet expected performance goals?

Summary
There are many code characteristics which, if treated carelessly, can entail the loss of the entire code or an increased number of incidents in the production environment. Most of the risks described above are due to innate, imperfect human nature. Software suppliers and employees will always want to have a better negotiating position by monopolizing know-how wherever possible. By automating tasks we can make the software process repeatable and self-documenting, and the negative consequences of losing an employee are mitigated. Instead of a vague Word document, XML files are used (e.g. Jenkins jobs). The automation is often already used by software suppliers, mainly to oversee the work of subcontractors and their employees. No additional effort to be claimed and reimbursed! Therefore, code management is not just a matter of engineering – it helps maintain competition in the supply of software and exerts pressure on the suppliers' bids, thus it assists in building competitive advantage.

References
1. ISO/IEC CD 25010, Software Engineering: Software Product Quality Requirements and Evaluation (SQuaRE) Quality Model and guide. (2008)
2. Continuous Delivery: A Maturity Assessment Model. Forrester Consulting (2013) https://info.thoughtworks.com/Continuous-Delivery-Maturity-Model.html [16 March 2016]
3. Schweigert, T., Vohwinkel, D., Korsaa, M., Nevalainen, R., Biro, M.: Agile maturity model: analysing agile maturity characteristics from the SPICE perspective. Journal of Software: Evolution and Process. DOI: 10.1002/smr
4. ISO/IEC 33004 2nd Edition, Information technology – Process assessment – Requirements for process reference, process assessment and maturity models. (2015)
5. Kan, S. H.: Metrics and Models in Software Quality Engineering. pp. 242, Addison-Wesley, Boston (2004)
6. ISO/IEC TR 9126-3, Software Engineering - Product Quality - Part 3: Internal Metrics. (2003)
7. DeMarco, T.: Controlling Software Projects: Management, Measurement and Estimates. pp. 3, Prentice Hall, Upper Saddle River (1982)
8. Wagner, S.: Software Product Quality Control. pp. 36, Springer-Verlag, Berlin (2013)
9. Laird, L. M., Brennan, M. C.: Software Measurement and Estimation: A Practical Approach. pp. 186, IEEE Computer Society Press, Los Alamitos (2006)
10. Landsberger, H. A.: Hawthorne Revisited: Management and the Worker, Its Critics, and Developments in Human Relations in Industry. Ithaca, Cornell University (1958)
