IJRIT International Journal of Research in Information Technology, Volume 3, Issue 4, April 2015, Pg. 407-413

International Journal of Research in Information Technology (IJRIT) www.ijrit.com

ISSN 2001-5569

Various possibilities of Clone Detection in Software’s: A Review

Er. Richa Grover1, G.I.T.M.1, Kurukshetra University, India1

Er. Narender Rana2, Asst. Prof. in C.S.E.2, G.I.T.M., Kurukshetra University, India2

Abstract: Software clone detection is the detection of duplicated code in source programs. Because developers frequently copy and adapt existing code, software systems often contain sections of code that are very similar, called software clones or code clones. If a bug is present in one code fragment, every similar copied fragment has to be checked for the same bug, which increases the time and cost of bug detection. Every clone detection technique requires an intermediate representation of the program so that the matching algorithm can detect clones accurately and efficiently. Program slicing is one of the most widely used intermediate representations for detecting code clones. A program slice is a subset of the statements of a program that preserves a specified part of the program's behavior. Slices are computed from variable dependencies, and a matching algorithm then compares the computed dependencies. The aim of this paper is to study the different types of clones and the various possibilities of clone detection in software.

Keywords: Software clones, Program slicing, Variable dependencies, Dead code, Matching algorithm.

I. Introduction

In the software development process, cloning of code is becoming common. The Merriam-Webster dictionary defines a clone as something that appears to be a copy of an original form. Copying an existing code fragment from one section of code and pasting it, with or without modification, into another section is called code cloning. The copied code is called a software clone and the process is called software cloning. If a bug is present in one code fragment, all similar copied fragments have to be checked for it, which results in more time and higher maintenance cost. Software clone detection, i.e., the detection of software code clones, is therefore required to limit the maintenance cost and the time spent on propagated bugs. In the post-development phase, however, it is very difficult to find out which code fragment is the original and which one is the copy. Software cloning is sometimes useful for developers because it may reduce coding time, but it is generally regarded as bad practice in the software development process. Cloned code is not only difficult to maintain but also introduces subtle errors [11]. Code clones may adversely affect the quality of software systems, especially their maintainability and comprehensibility [12].

II. Related Work

In the field of software clone detection, systematic reviews are provided by Roy & Cordy [1] and Rattan & Bhatia [2]. These surveys show that research in this area is steadily increasing, but there is still no precise definition of a code clone: every study uses its own definition. Cloning occurs in almost all software: with exact clones, the reported shrinkage in code size is 14%, and with parameterized clones it is 61% [2]. Typically, 20-30% of a large software system consists of cloned code [1]. Much code cloning happens in connection with open source software: because the source code is provided, any developer can easily copy and paste it. With the source code available, developers can also easily clone the functionality of code, producing clones that are not textually similar.

Er. Richa Grover, IJRIT-407

1. Clone Terminology

Clone detection tools normally report clones in the form of clone pairs, clone classes, or both. Both terms are based on a similarity relation between two or more cloned fragments. This similarity relation is an equivalence relation (i.e., a reflexive, transitive, and symmetric relation) [8]. A clone relation holds between two code fragments if they are the same sequences, where sequences may refer to the original character strings, strings without whitespace, sequences of tokens, transformed or normalized sequences of tokens, and so on. Clone detection tools use terms such as code fragment, code clone, clone type, clone pair, and clone class. In the following, we define these terms on the basis of the clone relation [1][2].

(A) Code Fragment: A code fragment is any sequence of code lines, with or without comments. It can be of any granularity, e.g., a function definition, a begin-end block, or a sequence of statements. A code fragment is identified by its file name and its begin and end line numbers in the original code base.

(B) Code Clone: A code fragment CF2 is a clone of another code fragment CF1 if they are similar according to some given definition of similarity. Two fragments that are similar to each other form a clone pair (CF1, CF2), and when many fragments are similar, they form a clone class or clone group.

(C) Clone Types: Two code fragments can be similar in their text or in their functionality. Clones based on textual similarity are of type 1, type 2, or type 3; clones based on functional similarity are of type 4.

(D) Clone Pair: A pair of code fragments is called a clone pair if a clone relation exists between them, i.e., a clone pair is a pair of code fragments which are identical or similar to each other.

(E) Clone Class: A clone class is a maximal set of code fragments in which any two fragments hold a clone relation, i.e., form a clone pair.

2. Software Clones

Baxter et al. [7] define clones as segments of code that are similar according to some definition of similarity. While they provide a threshold-based definition of tree similarity for near-miss clones, there is no specific definition of detection-independent clone similarity. An even vaguer definition is provided by Kamiya et al. [8], who define clones as portions of source file(s) that are identical or similar to each other. While by the term identical they mean an exact copy, there is no formal definition of the term similar.

According to different research studies, there are four basic types of clones. Roy & Cordy [1] distinguish two kinds of similarity between code fragments: similarity of program text and similarity of functionality. The first three clone types are based on textual similarity and the fourth on functional similarity. The complexity of detecting software clones increases from type 1 to type 4.

(A) Type 1 (exact clones)

Type 1 clones, broadly known as exact clones, are identical code fragments except for variations in white space and comments. Figure 1 shows two code fragments that are exact copies of each other once white space and comments are removed. Even if one fragment contains different or additional comment text or extra spaces, the two fragments are still exact clones after white space and comments are removed.

Figure 1: (exact clones)
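As an illustration of the type 1 definition, the following is a minimal sketch that removes comments and white space before comparing two fragments. It assumes C-style comments; the helper name `normalize_type1` and both sample fragments are hypothetical, and comment markers inside string literals are not handled.

```python
import re

def normalize_type1(source: str) -> str:
    """Strip C-style comments and all white space from a fragment."""
    # Remove /* ... */ block comments, including multi-line ones.
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    # Remove // comments up to the end of the line.
    source = re.sub(r"//[^\n]*", "", source)
    # Remove every white-space character (layout is irrelevant for type 1).
    return re.sub(r"\s+", "", source)

fragment_a = """
int sum = 0;          // running total
for (i = 0; i < n; i++) {
    sum = sum + a[i];
}
"""

fragment_b = """
int sum = 0;
/* add up the array */
for (i = 0; i < n; i++) { sum = sum + a[i]; }
"""

# The fragments differ only in comments and layout.
is_exact_clone = normalize_type1(fragment_a) == normalize_type1(fragment_b)
print(is_exact_clone)
```

Removing all white space is the crudest possible normalization; a practical tool would usually compare tokens instead, so that `int x` and `intx` are not confused.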

(B) Type 2 (renamed/parameterized clones)

As the name suggests, type 2 clones are fragments that are syntactically identical except for variations in identifiers (names of constants, variables, functions), literals, types, layout, and comments. Type 2 clones are widely known as parameterized clones. A parameterized clone is a code fragment which is the same as the original except for some possible variations in the names of corresponding user-defined identifiers. The reserved words and sentence structures are mostly the same as in the original. Figure 2 shows two code fragments that differ in their variable names and value assignments but have the same syntactic structure.

Figure 2: (renamed clones)
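The idea of a renamed clone can be sketched by replacing every user-defined identifier and every literal with a placeholder and comparing what remains. This is only a sketch under assumptions: the keyword set, the regular expression, and the function name `normalize_type2` are hypothetical, and the blind replacement used here does not check that renaming is consistent across the fragment.

```python
import re

# A deliberately small keyword set for the illustration (assumption).
C_KEYWORDS = {"int", "float", "for", "while", "if", "else", "return"}
# Identifiers, numeric literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\S")

def normalize_type2(source: str) -> list[str]:
    """Map identifiers to 'ID' and numeric literals to 'LIT'; keep the rest."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in C_KEYWORDS:
            tokens.append(tok)          # reserved words stay as they are
        elif re.match(r"[A-Za-z_]", tok):
            tokens.append("ID")         # user-defined identifier
        elif tok[0].isdigit():
            tokens.append("LIT")        # numeric literal
        else:
            tokens.append(tok)          # operators and punctuation
    return tokens

original = "int total = 0; for (i = 0; i < n; i++) total = total + price[i];"
renamed = "int sum = 10; for (k = 0; k < m; k++) sum = sum + cost[k];"
different = "int sum = 10; while (k < m) sum = sum + cost[k];"

print(normalize_type2(original) == normalize_type2(renamed))    # True
print(normalize_type2(original) == normalize_type2(different))  # False
```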

(C) Type 3 (near-miss clones)

Type 3 clones are code fragments with further modifications, such as inserted or deleted statements, in addition to changes in identifiers, types, layout, and literals. They are widely known as near-miss clones because the cloned code nearly matches the original code, with a few statements more or fewer. In most cases, the statements of the original fragment are reused after their identifiers and literals are changed and some statements are inserted or deleted.

Figure 3: (near-miss clones)
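Near-miss similarity can be approximated by measuring how many lines two fragments share in order. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; the threshold of 0.7 and the function names are assumptions chosen for the illustration, and a real detector would first normalize identifiers and literals.

```python
from difflib import SequenceMatcher

def line_similarity(fragment_a: str, fragment_b: str) -> float:
    """Ratio in [0, 1] of lines shared in order by the two fragments."""
    lines_a = [ln.strip() for ln in fragment_a.splitlines() if ln.strip()]
    lines_b = [ln.strip() for ln in fragment_b.splitlines() if ln.strip()]
    return SequenceMatcher(None, lines_a, lines_b).ratio()

def is_near_miss(fragment_a: str, fragment_b: str, threshold: float = 0.7) -> bool:
    """Treat fragments as near-miss clones above a chosen threshold (assumed)."""
    return line_similarity(fragment_a, fragment_b) >= threshold

original = """
sum = 0;
for (i = 0; i < n; i++) {
    sum = sum + a[i];
}
"""

# Same fragment with one inserted statement.
modified = """
sum = 0;
for (i = 0; i < n; i++) {
    if (a[i] > 0)
    sum = sum + a[i];
}
"""

unrelated = """
printf("hello");
return 0;
"""

print(is_near_miss(original, modified))   # True: 4 of the lines match
print(is_near_miss(original, unrelated))  # False: no line matches
```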

(D) Type 4 (semantic clones)

Type 4 clones are the result of semantic similarity between two or more code fragments: they are functionally similar but textually different. According to Roy & Cordy [1], two code segments may be developed by two different programmers to implement the same kind of logic, making the fragments similar in their functionality. Functional similarity reflects the degree to which components act alike, i.e., capture similar functional properties; assessment methods for it rely on matching pre- and post-conditions. Figure 4 shows a factorial program as an example: one code fragment computes the factorial with a simple for loop, while the other calls a recursive function and produces the same result.

Figure 4: (semantic clones)
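The factorial example described above can be written out as follows. This is a sketch in Python rather than the language of the original figure, and the helper `behave_alike` is hypothetical: comparing outputs on sample inputs only gives evidence of functional similarity, not a proof of equivalence.

```python
def factorial_loop(n: int) -> int:
    """Iterative factorial using a simple for loop."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

def factorial_recursive(n: int) -> int:
    """Recursive factorial: textually different, functionally the same."""
    return 1 if n <= 1 else n * factorial_recursive(n - 1)

def behave_alike(f, g, inputs) -> bool:
    """Check that two functions agree on every sample input."""
    return all(f(x) == g(x) for x in inputs)

print(factorial_loop(5), factorial_recursive(5))                    # 120 120
print(behave_alike(factorial_loop, factorial_recursive, range(10)))  # True
```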

3. Various Possibilities of Clone Detection in Software

Various clone detection techniques have been presented in the literature, and the four types of clones (type 1, type 2, type 3, and type 4) are detected by different techniques. For any technique, the source representation and the match detection technique are the most important characteristics. The detection of code clones is mainly a two-phase process consisting of a transformation phase and a comparison phase. The transformation phase is a pre-processing phase that removes uninteresting parts such as comments and blank lines and produces an intermediate representation, i.e., the extracted information on which the comparison is based. Various intermediate representations are available, e.g., AST/parse tree, regularized tokens, call graph, vector space, and PDG [2]. After a source code representation has been chosen, a match detection algorithm is applied. Clone detection techniques are classified into the following types.

(A) Text-based clone detection: In this technique, the target source code is considered as a sequence of lines/strings. Two code fragments are compared with each other to find sequences of identical text; fragments found to be similar are reported as cloned code. Tools for this technique include DuDe, Simian, SDD, and NICAD. DuDe is a line-based clone detection tool that is helpful for detecting smaller exact clones [11]. Simian is a language-independent tool that treats source code in any programming language as a plain text file. SDD (Similar Data Detection) is helpful for detecting clones in large systems. NICAD is a text-based hybrid clone detection tool that is very effective in detecting near-miss clones [1].
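A line-based comparison of this kind can be sketched as finding the longest run of consecutive identical lines in two files. The code below is a simple dynamic-programming illustration and does not reproduce the algorithm of any of the tools named above; the function names and sample files are hypothetical.

```python
def normalized_lines(source: str) -> list[str]:
    """Drop blank lines and collapse runs of white space inside each line."""
    return [" ".join(ln.split()) for ln in source.splitlines() if ln.strip()]

def longest_common_block(src_a: str, src_b: str) -> list[str]:
    """Longest run of consecutive lines that appears in both sources."""
    a, b = normalized_lines(src_a), normalized_lines(src_b)
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous pair of lines.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]

file_a = """
x = read();
total = 0;
for (i = 0; i < n; i++)
    total += a[i];
print(total);
"""

file_b = """
y = 5;
total = 0;
for (i = 0;   i < n; i++)
  total += a[i];
return total;
"""

for line in longest_common_block(file_a, file_b):
    print(line)
```

In practice a minimum block length would be required before a match is reported, so that single common lines such as a closing brace are not flagged as clones.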

(B) Token-based clone detection: In this technique, tokens are extracted from the complete source program by lexical analysis/parsing/transformation. The extracted token sequence is then scanned for duplicated subsequences of tokens. Available tools for this technique include CCFinder, CCFinderX, DCCFinder, CP-Miner, RTF, iclone, ConQAT, and FCFinder [8][9].
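Scanning a token sequence for duplicated subsequences can be sketched with an index of fixed-length token windows. This is an assumption-laden illustration: the regular-expression tokenizer, the window length `k`, and the function names are hypothetical, and the tools listed above use their own, more scalable matching strategies.

```python
import re
from collections import defaultdict

# Identifiers, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")

def tokenize(source: str) -> list[str]:
    """Very small lexer, sufficient for the example only."""
    return TOKEN_RE.findall(source)

def shared_token_windows(src_a: str, src_b: str, k: int = 6) -> list[tuple[int, int]]:
    """Return (pos_a, pos_b) pairs where k consecutive tokens are identical."""
    tokens_a, tokens_b = tokenize(src_a), tokenize(src_b)
    index = defaultdict(list)
    # Index every window of k tokens in the first source by its content.
    for i in range(len(tokens_a) - k + 1):
        index[tuple(tokens_a[i:i + k])].append(i)
    matches = []
    # Look up every window of the second source in the index.
    for j in range(len(tokens_b) - k + 1):
        for i in index.get(tuple(tokens_b[j:j + k]), []):
            matches.append((i, j))
    return matches

src_a = "a = 1; total = total + x[i]; b = 2;"
src_b = "c = 3; total = total + x[i]; d = 4;"

# Overlapping windows at consecutive positions indicate one longer cloned run.
print(shared_token_windows(src_a, src_b, k=6))
```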

(C) Tree-based clone detection: In this technique, the program is parsed into a parse tree or abstract syntax tree using a parser for the language of interest. Tree matching techniques are then applied to detect similar subtrees. Tools for this approach include Yang [12], CloneDR [7], Sim, Deckard, ClemanX, and JCCD.

(D) Graph-based clone detection: This technique is based on fine-grained program dependency graphs (PDGs), which represent the structure of a program and the data flow within it. It therefore takes semantic information of the source into account. Isomorphic subgraph matching algorithms are applied to detect similar subgraphs, and matching subgraphs are considered cloned code. Tools for this technique include Duplix, PDG-DUP [4], and Scorpio [10].

(E) Metric-based clone detection: This approach collects metric values for different code fragments and then compares these values. Fragments whose metric values match are reported as cloned code. Metrics have been used successfully in clone analysis, clone evolution, and clone visualization. Tools for this technique include Columbus, Source Monitor, and Datrix [1].

(F) Hybrid clone detection: Several newer techniques are formed by combining the previous techniques; the combination depends on the developer. Existing hybrid techniques have been developed by different researchers. For example, Sutton et al. applied a clustering-based algorithm to detect clones in large code bases.
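The metric-based approach in (E) can be sketched by computing a small vector of counts per fragment and comparing the vectors. The four metrics chosen here (non-blank lines, loops, branches, assignments) and the function name `metric_vector` are assumptions made for the illustration; they are not the metrics used by Columbus, Source Monitor, or Datrix.

```python
import re

def metric_vector(fragment: str) -> tuple[int, int, int, int]:
    """Return (non-blank lines, loops, branches, assignments) for a fragment."""
    lines = [ln for ln in fragment.splitlines() if ln.strip()]
    loops = len(re.findall(r"\b(?:for|while)\b", fragment))
    branches = len(re.findall(r"\b(?:if|switch)\b", fragment))
    # A '=' that is not part of ==, !=, <= or >= counts as an assignment.
    assignments = len(re.findall(r"(?<![=!<>])=(?!=)", fragment))
    return (len(lines), loops, branches, assignments)

fragment_a = """
max = a[0];
for (i = 1; i < n; i++) {
    if (a[i] > max)
        max = a[i];
}
"""

fragment_b = """
best = v[0];
for (k = 1; k < len; k++) {
    if (v[k] > best)
        best = v[k];
}
"""

fragment_c = """
x = 1;
y = 2;
"""

print(metric_vector(fragment_a))  # (5, 1, 1, 3)
print(metric_vector(fragment_a) == metric_vector(fragment_b))  # True
print(metric_vector(fragment_a) == metric_vector(fragment_c))  # False
```

Equal metric vectors only mark a pair as a clone candidate: unrelated fragments can share the same counts, so a second, more precise comparison is normally needed.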

4. Program Slicing

The original concept of a program slice was proposed by Mark Weiser [5]. According to his definition, a slice s of a program p is a subset of the statements of p that retains some specified behavior of p. The desired behavior is specified by means of a slicing criterion c. Generally, a slicing criterion c consists of a set of variables V and a program point l. When the slice s is executed, it must always produce the same values as program p for the variables in V at point l. Program slicing is a technique to decompose programs by analyzing their data and control flow. Roughly speaking, a program slice consists of those program statements which are related to the values computed at some program point and/or variable, referred to as a slicing criterion [3]. As it was originally defined: "A slice is itself an executable program which is subset of the program whose behavior must be identical to the specified subset of the original program's behavior".

According to Weiser, a static slice is the set of statements that directly or indirectly affect the value of a variable at a given program point; the program point and the variable together form the slicing criterion. The slicing criterion is denoted by (S, V), where ‘S’ is the statement or line number and ‘V’ is the variable in the program [5]. The program slice is a reduced, executable program obtained from the original program. The process of slicing deletes those parts of the program which can be determined to have no effect upon the semantics of interest. The task of computing program slices is called program slicing. Slicing has applications in testing and debugging, program comprehension, re-engineering, and software measurement. Slicing is of two types, static slicing and dynamic slicing. Static slicing considers all possible executions of the program, whereas dynamic slicing considers one particular execution, i.e., the values the variables take for a given input. Figure 5 shows a program sliced with respect to the slicing criterion (print(product), product).

Figure 5: (program slice)
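A slice for a criterion such as (print(product), product) can be sketched on a small sum/product program. The code below is a simplified, flow-insensitive fixpoint over hand-written def/use sets, not Weiser's data-flow algorithm; the `Stmt` record, the sample program, and `backward_slice` are hypothetical, and the result may over-approximate a precise static slice.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Stmt:
    line: int
    text: str
    defines: Optional[str]          # variable assigned by the statement, if any
    uses: frozenset                 # variables read by the statement
    parent: Optional[int] = None    # line of the enclosing loop, if any

program = [
    Stmt(1, "n = read()", "n", frozenset()),
    Stmt(2, "i = 1", "i", frozenset()),
    Stmt(3, "sum = 0", "sum", frozenset()),
    Stmt(4, "product = 1", "product", frozenset()),
    Stmt(5, "while i <= n:", None, frozenset({"i", "n"})),
    Stmt(6, "    sum = sum + i", "sum", frozenset({"sum", "i"}), parent=5),
    Stmt(7, "    product = product * i", "product", frozenset({"product", "i"}), parent=5),
    Stmt(8, "    i = i + 1", "i", frozenset({"i"}), parent=5),
    Stmt(9, "print(sum)", None, frozenset({"sum"})),
    Stmt(10, "print(product)", None, frozenset({"product"})),
]

def backward_slice(program, criterion_line, variable):
    """Collect lines that may affect `variable` at `criterion_line`."""
    in_slice = {criterion_line}
    relevant = {variable}
    changed = True
    while changed:                  # iterate until nothing new is added
        changed = False
        for stmt in program:
            if stmt.line not in in_slice and stmt.defines not in relevant:
                continue
            # Keep the statement and the loop that controls it.
            lines = {stmt.line} | ({stmt.parent} if stmt.parent is not None else set())
            if not lines <= in_slice or not stmt.uses <= relevant:
                in_slice |= lines
                relevant |= stmt.uses   # variables it reads become relevant
                changed = True
    return sorted(in_slice)

slice_lines = backward_slice(program, 10, "product")
print(slice_lines)  # [1, 2, 4, 5, 7, 8, 10]
```

The statements that only concern `sum` (lines 3, 6 and 9) are left out of the slice, while the loop header and the update of `i` are kept because `product` depends on them.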

III. Formulation of Problem

There are various techniques for detecting software clones of the different types, i.e., type 1, type 2, type 3, and type 4. The proposed approach detects non-contiguous clones by using program slicing and comparing the slices with a matching algorithm. A non-contiguous clone is a code clone whose elements do not reside sequentially within the source code of the program. Such clones therefore cannot be detected by techniques that only compare contiguous sequences of text or tokens.

IV. Proposed Approach

The clone detection approach consists of three major steps. First, preprocessing normalizes the input into a standard form that is acceptable to the next phase. Second, the normalized input is transformed into an intermediate representation for which an easier and more efficient matching algorithm can be developed. Third, the matching algorithm executes on the intermediate representation to detect most of the potential clones.

• Preprocessing: all unnecessary spaces, blank lines, and comments are removed. This also reduces the cost of the later phases, since the size of the code is reduced to some extent.
• Intermediate representation: slices of the program are extracted from the normalized source code using variable dependencies.
• Matching: the intermediate representations of the two input source programs are compared by applying a matching algorithm.

These three phases are integrated to form a technique for non-contiguous clone detection.
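The three phases can be sketched end to end on a toy input. This is not the authors' implementation: it assumes programs that consist only of simple assignment statements of the form `x = expr;`, and all function names (`preprocess`, `slice_for`, `canonical`, `matching_slices`) are hypothetical.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")

def preprocess(source: str) -> list[str]:
    """Phase 1: remove comments, blank lines and redundant spaces."""
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    source = re.sub(r"//[^\n]*", "", source)
    return [" ".join(ln.split()) for ln in source.splitlines() if ln.strip()]

def slice_for(statements: list[str], variable: str) -> list[str]:
    """Phase 2: statements that `variable` depends on, found by a backward walk."""
    relevant, picked = {variable}, []
    for stmt in reversed(statements):
        target, expr = [p.strip() for p in stmt.rstrip(";").split("=", 1)]
        if target in relevant:
            picked.append(stmt)
            relevant.discard(target)               # this definition is now explained
            relevant |= set(IDENT.findall(expr))   # its operands become relevant
    return list(reversed(picked))

def canonical(slice_stmts: list[str]) -> tuple[str, ...]:
    """Rename variables to v0, v1, ... in order of first appearance."""
    names = {}
    def rename(match):
        return names.setdefault(match.group(0), f"v{len(names)}")
    return tuple(IDENT.sub(rename, s) for s in slice_stmts)

def matching_slices(src_a: str, src_b: str) -> list[tuple[str, str]]:
    """Phase 3: pair variables whose canonical slices are identical."""
    stmts_a, stmts_b = preprocess(src_a), preprocess(src_b)
    vars_a = sorted({s.split("=", 1)[0].strip() for s in stmts_a})
    vars_b = sorted({s.split("=", 1)[0].strip() for s in stmts_b})
    slices_b = {canonical(slice_for(stmts_b, v)): v for v in vars_b}
    pairs = []
    for v in vars_a:
        key = canonical(slice_for(stmts_a, v))
        if len(key) > 1 and key in slices_b:   # ignore one-statement slices
            pairs.append((v, slices_b[key]))
    return pairs

program_a = """
area = 0;        // start
count = 0;
area = area + w * h;
count = count + 1;
total = area * 2;
"""

program_b = """
n = 0;
s = 0;
n = n + 1;
s = s + a * b;   /* same computation as 'area' */
r = s * 2;
"""

print(matching_slices(program_a, program_b))
```

In both programs the statements that compute one variable are interleaved with statements that compute another, so the matched slices are non-contiguous in the source text.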

V. Conclusion and Future Scope

Software clone detection is a broad and important research area for improving the maintainability and quality of software systems. A large number of clone detection techniques have been explored over the last two decades. After an extensive study of the literature in this area, we conclude that only limited work has been done on detecting intertwined code clones. Detecting intertwined clones makes it more likely that every possible clone is found. The first phase of the proposed approach is to remove the dead code from the source code. In future, the input handling may be modified so that any form of source code file can be taken as input, in which case no dead code removal would be required. The approach currently only detects software clones; in future it may be possible to detect malware code by checking it against machine code. The approach is applicable to clone detection in structured programs, i.e., ‘C’ programs, and will be extended to object-oriented programs such as ‘C++’ and ‘Java’ programs.

VI. References

1. C.K. Roy, J.R. Cordy, A Survey on Software Clone Detection Research, Technical Report 2007-541, Queen’s University at Kingston, Ontario, Canada, 2007, p. 115.
2. Dhavlesh Rattan, Rajesh Bhatia, Maninder Singh, Software Clone Detection: A Systematic Review, Information and Software Technology 55 (2013) 1165–1199.
3. Frank Tip, A Survey of Program Slicing Techniques, 1995.
4. R. Komondoor, S. Horwitz, Using slicing to identify duplication in source code, in: Proceedings of the 8th International Symposium on Static Analysis (SAS’01), vol. LNCS 2126, Paris, France, 2001, pp. 40–56.
5. Mark Weiser, Program Slicing, in: Proceedings of the 5th International Conference on Software Engineering, San Diego, California, IEEE Computer Society, March 1981, pp. 439–449.
6. S. Bellon, R. Koschke, G. Antoniol, J. Krinke, E. Merlo, Comparison and evaluation of clone detection tools, IEEE Transactions on Software Engineering 33 (9) (2007) 577–591.
7. Ira Baxter, Andrew Yahin, Leonardo Moura, Marcelo Sant Anna, Clone Detection Using Abstract Syntax Trees, in: Proceedings of the 14th International Conference on Software Maintenance (ICSM'98), Bethesda, Maryland, November 1998, pp. 368–377.
8. T. Kamiya, S. Kusumoto, K. Inoue, CCFinder: a multi-linguistic token-based code clone detection system for large scale source code, IEEE Transactions on Software Engineering 28 (7) (2002) 654–670.
9. Z. Li, S. Lu, S. Myagmar, Y. Zhou, CP-Miner: finding copy–paste and related bugs in large-scale software code, IEEE Transactions on Software Engineering 32 (3) (2006) 176–192.
10. Y. Higo, S. Kusumoto, Code clone detection on specialized PDG’s with heuristics, in: Proceedings of the 15th European Conference on Software Maintenance and Reengineering (CSMR’11), Oldenburg, Germany, 2011, pp. 75–84.
11. Wettel and Marinescu, The program dependence graph and its use in optimization, ACM Transactions on Programming Languages and Systems 9 (3) (2005) 319–349.
12. A. Chou, J. Yang, B. Chelf, S. Hallem, D. R. Engler, An empirical study of operating system errors, in: Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP'01), Banff, Alberta, Canada, October 2001, pp. 73–88.
13. Simon Giesecke, Generic modelling of code clones, in: Proceedings of Duplication, Redundancy, and Similarity in Software, ISSN 1682-4405, Dagstuhl, Germany, July 2006.

INTRODUCTION. Biometric ... iris template in database. There is .... The experiments have been implemented using human eye image from CASAI database.