IJRIT International Journal of Research in Information Technology, Volume 2, Issue 6, June 2014, Pg: 188-193

International Journal of Research in Information Technology (IJRIT) www.ijrit.com

ISSN 2001-5569

Survey of Programming Plagiarism Detection

Vale Alekya¹, S Sai Satyanarayana Reddy²

¹ PG Scholar, Department of Computer Science, Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India [email protected]

² Professor, Department of Computer Science, Lakireddy Bali Reddy College of Engineering, Mylavaram, Andhra Pradesh, India [email protected]

Abstract

As technology advances, fraud increases with it. Taking another person’s intellectual property and presenting it as one’s own is known as plagiarism. This paper gives an overview of effective methods that have been used for source code plagiarism detection. It also discusses several important issues in plagiarism detection: plagiarism detection tasks, the plagiarism detection process, and some of the current plagiarism detection tools (JPlag, Moss, PMD).

Keywords: Plagiarism Detection; Detection Process; Detection Techniques.

1. Introduction

“Plagiarism, the act of taking the writings of another person and passing them off as one’s own. The fraudulence is closely related to forgery and piracy - practices generally in violation of copyright laws.” (Encyclopedia Britannica [1])

Plagiarism can be considered one of the electronic crimes, alongside computer hacking, computer viruses, spamming, phishing, copyright violation and others. It is defined as the act of taking, attempting to take, or using all or part of another person’s work without referencing or citing that person as its owner. It may involve direct copy and paste, or modifying or changing some words of the original information taken from the internet, books, magazines, newspapers, research, journals, personal information or ideas.

According to the Merriam-Webster Online Dictionary, to “plagiarize” means:
– To steal and pass off (the ideas or words of another) as one’s own.
– To use (another’s production) without crediting the source.
– To commit literary theft.
– To present as new and original an idea or product derived from an existing source.

Also, according to Turnitin.com, plagiarism.org and Research Resources, the following are considered plagiarism:
– Turning in someone else’s work as your own.
– Copying words or ideas from someone else without giving credit.
– Failing to put a quotation in quotation marks.
– Giving incorrect information about the source of a quotation.
– Changing words but copying the sentence structure of a source without giving credit.
– Copying so many words or ideas from a source that it makes up the majority of your work, whether you give credit or not (see our section on “fair use” rules).

Plagiarism can be classified into five categories:
1. Copy & Paste Plagiarism.
2. Word Switch Plagiarism.
3. Style Plagiarism.
4. Metaphor Plagiarism.
5. Idea Plagiarism.

Two types of plagiarism occur most often:
1. Textual plagiarism: usually committed by students or researchers in academic institutions, where documents are identical or very similar to the original documents, reports, essays, scientific papers and art designs.
2. Source code plagiarism: also committed by students in universities, who copy, or attempt to copy, all or part of source code written by someone else and present it as their own. This type of plagiarism is difficult to detect.

2. Why Plagiarism Detection is Important

In academic institutions such as universities and schools, plagiarism detection and prevention has become one of the main educational challenges, because many students and researchers cheat when they complete assigned tasks and projects. A lot of resources can be found on the internet, so it is easy to use a search engine to look up any topic and copy from the results without citing the owner of the document. Academic institutions should therefore use plagiarism detection software, since students are less likely to cheat, copy or modify documents when they know that they will be found out.

Some types of plagiarism can be detected easily with the recent plagiarism detection software available on the market or over the internet. However, expert plagiarists who themselves use the anti-plagiarism software available over the internet take more effort to detect, and may not be detected at all. Plagiarism is practiced not only by students; some staff members who wish to become well known also publish papers in which some parts are directly copied or partially modified.

A large number of plagiarism detection programs exist, and many detection tools have been developed by researchers, but they still have limitations: they cannot prove that a document has been plagiarized from another document. They only report similarity and give hints that point to other documents. This is the case when the paper has been published in an international journal; in addition, some universities and research centres still take no action against plagiarism, which encourages people to cheat more and more. Even with recent detection software, therefore, plagiarism cannot be detected with 100% certainty.

Plagiarism software can also cover copyright and the legal aspects of using published documents: it can show whether a person has copied a document legally or illegally, and whether that person has permission from the owner to use it. Plagiarism detection is also one of the most important issues for journals, research centres and conferences, which use advanced plagiarism detection tools to ensure that documents have not been plagiarized and to protect publishers’ copyrights from violation.

2.1 Plagiarism Detection Tasks

The first step in dealing with plagiarism is to clearly define the tasks at hand. In [2], plagiarism detection is divided into two main tasks:


2.1.1 Extrinsic Plagiarism Detection Task

Extrinsic plagiarism detection assesses plagiarism with reference to one or more source documents in a corpus. This task uses the computer’s ability to search a corpus for similar documents and retrieve possibly plagiarized ones.

2.1.2 Intrinsic Plagiarism Detection Task

Intrinsic plagiarism detection evaluates cases of plagiarism by examining suspicious documents in isolation. This task tries to reproduce the human ability to detect plagiarism by noticing differences in writing style. “Intrinsic plagiarism aims at identifying potential plagiarism by analyzing a document with respect to undeclared changes in writing style.” Several studies have addressed this task, for example [3].

3. Plagiarism Detection Process & Tools

Unfortunately, many academic institutes do not take plagiarism as seriously as they should. Often they take an “ostrich” approach and turn a blind eye to any wrongdoing, or at best they have a very soft policy against plagiarism that equates it with bad behavior. However, more and more institutes are taking plagiarism seriously [4].

3.1 Plagiarism Detection Process Stages

Figure 1. Four-stage Plagiarism Detection Process

As illustrated in Figure 1, Lancaster and Culwin [21] define the important stages of plagiarism detection as collection, analysis, confirmation and investigation. These four stages are important for designing an error-free process. This section discusses the four stages and their functions.

3.1.1. Stage One: Collection

This is the first stage of the plagiarism detection process. It requires students or researchers to upload their assignments or works to the web engine, which acts as an interface between the students and the system.

3.1.2. Stage Two: Analysis

In this stage all the submitted documents or assignments are run through a similarity engine to determine which documents are similar to other documents. There are two types of similarity engine: intra-corpal and extra-corpal. Intra-corpal engines return an ordered list of the similar pairs of documents within the corpus. By contrast, extra-corpal engines return suitable web links.

3.1.3. Stage Three: Confirmation

The function of this stage is to determine whether the relevant text has been plagiarized from other texts, or whether there is a high degree of similarity between a source document and any other document.
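The intra-corpal analysis described in Stage Two can be illustrated with a minimal sketch. It assumes a toy corpus held in a dictionary and uses Jaccard similarity over three-word runs as the similarity measure; the measure and the function names (`shingles`, `jaccard`, `rank_pairs`) are illustrative assumptions, not the method of any particular tool discussed in this paper.

```python
from itertools import combinations


def shingles(text, n=3):
    """Return the set of n-word runs in text (lower-cased)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_pairs(corpus, n=3):
    """Score every pair of submissions and return them most-similar first."""
    sets = {name: shingles(text, n) for name, text in corpus.items()}
    scored = [
        (jaccard(sets[x], sets[y]), x, y)
        for x, y in combinations(sorted(sets), 2)
    ]
    # sorted() is stable, so pairs with equal scores stay in alphabetical order
    return sorted(scored, key=lambda item: -item[0])


# Hypothetical three-document corpus
corpus = {
    "alice": "the quick brown fox jumps over the lazy dog",
    "bob": "the quick brown fox leaps over the lazy dog",
    "carol": "an entirely different sentence about compilers and parsers",
}
ranking = rank_pairs(corpus)
for score, first, second in ranking:
    print(f"{first} vs {second}: {score:.2f}")
```

The ordered list only tells the examiner where to look first; the confirmation and investigation stages still decide whether a highly ranked pair is actually plagiarized.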

3.1.4. Stage Four: Investigation

This is the final stage of the plagiarism detection process, and it relies on human intervention. In this step a human expert is responsible for determining whether the system ran correctly and whether a result has truly been plagiarized or is simply cited. All four stages rely on recognizing the similarity between documents and, as a result, on efficient algorithms for finding those similarities. Time is also needed for a human to confirm and investigate suspected instances of plagiarism.

3.2 Current Plagiarism Detection Tools

Source code plagiarism, also called programming plagiarism, is usually committed by students in universities and schools. It can be defined as the act of, or an attempt at, using, reusing, converting, modifying or copying all or part of source code written by someone else and using it in one’s own program without citing the owners. Whether detection is manual or automatic, source code plagiarism detection requires human intervention to decide whether a similarity is due to plagiarism or not. Manual detection across a large number of student homework submissions or projects is very difficult and demands great effort and a strong memory; for a large number of sources it seems impossible.

According to Roy and Cordy [5], the systems and algorithms used in source-code similarity detection can be classified as based on either:

• “Strings - look for exact textual matches of segments, for instance five-word runs. Fast, but can be confused by renaming identifiers.”

• “Tokens - as with strings, but using a lexer to convert the program into tokens first. This discards whitespace, comments, and identifier names, making the system more robust to simple text replacements. Most academic plagiarism detection systems work at this level, using different algorithms to measure the similarity between token sequences.”

• “Parse Trees - build and compare parse trees. This allows higher-level similarities to be detected. For instance, tree comparison can normalize conditional statements, and detect equivalent constructs as similar to each other.”

• “Program Dependency Graphs (PDGs) - a PDG captures the actual flow of control in a program, and allows much higher-level equivalences to be located, at a greater expense in complexity and calculation time.”

• “Metrics - metrics capture ‘scores’ of code segments according to certain criteria; for instance, ‘the number of loops and conditionals’, or ‘the number of different variables used’. Metrics are simple to calculate and can be compared quickly, but can also lead to false positives: two fragments with the same scores on a set of metrics may do entirely different things.”

• “Hybrid approaches - for instance, parse trees + suffix trees can combine the detection capability of parse trees with the speed afforded by suffix trees, a type of string-matching data structure.”
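The difference between the string and token levels in the list above can be shown with a short sketch. It uses Python’s standard `tokenize` module as the lexer and replaces every identifier with the placeholder `ID`; the `normalize` function and the two sample programs are hypothetical and serve only to show why renaming defeats a string comparison but not a token comparison. Mapping all identifiers to one placeholder is a deliberate simplification.

```python
import io
import keyword
import tokenize

# Token kinds that carry no program content at this level of abstraction
SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
        tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def normalize(source):
    """Turn Python source into a token list with names and literals abstracted."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIP:
            continue
        if tok.type == tokenize.NAME:
            # Keep keywords such as 'def' and 'for'; hide identifier names
            out.append(tok.string if keyword.iskeyword(tok.string) else "ID")
        elif tok.type == tokenize.NUMBER:
            out.append("NUM")
        elif tok.type == tokenize.STRING:
            out.append("STR")
        else:
            out.append(tok.string)  # operators and punctuation
    return out


# Hypothetical submissions: the second renames every identifier of the first
original = (
    "def total(xs):\n"
    "    s = 0  # running sum\n"
    "    for x in xs:\n"
    "        s += x\n"
    "    return s\n"
)
renamed = (
    "def add_all(values):\n"
    "    acc = 0\n"
    "    for v in values:\n"
    "        acc += v\n"
    "    return acc\n"
)

same_text = original == renamed
same_tokens = normalize(original) == normalize(renamed)
print("identical as strings:", same_text)
print("identical as tokens:", same_tokens)
```

A real token-based system would go on to compare the token sequences with a matching algorithm rather than test them for equality, so that partial copies are also found.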

Some programming plagiarism detection tools are described below.

3.2.1 JPlag

JPlag is a system that finds similarities among multiple sets of source code files, and in this way can detect software plagiarism. JPlag does not merely compare bytes of text; it is aware of programming language syntax and program structure, and hence is robust against many kinds of attempts to disguise similarities between plagiarized files. JPlag currently supports Java, C#, C, C++, Scheme and natural language text.

JPlag is typically used to detect, and thus discourage, the unallowed copying of student exercise programs in programming education. In principle it can also be used to detect stolen software parts among large amounts of source text, or modules that have been duplicated and only slightly modified. JPlag has already played a part in several intellectual property cases, where it has been used successfully by expert witnesses. JPlag has a powerful graphical interface for presenting its results.

To be clear, JPlag does not compare submissions against the internet. It is designed to find similarities among the student solutions, which is usually sufficient for computer programs. The use of JPlag is free, but you must obtain an account. This requirement serves both to prevent unauthorized use of JPlag by students and to provide easy, installation-free access to the software.

3.2.2 Moss

Moss (for a Measure Of Software Similarity) is an automatic system for determining the similarity of programs. To date, the main application of Moss has been in detecting plagiarism in programming classes. Since its development in 1994, Moss has been very effective in this role. The algorithm behind Moss is a significant improvement over other cheating detection algorithms.

Moss is not a system for completely automatic plagiarism detection. Plagiarism is a statement that someone copied code deliberately without attribution, and while Moss automatically detects program similarity, it has no way of knowing why programs are similar. It is still up to a human to look at the parts of the code that Moss highlights and decide whether there is plagiarism or not. One way of thinking about what Moss provides is that it saves teachers and teaching staff a lot of time by pointing out the parts of programs that are worth a more detailed examination. But once someone has looked at those portions of the programs, it should not matter whether the suspect code was first discovered by Moss or by a human; the case that there was plagiarism should stand on its own.

In particular, it is a misuse of Moss to rely solely on the similarity scores. These scores are useful for judging the relative amount of matching between different pairs of programs, and for seeing more easily which pairs of programs stand out with unusual amounts of matching. But the scores are certainly not proof of plagiarism. Someone must still look at the code.

Moss can currently analyze code written in the following languages: C, C++, Java, C#, Python, Visual Basic, Javascript, FORTRAN, ML, Haskell, Lisp, Scheme, Pascal, Modula2, Ada, Perl, TCL, Matlab, VHDL, Verilog, Spice, MIPS assembly, a8086 assembly, HCL2. The current Moss submission script is for Linux. In response to a query, the Moss server produces HTML pages listing pairs of programs with similar code. Moss also highlights individual passages in programs that appear the same, making it easy to compare the files quickly. Finally, Moss can automatically eliminate matches to code that one expects to be shared (e.g., libraries or instructor-supplied code), thereby eliminating false positives that arise from legitimate sharing of code.

3.2.3 PMD

The PMD open source tool provides a Copy/Paste Detector (CPD) for finding duplicate code. Similarly to JPlag, CPD uses a variation of the Karp-Rabin string matching algorithm developed by Wise. It works with Java, JSP, C, C++, Fortran and PHP code, and provides guidance on how to add other programming languages to the tool. Unlike JPlag and Moss, this tool is not specifically aimed at detecting similarities in students’ work, but it works well in doing so. The developers of PMD provide excellent support and documentation for this tool.

Because it is a duplicate code detector, this tool scans the files themselves for duplicate code, and hence returns similar code found within the same file. However, it is also successful in returning similar code across different files, and can be used as a tool for detecting similarity in source-code files.
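The Karp-Rabin idea mentioned above can be sketched as follows. This is a minimal illustration of rolling-hash fingerprinting over fixed-length character windows, written under assumed constants (`BASE`, `MOD`) and with hypothetical function names; it is not CPD’s or JPlag’s implementation, and both tools work on tokens rather than raw characters.

```python
from collections import defaultdict

BASE = 257            # assumed radix for the polynomial hash
MOD = (1 << 61) - 1   # assumed large prime modulus


def rolling_hashes(text, k):
    """Yield (start, hash) for every window of k characters, updating in O(1)."""
    if k <= 0 or len(text) < k:
        return
    high = pow(BASE, k - 1, MOD)  # weight of the character leaving the window
    h = 0
    for ch in text[:k]:
        h = (h * BASE + ord(ch)) % MOD
    yield 0, h
    for i in range(k, len(text)):
        h = (h - ord(text[i - k]) * high) % MOD  # drop the leftmost character
        h = (h * BASE + ord(text[i])) % MOD      # append the new character
        yield i - k + 1, h


def repeated_windows(text, k):
    """Return {window: [start positions]} for windows of length k seen more than once."""
    buckets = defaultdict(list)
    for start, h in rolling_hashes(text, k):
        buckets[h].append(start)
    found = defaultdict(list)
    for starts in buckets.values():
        for s in starts:
            # Re-read the text so that a hash collision cannot create a false match
            found[text[s:s + k]].append(s)
    return {w: pos for w, pos in found.items() if len(pos) > 1}


code = "int a = b + c; int x = b + c;"
duplicates = repeated_windows(code, 8)
for window, positions in duplicates.items():
    print(repr(window), positions)
```

Because each window hash is derived from the previous one in constant time, the whole text is fingerprinted in a single pass, which is what makes this family of algorithms attractive for scanning many files.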

4. Conclusion

This study considered the problem of plagiarism detection, since plagiarism is one of the most publicized forms of code reuse today. In particular, it showed how the plagiarism problem can be handled using different techniques and tools. However, these techniques and tools still have weaknesses and shortcomings that significantly affect the success of plagiarism detection.

References

[1] Encyclopedia Britannica, http://www.britannica.com/EBchecked/topic/462640/plagiarism (last access February 7, 2011).
[2] Bao Jun-Peng, Shen Jun-Yi, Liu Xiao-Dong, Song Qin-Bao, A Survey on Natural Language Text Copy Detection, Journal of Software, 2003, vol. 14, no. 10, pp. 1753-1760.
[3] http://www.plagaware.com/about plagaware/application (last access February 7, 2011).
[4] Zechner, M., Muhr, M., Kern, R., Michael, G., External and intrinsic plagiarism detection using vector space models. In: Proceedings of the SEPLN’09 Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse, pp. 47-55 (2009).
[5] http://www.cs.queensu.ca/queensu.ca/TechReports/Reports/2007-541.pdf (last access February 7, 2011).
