Comparison of Similarity Metrics for Refactoring Detection
Benjamin Biegel, Quinten David Soetens, Willi Hornig, Stephan Diehl, and Serge Demeyer
Mining Software Repositories Honolulu, Hawaii. May 21, 2011.
Universität Trier
Introduction
• Why is it important to detect refactorings?
• Replication experiment
  – Existing refactoring detection approach: “Identifying Refactorings from Source-Code Changes”, P. Weißgerber et al. (best paper, ASE’06)
  – But replaced the similarity metric (clone detector)
  – Non-identical replication (invited talk MSR’10)
Main Research Questions
I. How significant is the influence of each similarity metric?
II. What is the impact of these metrics on the detected refactorings?
EXPERIMENTAL SETUP
[Figure: experimental setup, comparing the original experiment with the replication. Diagram labels: signature-based analysis, filtering, ccfinder, shingles (SH), jccd, overlap.]
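The slides name shingles (SH) as one of the similarity metrics but do not show how it is computed. As a rough illustration only, the sketch below uses one common formulation: split each code fragment into whitespace-separated tokens, collect all windows of `k` consecutive tokens, and compare the two sets with the Jaccard index. The function names, the tokenization, and the default `k=3` are assumptions made for this example, not the authors' implementation.

```python
def shingles(tokens, k=3):
    """Return the set of all windows of k consecutive tokens."""
    if not tokens:
        return set()
    if len(tokens) < k:
        # Fragment shorter than one window: treat it as a single shingle.
        return {tuple(tokens)}
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def shingle_similarity(body_a, body_b, k=3):
    """Jaccard similarity of the k-shingle sets of two code fragments.

    Assumption: fragments are tokenized by whitespace; a real tool would
    more likely use a language-aware tokenizer.
    """
    a = shingles(body_a.split(), k)
    b = shingles(body_b.split(), k)
    if not a and not b:
        return 1.0  # two empty fragments are treated as identical
    return len(a & b) / len(a | b)
```

For example, `shingle_similarity("a b c d", "a b c e")` compares two fragments that share one of three distinct shingles. A larger `k` makes the measure stricter about local token order.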
Evaluated Projects
• 4 of 7 projects of the original experiment
• The strong filter is very selective, so the clone detectors have a much smaller influence
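Filtering Strategy 2: ambiguous candidates
[Figure: the ambiguous candidates a, b, and c are ranked by similarity and the best candidates are kept. Rankings shown in the diagram: a b c; a c b; b a c. Diagram labels: SH, ranking, best candidates.]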
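The slides do not spell out how the best candidates are chosen from an ambiguous group. A minimal sketch, assuming each candidate is a hypothetical `(source, target, similarity)` triple and that the highest-scoring target per source wins:

```python
from collections import defaultdict


def best_candidates(candidates):
    """Pick, for every source entity, the target with the highest score.

    candidates: iterable of (source, target, similarity) tuples.
    This tuple layout is an assumption made for illustration.
    """
    by_source = defaultdict(list)
    for source, target, score in candidates:
        by_source[source].append((score, target))

    best = {}
    for source, scored in by_source.items():
        # Highest score first; ties are broken by target name so the
        # result is deterministic.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        best[source] = scored[0][1]
    return best


# Two metrics scoring the same ambiguous group a, b, c differently.
metric_one = [("m", "a", 0.9), ("m", "b", 0.7), ("m", "c", 0.4)]
metric_two = [("m", "a", 0.5), ("m", "b", 0.8), ("m", "c", 0.3)]
```

With these made-up scores, `best_candidates(metric_one)` selects `a` while `best_candidates(metric_two)` selects `b`, which is how different rankings can produce different detected refactorings from the same candidate set.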
Filtering Strategy 2: ambiguous candidates, overlap
[Figure: overlap of the result sets. Diagram labels: weak, Overlap. Reported ranges: 2.1% - 22.8% and 51.6% - 59.7%.]
• still a large common basis
• but different rankings
Overlap, small artifacts
• fewer differences when smaller artifacts are ignored
Reported range: 62.6% - 82.3%
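The slides report overlap percentages without defining the measure. One plausible reading, sketched below as an assumption, is the share of refactoring candidates found with every similarity metric relative to all candidates found with at least one. The function name and the candidate identifiers are hypothetical.

```python
def overlap(result_sets):
    """Fraction of candidates common to all result sets, relative to their union.

    Assumption: overlap = |intersection| / |union|. The slides may use a
    different definition.
    """
    sets = [set(r) for r in result_sets]
    union = set().union(*sets)
    if not union:
        return 0.0  # no candidates at all, so nothing overlaps
    common = set.intersection(*sets)
    return len(common) / len(union)


# Hypothetical candidate IDs detected with three different metrics.
ccfinder_hits = {"r1", "r2", "r3"}
shingles_hits = {"r2", "r3", "r4"}
jccd_hits = {"r2", "r3"}
```

Here two of the four distinct candidates are found with all three metrics, so `overlap([ccfinder_hits, shingles_hits, jccd_hits])` evaluates to 0.5.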
Conclusions
I. How significant is the influence of each similarity metric?
II. What is the impact of these metrics on the detected refactorings?
• Specific characteristics of the clone detectors are mostly not represented in the detected refactoring candidates
• The result sets have a comparable quality
• Overlap has a pretty high recall
LESSONS LEARNED
Our Experience with the Replication
• Original raw data, processed data, scripts, tool, and documentation were available
• But still a steep learning curve
• Expert support is important: contact the original author first