Comparison of Similarity Metrics for Refactoring Detection

Benjamin Biegel, Quinten David Soetens, Willi Hornig, Stephan Diehl, and Serge Demeyer

Mining Software Repositories Honolulu, Hawaii. May 21, 2011.

Universität Trier

Introduction
• Why is it important to detect refactorings?
• Replication experiment
  – Existing refactoring detection approach: “Identifying Refactorings from Source-Code Changes”, P. Weißgerber et al. (best paper, ASE’06)
  – But replaced the similarity metric (clone detector)
  – Non-identical replication (invited talk MSR’10)

Main Research Questions
I. How significant is the influence of each similarity metric?
II. What is the impact of these metrics on the detected refactorings?

EXPERIMENTAL SETUP

[Figure: experimental setup, original vs. replication. Stages: signature-based analysis, then filtering. Similarity metrics: ccfinder, jccd, SH (shingles). Output: overlap of the ccfinder, jccd, and SH results.]
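The slides do not spell out how the SH (shingles) metric is computed. As a rough sketch of the general idea only, assuming w-token shingles compared with Jaccard similarity: the tokenizer, the window size `w=3`, and the function names below are illustrative assumptions, not the authors' implementation.

```python
import re

def shingles(code, w=3):
    """Return the set of w-token windows (shingles) of a code fragment.

    Assumption: a very simple tokenizer (identifiers, numbers, single symbols).
    """
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    if len(tokens) < w:
        # Fragment shorter than one window: treat it as a single shingle.
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}

def shingle_similarity(a, b, w=3):
    """Jaccard similarity of the shingle sets of two code fragments."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0  # two empty fragments are considered identical
    return len(sa & sb) / len(sa | sb)
```

A filter built on such a metric would keep a refactoring candidate only if the similarity of the old and new method bodies exceeds some threshold; the threshold itself is not given on the slides.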

Evaluated Projects
• 4 of 7 projects of the original experiment
  – Azureus, 10664 transactions
  – Jfreechart, 2412 transactions
  – JFtp, 209 transactions
  – Tomcat 3, 4158 transactions

FILTERING STRATEGIES AND RESULTS

Filtering Strategy 1: weak and strong

[Figure: a weak filter and a strong filter are applied, each producing its own set of results.]

Filtering Strategy 1: weak and strong
• Overlap, weak filter: 2.1% - 22.8%
• Overlap, strong filter: 51.5% - 81.2%
• The strong filter is very selective, so the clone detectors have a much smaller influence
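The slides report overlap percentages without defining the measure. One plausible reading, shown here purely as an assumption, is the share of refactoring candidates common to two result sets relative to all candidates found by either. The function name and the candidate representation are hypothetical.

```python
def overlap_percent(results_a, results_b):
    """Percentage of candidates shared by two result sets.

    Assumption: overlap = |A ∩ B| / |A ∪ B|; the slides do not state
    the exact definition used in the study.
    """
    a, b = set(results_a), set(results_b)
    union = a | b
    if not union:
        return 0.0  # nothing detected by either metric
    return 100.0 * len(a & b) / len(union)

# Hypothetical candidates: (refactoring kind, old signature, new signature)
ccfinder_results = {
    ("rename", "Foo.bar()", "Foo.baz()"),
    ("move", "A.run()", "B.run()"),
    ("rename", "C.get()", "C.fetch()"),
}
jccd_results = {
    ("move", "A.run()", "B.run()"),
    ("rename", "C.get()", "C.fetch()"),
    ("rename", "D.init()", "D.setup()"),
}
print(overlap_percent(ccfinder_results, jccd_results))
```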

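Filtering Strategy 2: ambiguous candidates

[Figure: ambiguous candidates a, b, c are ranked by a similarity metric (e.g. SH) and the best candidates are kept. Example rankings shown: a b c, a c b, b a c.]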
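The ranking step for ambiguous candidates can be sketched as follows. This is a minimal illustration, assuming that candidates competing for the same source entity are ordered by a similarity score and the top one is kept; the tuple layout and function names are hypothetical and not taken from the original tool.

```python
from collections import defaultdict

def rank_candidates(candidates):
    """Group candidates by source entity and rank each group by similarity.

    candidates: iterable of (source, target, similarity) tuples (assumed layout).
    Returns {source: [target, ...]} with the most similar target first.
    """
    groups = defaultdict(list)
    for source, target, similarity in candidates:
        groups[source].append((target, similarity))
    return {
        source: [t for t, _ in sorted(opts, key=lambda o: o[1], reverse=True)]
        for source, opts in groups.items()
    }

def best_candidates(candidates):
    """Keep only the top-ranked target for each source entity."""
    return {src: ranked[0] for src, ranked in rank_candidates(candidates).items()}

# Hypothetical scores from one metric for three ambiguous targets a, b, c
scores = [("m", "a", 0.4), ("m", "b", 0.9), ("m", "c", 0.7)]
print(rank_candidates(scores))   # ranking for source "m"
print(best_candidates(scores))   # best candidate for source "m"
```

Because each similarity metric assigns its own scores, two metrics can agree on the candidate set and still order the candidates differently.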

Filtering Strategy 2: ambiguous candidates
• Overlap, weak filter: 2.1% - 22.8%
• Overlap, Filtering Strategy 2: 51.6% - 59.7%
• Still a large common basis
• But different rankings

Overlap small artifacts
• Fewer differences when smaller artifacts are ignored
• 62.6% - 82.3%

Conclusions
I. How significant is the influence of each similarity metric?
II. What is the impact of these metrics on the detected refactorings?

• Specific characteristics of the clone detectors are mostly not represented in the detected refactoring candidates
• The result sets have a comparable quality
• Overlap has a pretty high recall

LESSONS LEARNED

Our Experience with the Replication
• Original raw data, processed data, scripts, tool, and documentation were available
• But still a steep learning curve
• Expert support is important
• Contact the original author first

Mahalo!
