JOURNAL OF COMPUTER SCIENCE AND ENGINEERING, VOLUME 5, ISSUE 1, JANUARY 2011

Comparison of Similarity Metrics for Thumbnail Based Image Retrieval

Vidya R. Khapli, Anjali S Bhalchandra

Abstract— Similarity metrics play an important role in content-based image retrieval (CBIR). This paper compares eight image similarity measures, namely Euclidean Distance, Cityblock Distance (Manhattan/L1), Canberra Distance, Jeffrey Divergence, Bhattacharyya Distance, Chi Square Distance, Bray Curtis Distance and Kolmogorov Distance, for thumbnail based image retrieval. Wang's database of 1000 images is used to check the retrieval performance. Features of all the database images were extracted using VQ. Experimental results on the database indicate that retrieval performance can be improved significantly by using the Kolmogorov distance metric as compared to the traditional Euclidean distance based approach.

Index Terms— CBIR, VQ, similarity metric


1 INTRODUCTION

Content-based retrieval of images and video has become an active research area. Worldwide networking now allows information to be communicated, shared, and learned on a global scale, and digital libraries and multimedia databases are growing rapidly. Hence there is a need for efficient search algorithms. Traditionally, image retrieval has relied on keyword-based search. Manually describing every color, texture, shape, and object within an image is very time consuming and difficult; it is well known that an image speaks thousands of words. So instead of being manually annotated with text-based keywords, images can be indexed by their own visual content, such as color, texture and shape. Hence researchers turned their attention to content-based retrieval methods [1].

Advances in data storage and image acquisition technologies, together with the rapid growth of the World Wide Web, have led to the creation of huge image data sets and digital archives. Images are now normally stored in the compressed domain in order to meet the constraints imposed by the storage and transmission costs of managing large amounts of image data. Effective and efficient image indexing and access tools are of supreme importance in order to fully utilize the huge amount of digital data available on the Internet. Compressed domain indexing (CDI) techniques can be broadly classified into two categories: transform domain techniques and spatial domain techniques. The transform domain techniques are generally based on the DFT (Discrete Fourier Transform), KLT (Karhunen-Loeve Transform), DCT (Discrete Cosine Transform), and subbands/wavelets. Spatial domain techniques include vector quantization (VQ) and fractals.

Search techniques can be based on many features such as colour, shape, and texture, but in this paper we concentrate only on VQ based codebook features. The codewords in an image-specific codebook are a good representation of the image content itself. Consequently, two images can be compared by comparing their codebooks only. This has the further advantage that not only information on the colour content of the images but also spatial information (i.e. information on texture and shape) is exploited, as the codewords in a codebook encompass both types of information [2,3].

A proper distance measure has to be used for comparing two codebooks. Usually the Euclidean distances between the images in the database and the query image are calculated and used for ranking: the smaller the distance, the more similar the image is to the query. But this metric has some limitations, and the Euclidean distance is not always the best metric. Because the differences in each dimension are squared before summation, great emphasis is placed on those features for which the dissimilarity is large. It is also computationally complex. Hence it was found important to explore different similarity measures that overcome these problems and improve retrieval accuracy. For example, using the Kolmogorov distance metric instead of the Euclidean distance metric improves retrieval accuracy from 71.21 % to 77.78 %. This clearly indicates that retrieval performance depends not only on good image features but also on good similarity measures. This motivated the present work to explore different similarity measures and present a comparative study.

The remainder of the paper is organized as follows: Section 2 illustrates the proposed framework for Compressed Domain Image Retrieval. Different similarity metrics are briefly discussed in Section 3. Section 4 explains the platform used for implementation. Section 5 shows empirical findings and Section 6 gives the conclusions.

————————————————

• V.R. Khapli is Principal, K K Wagh Womens Polytechnic, Nashik, MS, India.
• A.S. Bhalchandra is with the Department of Electronics and Telecommunication, Govt. College of Engg., Aurangabad.

© 2011 JCSE
http://sites.google.com/site/jcseuk/


2 PROPOSED COMPRESSED DOMAIN IMAGE RETRIEVAL

2.1 Experimental System
A novel approach is followed for effective image retrieval (IR) in the compressed domain. It has been suggested, and also experimentally verified, that if a thumbnail is used instead of the full-size image to derive the feature vector, the system operates on reduced data, which leads to faster retrieval [4]. Here the image database consists of thumbnails of the images; the query image is also converted into a thumbnail, and then feature extraction is carried out. Figure 1 shows a schematic of the image retrieval system using VQ codebooks.

[Figure 1 schematic. Database path: Image Database, Thumbnails, Codebooks CI. Query path: Query Image, Query Thumbnail, Codebook CQ. Both paths feed "Comparison of CI and CQ", which outputs "Similar images in rank order".]

Figure 1. Image Retrieval by comparing VQ Codebooks

2.2 Algorithm
The above technique is applied to thumbnails of the images in the database. The algorithm of the proposed system is given below.
• Preprocessing the database
   - Convert all images into thumbnails of size 8, retaining the aspect ratio.
   - Derive feature vectors from each thumbnail and construct a VQ codebook.
• For every query image
   - Compress the query image.
   - Get its codebook.
   - Compare it with each codebook in the database.
   - Rank the images of the database in order of the achieved distortion.

The database included the thumbnails of the images, the codebooks, and links to the original images. For the query step, forward compression was used, where the database codebooks were used to compress the query image. The method was tested with Wang's image database consisting of 1000 JPEG images. All 1,000 images were either 256 x 384 or 384 x 256. They were reduced to thumbnails of size 8. The thumbnails were first transformed from the RGB color space to the perceptually uniform CIE LUV color space. Feature vectors were then formed from the mean and variance of each color channel for each 2 x 2 block of a thumbnail. The VQ codebook for a thumbnail was then constructed using the standard splitting algorithm [5], with MSE as the distance function. The prototype is implemented using MATLAB 7.4 on an Intel Pentium (R) M processor, 1.70 GHz, 504 MB RAM machine.

3 SIMILARITY METRICS FOR CBIR
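As a rough illustration of the query step of Section 2.2, into which any of the similarity measures of Section 3 can be plugged, the following Python sketch forms 2 x 2 block features and ranks database codebooks by the distortion they produce on the query. It is a sketch under stated assumptions and not the authors' MATLAB prototype: the function names (`block_features`, `distortion`, `rank_database`), the data layout, and the reading of "achieved distortion" as the mean distance to the nearest codeword are all assumptions made here. The colour conversion to CIE LUV and the codebook training with the splitting algorithm are omitted.

```python
from statistics import fmean, pvariance


def block_features(channels):
    """Return one feature vector per 2 x 2 block of a thumbnail.

    `channels` is a list of colour planes (for example L, U, V), each a
    list of rows of floats with even height and width (assumed layout).
    Each block contributes the mean and variance of every channel.
    """
    height, width = len(channels[0]), len(channels[0][0])
    features = []
    for r in range(0, height, 2):
        for c in range(0, width, 2):
            vec = []
            for plane in channels:
                block = [plane[r][c], plane[r][c + 1],
                         plane[r + 1][c], plane[r + 1][c + 1]]
                vec += [fmean(block), pvariance(block)]
            features.append(vec)
    return features


def squared_error(x, y):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kolmogorov(x, y):
    """Largest absolute component difference (item l of Section 3)."""
    return max(abs(a - b) for a, b in zip(x, y))


def distortion(vectors, codebook, dist=squared_error):
    """Mean distance from each query vector to its nearest codeword."""
    return fmean([min(dist(v, cw) for cw in codebook) for v in vectors])


def rank_database(query_vectors, codebooks, dist=squared_error):
    """Return image ids sorted by increasing distortion of the query.

    `codebooks` maps a (hypothetical) image id to its list of codewords.
    """
    scores = {name: distortion(query_vectors, cb, dist)
              for name, cb in codebooks.items()}
    return sorted(scores, key=scores.get)
```

Passing a different function as `dist`, for example `kolmogorov`, changes the similarity measure without touching the ranking loop, which mirrors how the metrics are swapped one at a time in the experiments.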

The distance metric can be termed a similarity measure. It is the key component in content-based image retrieval [8,9]. Hence different similarity measures are explored to find the best distance metric for thumbnail based content-based image retrieval. An extensive study of eighteen similarity measures used for image retrieval from compressed images has been conducted. The definitions follow; throughout, $x_i$ and $y_i$ denote the $i$-th components of the two feature vectors (or distributions) being compared.

a. Euclidean Distance (L2)
One of the commonest distance metrics in the image retrieval literature is the Euclidean distance. It corresponds to the Minkowski-form distance for r = 2, and is defined as:
$$d_1 = \sqrt{\sum_i (x_i - y_i)^2}$$

b. City Block Distance (or Manhattan Distance)
The city block distance function is equivalent to the Minkowski-form distance for r = 1. It requires fewer computations than many other distance metrics, and is defined as:
$$d_2 = \sum_i |x_i - y_i|$$

c. Canberra Metric
The Canberra metric is very popular in CBIR applications. It has the advantage of a relatively low computational complexity and high retrieval efficiency.
$$d_3 = \sum_i \frac{|x_i - y_i|}{x_i + y_i}$$

d. Histogram Intersection
The histogram intersection metric was first proposed for color image retrieval in the spatial domain by Swain and Ballard. The intersection metric they used is not symmetric, but it can be modified in order to produce a true distance metric.
$$d_4 = 1 - \frac{\sum_i \min(x_i, y_i)}{\min\left(\sum_i x_i, \sum_i y_i\right)}$$

e. Jeffrey Divergence
The Jeffrey divergence metric is defined as:
$$d_5 = \sum_i \left[ x_i \log\frac{x_i}{m_i} + y_i \log\frac{y_i}{m_i} \right]$$
where
$$m_i = \frac{x_i + y_i}{2}$$
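The five measures defined so far (items a to e) can be sketched in Python as below. This is a minimal sketch, not the authors' implementation: the function names are chosen here for illustration, the inputs are assumed to be equal-length sequences of non-negative numbers, and the handling of zero-valued components (skipping terms that would divide by zero or take the logarithm of zero) is an assumption, since the text does not say how such terms are treated.

```python
import math


def euclidean(x, y):
    """d1: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def city_block(x, y):
    """d2: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def canberra(x, y):
    """d3: absolute differences normalised by the component sums.

    Terms with a zero denominator are skipped (assumption).
    """
    return sum(abs(a - b) / (a + b) for a, b in zip(x, y) if a + b != 0)


def histogram_intersection(x, y):
    """d4: one minus the overlap, normalised by the smaller total mass."""
    overlap = sum(min(a, b) for a, b in zip(x, y))
    return 1.0 - overlap / min(sum(x), sum(y))


def jeffrey(x, y):
    """d5: Jeffrey divergence with m_i = (x_i + y_i) / 2.

    Components equal to zero contribute nothing (assumption).
    """
    total = 0.0
    for a, b in zip(x, y):
        m = (a + b) / 2.0
        if a > 0:
            total += a * math.log(a / m)
        if b > 0:
            total += b * math.log(b / m)
    return total
```

For the vectors `[1.0, 2.0, 3.0]` and `[1.0, 4.0, 3.0]`, which differ only in the second component, `euclidean` and `city_block` both give 2.0 and `canberra` gives 2/6; with several unequal components the squaring in `euclidean` would weight the largest difference most heavily.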

f. Bhattacharyya Distance
This distance metric is not frequently used in image retrieval applications, mainly because of its computational complexity. In its standard form, the Bhattacharyya distance is given by:
$$d_6 = \left| \log \sum_i \sqrt{x_i y_i} \right|$$

g. Chi-Square
The Chi-square statistic is applicable to unbinned distributions, but it can also be used for comparison between binned distributions such as histograms. The Chi-square distance is one of the most popular metrics and is given by the formula:
$$d_7 = \sum_i \frac{(x_i - y_i)^2}{x_i + y_i}$$

h. Bray Curtis Distance
The Bray Curtis distance is quite similar to the Canberra metric. It is defined as:
$$d_8 = \frac{\sum_i |x_i - y_i|}{\sum_i (x_i + y_i)}$$

i. Angular Separation Distance
The angular separation metric is defined as:
$$d_9 = 1 - \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2 \sum_i y_i^2}}$$

j. Chord Distance
The chord distance measures the distance between the points where the vectors cross a unit sphere. It also emphasizes qualitative aspects of data sets. It is defined as:
$$d_{10} = \sqrt{2 - 2\,\frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2 \sum_i y_i^2}}}$$

k. Matusita Distance
The Matusita distance is defined as:
$$d_{12} = \sqrt{\sum_i \left( \sqrt{x_i} - \sqrt{y_i} \right)^2}$$

l. Kolmogorov Distance
It is a simple measure. It is defined as:
$$d_{13} = \max_i |x_i - y_i|$$

m. Wave-Hedges
The Wave-Hedges metric is defined as:
$$d_{14} = \sum_i \left( 1 - \frac{\min(x_i, y_i)}{\max(x_i, y_i)} \right)$$

n. WED Distance
The WED (Weighted Euclidean Distance) has the same form as the Euclidean distance except that the squared difference of the two distributions for every i is multiplied by a weight $w_i$ depending on the value of distribution x:
$$d_{15} = \sqrt{\sum_i w_i \, (x_i - y_i)^2}$$
where $w_i = x_i$ if $x_i$ is not equal to 1, and $w_i = 1$ otherwise.

4 TESTING FOR BEST SIMILARITY METRIC
The prototype system is tested using the eight similarity metrics one at a time. The observations for some of the sample queries are shown in Table 1. Performance of the system is measured using average percent retrieval (accuracy) and average retrieval time. The empirical findings are listed in Tables 1 and 2. The comparison of the performance of the same system with different similarity metrics is given in Table 3 and plotted in Figures 2 and 3. Sample output is shown in Figures 4 and 5.

5 OBSERVATIONS

TABLE 1 EMPIRICAL RESULTS

Query

1.

2.

3.

4.

Distance Measure

RT in sec

A

Eucledean Distance Cityblock Distance Canberra Jafery Divergence Bhattacharya Chi Square Distance Bray Curtis Distance Kolmogorov Eucledean Distance Cityblock Distance Canberra Distance Jafery Divergence Bhattacharya Chi Square Bray Curtis Kolmogorov Eucledean Distance Cityblock Distance Canberra Distance Jafery Divergence D Bhattacharya D Chi Square Distance Bray Curtis Distance Kolmogorov D Eucledean Distance Cityblock Distance

4.47 4.33 3.53 4.47 4.6 4.2 4 4 3.6 3.2 5 4.3 3.6 4.4 4.2 3.6 9 7.6 6.3 6.1 6.2 5.6 5.5 6.1 9.8 5.8

3 3 2 0 1 4 3 2 3 3 3 1 2 5 2 2 7 7 3 2 7 7 7 3 3

C

5

5

7

4


6.

7.

8.

9.

10.

11.

5.9 7.4 4.7 4.9 4.9 4.8 5.2 4.6 5.9 4.8 4.7 5.4 5.1 4.3 5.5 4.5 4.8 5.5 5.1 4.5 4.7 6 5.1 4.6 4.5 4.4 3.8 5.3 4.5 4.6 6.8 6.7 6.2 5.3 4.2 4.8 4.6 4.3 5.2 4.1 4.7 4.3 4.5 5.7 4 5.1 6.6 4.5 4.5 4.7 5.7 3.9 4.3 4.3 5 4.8 6.2

1 2 2 1 1 4 5 5 4 1 1 4 3 3 1 1 1 0 0 1 1 1 6 6 4 5 3 5 4 6 4 4 3 4 6 3 2 5 7 6 4 4 4 6 4 6 6 5 4 0 1 6 4 3 7 6 2

12. 5

13. 5

14. 6

Jafery Divergence Bhattacharya Chi Square Bray Curtis Kolmogorov Eucledean Distance Cityblock Distance Canberra Distance Jafery Divergence Bhattacharya Chi Square Bray Curtis Kolmogorov Eucledean Distance Cityblock Distance Canberra Distance Jafery Divergence Bhattacharya Chi Square Bray Curtis Kolmogorov Eucledean Distance Cityblock Distance Canberra Distance Jafery Divergence Bhattacharya Chi Square Bray Curtis Kolmogorov

4.5 4.2 3.9 4 3.5 5.3 4.9 6.2 4.7 5.9 4.4 4.7 4.6 5.7 5.2 4.9 5.8 5.2 6.2 5.4 5.2 4.8 5.3 4.8 4.1 3.6 4.1 3.8 4.1

4 3 3 2 7 7 6 3 4 3 3 3 7 6 6 3 5 4 5 4 6 4 4 1 3 5 2 1 4

7

6

4

RT – Retrieval Time, A – No. of relevant images retrieved, C – No. of relevant images in the database 5
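The accuracy figures reported later are described only as "average percent retrieval". One plausible reading, given the A and C columns of Table 1, is that the percent retrieval for a query is 100 x A / C and that the average is taken over all queries. The Python sketch below illustrates that reading; both the formula and the sample (A, C) pairs are assumptions made for illustration and are not taken from Table 1.

```python
def percent_retrieval(retrieved_relevant, relevant_in_db):
    """Percentage of the relevant database images that were retrieved."""
    return 100.0 * retrieved_relevant / relevant_in_db


def average_percent_retrieval(results):
    """Mean percent retrieval over a list of (A, C) pairs, one per query."""
    return sum(percent_retrieval(a, c) for a, c in results) / len(results)


# Hypothetical (A, C) pairs for two queries under one similarity metric.
sample = [(3, 5), (4, 4)]
print(average_percent_retrieval(sample))
```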

TABLE 2 AVERAGE RETRIEVAL TIME REQUIRED FOR DIFFERENT TYPES OF SIMILARITY METRIC

S.N.   Similarity Metric       Average Retrieval Time in Seconds
1.     Euclidean               5.70
2.     Canberra                5.17
3.     Cityblock               5.02
4.     Jeffrey Divergence      4.98
5.     Chi Square              4.66
6.     Bray Curtis             4.62
7.     Bhattacharyya           4.59
8.     Kolmogorov Distance     4.60

Figure 2 shows a bar chart of average retrieval time versus distance measure. Figure 3 shows a bar chart of average percent retrieval versus distance measure. Figures 4 and 5 show sample output using the Kolmogorov distance and the Euclidean distance, respectively, for the same query image (the one at the left corner).


5.

Canberra Distance Jafery Divergence Bhattacharya Chi Square Bray Curtis Kolmogorov Eucledean Distance Cityblock Distance Canberra Distance Jafery Divergence Bhattacharya Chi Square Bray Curtis Kolmogorov Eucledean Distance Cityblock Distance Canberra Distance Jafery Divergence Bhatacharya Chi Square Distance Bray Curtis Distance Kolmogorov Eucledean Distance Cityblock Distance Canberra Distance Jafery Divergence Bhattacharya Chi Square Bray Curtis Kolmogorov Eucledean Distance Cityblock Distance Canberra Distance Jafery Divergence Bhattacharya Chi Square Bray Curtis Kolmogorov Eucledean Distance Cityblock Distance Canberra Distance Jafery Divergence Bhattacharya Chi Square Bray Curtis Kolmogorov Eucledean Distance Cityblock Distance Canberra Distance Jafery Divergence Bhattacharya Chi Square Bray Curtis Kolmogorov Eucledean Distance Cityblock Distance Canberra Distance


Figure 2 Distance measure versus retrieval time

TABLE 3 ACCURACY OF DIFFERENT SIMILARITY METRICS

S.N.   Similarity Metric       Avg. % Retrieval (Accuracy)
1.     Euclidean               71.21
2.     Cityblock               58.45
3.     Canberra                56.81
4.     Jeffrey Divergence      60.12
5.     Bhattacharyya           56.31
6.     Chi Square              54.58
7.     Bray Curtis             60.21
8.     Kolmogorov Distance     77.78

Figure 3 Distance measure versus Average % Retrieval

6 CONCLUSIONS

In this paper, a detailed comparison of eight different similarity metrics (SM), namely the Euclidean, Cityblock, Canberra, Jeffrey divergence, Bhattacharyya, Chi Square, Bray Curtis and Kolmogorov distances, for image retrieval using VQ on thumbnails of images is presented with empirical evaluation. Wang's colour image database of 1000 images is used to check the retrieval performance. It was observed that the conventional distance metric, i.e. the Euclidean distance, is not always the best performer. In fact it showed the maximum retrieval time, meaning that it is the slowest of all the distance measures. With respect to retrieval time, the Bhattacharyya, Chi Square, Bray Curtis and Kolmogorov distances perform nearly the same, and all are better than the Euclidean distance. When compared on the basis of accuracy, the Kolmogorov distance outperformed the other similarity metrics. Hence it is concluded that the Kolmogorov distance metric is the most effective distance measure for the image retrieval system based on thumbnails of images.

REFERENCES
[1] R. Datta, D. Joshi, J. Li and J. Z. Wang, "Image Retrieval: Ideas, Influences, and Trends of the New Age", ACM Computing Surveys, vol. 40, no. 2, article 5, pp. 1-60, 2008.
[2] G. Schaefer, "Compressed domain image retrieval by comparing vector quantization codebooks", VCIP, pp. 959-966, 2002.
[3] Md. M. Islam, D. Zhang and G. Lu, "Comparison of retrieval effectiveness of different region based image retrieval", 6th IC on Information, Communications & Signal Processing, pp. 1-4, 2007.
[4] A. H. Daptardar and J. A. Storer, "Reduced complexity Content Based Image Retrieval using Vector Quantization", DCC 2006 Proceedings, pp. 342-351, 2006.
[5] P. Franti, T. Kaukoranta and O. Nevalainen, "On the Splitting Method for VQ Generation", Optical Engineering, 36(11), 3043-3051, November 1997.
[6] V. R. Khapli and A. S. Bhalchandra, "Compressed Domain Image Retrieval Using Thumbnails of Images", IEEE CS First International Conference on Computational Intelligence, Communication Systems and Networks, CICSYN 2009, pp. 392-396, 2009.
[7] V. R. Khapli and A. S. Bhalchandra, "Performance Evaluation of Image Retrieval Using VQ for Compressed and Uncompressed Images", 2nd International Conference on Emerging Trends in Engineering and Technology (ICETET), pp. 885-888, 2009.
[8] M. Hatzigiorgaki and A. N. Skodras, "Compressed Domain Image Retrieval: A Comparative Study of Similarity Metrics", Proceedings of SPIE, vol. 5150, pp. 439-448, 2003.
[9] M. Kokare, B. N. Chatterji and P. K. Biswas, "Comparison of similarity metrics for texture image retrieval", TENCON 2003, vol. 2, pp. 571-575, 2003.
[10] http://wang.ist.psu.edu/

Vidya Khapli received her BE (Electronics and Power) in 1985 from College of Engg, Amravati, M.S., India, and her M.Tech (Electronics) from VNIT, Nagpur, M.S., India, in 1996. She has been in the field of education for the last 22 years. Currently she is working as Principal, K K Wagh Womens' Polytechnic, Nashik, MS, India. She is a research scholar pursuing a Ph.D. at Govt. Engg. College Aurangabad, MS, India. Her areas of interest are Image Processing, Digital Signal Processing, Parallel Computing, and Basic Electronics. Her area of research is Content Based Image Retrieval. She has 18 publications to her credit, published at various national and international conferences.

Dr. Anjali Deshpande received her B.E. (Electronics & Telecommunication), M.E. (Electronics), and Ph.D. (Electronics) from SGGS College of Engineering, Nanded, Maharashtra, India. She has been in the field of engineering education for the last 25 years. Her research interests are in the area of image and signal processing, with special focus on segmentation, CBIR techniques and blind signal processing. She has 50 publications to her credit, published in various international journals, conferences and seminars. She has delivered several lectures on image processing in universities of India. Presently she is Head of the Department of Electronics and Telecommunications at Government Engineering College, Aurangabad, India.

     

Figure 4 (a) Output using Kolmogorov distance

Figure 4 (b) Output using Kolmogorov distance

Figure 4 (c) Output using Kolmogorov distance

Figure 5 (a) Output using Euclidean distance

Figure 5 (b) Output using Euclidean distance

Figure 5 (c) Output using Euclidean distance

 
