LOCAL PATTERNS CONSTRAINED IMAGE HISTOGRAMS FOR IMAGE RETRIEVAL

Zhi-Gang Fan, Jilin Li, Bo Wu, and Yadong Wu
Advanced R&D Center of Sharp Electronics (Shanghai) Corporation
1387 Zhangdong Road, Pudong, Shanghai 201203, China
[email protected]

ABSTRACT

In this paper, we present local patterns constrained image histograms (LPCIH) for efficient image retrieval. By combining local texture patterns with a global image histogram, LPCIH provides an effective feature representation built on a flexible image segmentation process. This representation is robust and invariant to several image transforms, such as rotation, scaling and damage. The LPCIH method handles several difficult retrieval tasks, such as rotated and damaged gray image retrieval, for which many traditional image retrieval methods are unsuitable, which makes it valuable for many real-world applications. Experimental results show that the LPCIH method is consistently efficient and effective, and that it offers advantages over state-of-the-art image retrieval methods.

Index Terms— Image Retrieval, Pattern Matching, Image Texture Analysis, Feature Extraction, LPCIH

1. INTRODUCTION

Content-based image retrieval has been an active research field in recent years. Interest in this field has been spurred by the need to efficiently manage and search large volumes of multimedia information [1], mostly due to the exponential growth of the World Wide Web (WWW). The images available on the WWW form the broadest collection of images, and managing a collection of that size is very difficult, so research on efficient retrieval methods is essential in this age of information explosion. Image retrieval is performed on feature representations that are extracted from the image during an analysis phase, so image feature descriptors are central to image retrieval. The most common categories of descriptors are based on color, texture and shape, and an efficient image retrieval system must be built on efficient image feature descriptors. Image retrieval methods also depend on the properties of the images being analyzed. These methods are usually distinct for different image domains, and gradually

978-1-4244-1764-3/08/$25.00 ©2008 IEEE


change when the focus moves from a narrow to a broad image domain. A narrow image domain has limited and predictable variability in all relevant aspects of its appearance. For example, face image retrieval has its own specialized algorithms, such as boosting, which are difficult to apply to general object retrieval. A broad image domain has unlimited and unpredictable variability in the content of its images, so image analysis and feature representation are very difficult there. In this paper, our focus is on image retrieval methods suitable for broad image domains. Low-level visual features such as color and texture are especially useful for image retrieval. The popular color features include global color histograms [2], color correlograms [3], color moments [4] and the MPEG-7 color descriptors [5]. The MPEG-7 texture descriptors [5] are based on Gabor filtering. The edge histogram [6], a block-based descriptor included in the MPEG-7 standard, and local binary patterns [7] have been used as texture descriptors. There are other usable features such as MR-SAR [8], Wold features [9] and the well-known Tamura approach [10]. CVPIC [11] and Border-Interior classification [12] are retrieval methods based on local textures. With SIFT and local descriptors [13], retrieval systems [14] at Google and global probabilistic models [15] have been developed. In several difficult situations, such as rotated and damaged gray image retrieval, the traditional features mentioned above (except SIFT) are ineffective because they are not rotation-invariant. The SIFT-based image retrieval systems [14][15] are rotation-invariant, but they are too slow for real-time computation in some environments, such as on personal computers. To solve these problems, we propose local patterns constrained image histograms (LPCIH).
LPCIH combines local texture patterns with a global image histogram to form a powerful image feature representation. The texture patterns are encoded locally, and the global image histogram is divided into several parts according to the local patterns, so both local and global information are captured by LPCIH. This feature representation is robust and invariant to several image transforms, such as rotation, scaling and damage. LPCIH is a compact and efficient image retrieval method suitable for broad image domains.

ICIP 2008

2. LOCAL PATTERNS CONSTRAINED IMAGE HISTOGRAMS

Local patterns constrained image histograms (LPCIH) are obtained through three processing steps: (1) image quantization; (2) local pattern analysis; (3) histogram construction.

At the image quantization step, color images are first converted to gray images. Then the classical Octree quantization algorithm is used to quantize the gray intensities to 32 scales. The node index at the top level of the Octree uses the most significant bits of the gray components, the next lower level uses the next bit of significance, and so on. If many more than the desired number of gray intensities are entered into the Octree, its size can be continually reduced by seeking out a bottom-level node and averaging its bit data up into a leaf node. Once sampling is complete, exploring all routes in the tree down to the leaf nodes, noting the bits along the way, yields the required number of gray intensities.

Second, after quantization, we analyze the local patterns of the image pixels with a texture operator similar to local binary patterns [7]. Our texture operator extends local binary patterns, since we have modified their basic form by defining an additional flat pattern. As illustrated in Figure 1, the operator assigns a label (pattern code) to every pixel of an image by comparing eight points in the 5×5-neighborhood of each pixel with the center pixel value (a 3×3-neighborhood is often used instead for computational efficiency).

Fig. 1. The local pattern operator uses the information of the 5×5-neighborhood of each image pixel.

If the eight neighbors are all equal to the center pixel value, we define the pattern as a flat pattern. Besides the flat pattern, we additionally use the patterns of LBP^{u2}_{8,2} [7]; any pattern that is not flat is a pattern of LBP^{u2}_{8,2}. LBP^{u2}_{8,2} defines the local neighborhood as a set of sampling points evenly spaced on a circle centered at the pixel to be labelled, with bilinear interpolation used when a sampling point does not fall at the center of an image pixel. The patterns of LBP^{u2}_{8,2} are produced by thresholding the 5×5-neighborhood of each pixel with the center value and treating the result as 8 binary numbers (0 and 1). The final pattern code P is produced by multiplying the 8 binary numbers B_i by weights given by powers of two and adding the results in a circular way, as shown in equation (1). As a result, every image pixel is assigned a label P of local pattern code. In LBP^{u2}_{8,2}, a local pattern is called uniform if it contains at most two bitwise transitions from 0 to 1 or vice versa when its 8 binary numbers are ordered in a circular way. All non-uniform patterns are labelled with one and the same label of pattern code in our texture operator.


P = Σ_{i=0}^{7} B_i · 2^i,   B_i ∈ {0, 1}        (1)
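Equation (1) together with the flat-pattern rule above can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the `FLAT` label value and the ≥ thresholding convention are assumptions.

```python
FLAT = 256  # any label outside the 0..255 range of equation (1)

def pattern_code(center, neighbors):
    """Label one pixel from its 8 sampled neighbours.

    A neighbourhood whose eight samples all equal the centre gets the
    separate flat label; otherwise equation (1) builds the code
    P = sum_i B_i * 2**i with B_i = 1 when the sample is >= centre
    (thresholding convention assumed here).
    """
    if all(n == center for n in neighbors):
        return FLAT
    bits = [1 if n >= center else 0 for n in neighbors]  # the B_i
    return sum(b << i for i, b in enumerate(bits))
```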

For real-world applications, we make the local patterns rotation-invariant by merging them. This merging is performed by a circular shift operation. For LBP^{u2}_{8,2}, the 8 binary numbers of the thresholded neighborhood can be mapped onto an 8-bit word in clockwise or counterclockwise order around the center pixel. This 8-bit word is circularly shifted until it meets the following condition: the longest sub-sequence of 0s lies at the left end and a sub-sequence of 1s lies at the right end. As a result, different rotated versions of a local pattern are merged into the same pattern. After this merging, the local patterns are rotation-invariant because the edge orientation information has been removed while the texture information is retained. There are 11 rotation-invariant local patterns in total: one flat pattern, 9 uniform patterns and one merged non-uniform pattern. Each rotation-invariant local pattern has its own label (pattern code). When an image is processed by this local pattern operator, every pixel is assigned a label of pattern code. Our texture operator produces local patterns that are invariant to any monotonic transformation of the gray scale, and these patterns are quick to compute. This operator is a key component of the LPCIH retrieval algorithm. Finally, the histogram construction step consists of a flexible image segmentation process and a combination of gray scale histograms. In this step, we separate the input image into 11 sub-images so that the pixels of each sub-image have the same label of pattern code; this is a flexible, automatic image segmentation process. Then we construct a gray scale histogram for each sub-image, producing 11 gray scale histograms in total, one per sub-image.
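The merging step above can be sketched as below. Taking the minimum over all circular shifts is our equivalent reformulation of "longest run of 0s at the left end, 1s at the right end" for uniform patterns; the helper names are illustrative.

```python
def rotate_right(code, bits=8):
    # one circular shift of an 8-bit pattern word
    return ((code >> 1) | ((code & 1) << (bits - 1))) & ((1 << bits) - 1)

def canonical(code, bits=8):
    # Merge all rotations of a pattern by keeping the smallest one; for
    # uniform patterns this places the run of 1s at the low (right) end
    # and the longest run of 0s at the high (left) end.
    best = code
    for _ in range(bits - 1):
        code = rotate_right(code, bits)
        best = min(best, code)
    return best

def transitions(code, bits=8):
    # circular 0/1 transitions; uniform patterns have at most two
    return bin(code ^ rotate_right(code, bits)).count("1")
```

With these helpers, a uniform pattern with k ones always canonicalizes to 2^k − 1, giving the 9 uniform labels (k = 0..8); all non-uniform patterns share one label and the flat pattern supplies the eleventh.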
The resulting 11 gray scale histograms are combined simply by linking them end to end in a pre-defined sequence order, producing a single large histogram. This final histogram is called LPCIH, and it is the feature vector extracted for the input image. This feature representation differs from the spatially enhanced histogram (SEH) [7], which is used to encode the appearance of human face images. The SEH directly extracts local texture information from input images and is not rotation-invariant, whereas LPCIH combines local texture information with the global image histogram by fusing multiple sub-image histograms produced according to the local texture patterns of the image pixels. LPCIH is rotation-invariant and suitable for image retrieval because its flexible image segmentation process is more robust than the fixed grid processing of SEH. Because of its simplicity, LPCIH is also computationally efficient compared with the SIFT-based methods [14][15], which are far more complicated.
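The segmentation-and-concatenation step might be sketched like this, assuming per-pixel labels 0..10 and 32 quantized gray scales; the function name and the final normalization are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

N_LABELS, N_GRAY = 11, 32  # 11 pattern labels, 32 gray scales

def lpcih(gray, labels):
    """Build the 11 * 32 = 352-bin LPCIH feature.

    gray:   quantized image, values in 0..31
    labels: per-pixel rotation-invariant pattern labels in 0..10,
            same shape as `gray`
    """
    feat = np.zeros(N_LABELS * N_GRAY)
    for lab in range(N_LABELS):
        vals = gray[labels == lab]  # the sub-image for this pattern
        hist, _ = np.histogram(vals, bins=N_GRAY, range=(0, N_GRAY))
        feat[lab * N_GRAY:(lab + 1) * N_GRAY] = hist
    total = feat.sum()
    return feat / total if total else feat  # normalization: assumption
```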

[Figure 2: four plots comparing LPCIH, BIC, CVPIC and GIH (and their L1-distance variants L1 LPCIH, L1 BIC, L1 CVPIC, L1 GIH) as Precision vs. Recall and θ vs. Recall curves; x-axis: Recall, y-axes: Precision and θ.]

Fig. 2. The Precision vs. Recall and θ vs. Recall curves use the SCD or the L1 distance.

3. LPCIH IMAGE RETRIEVAL

After LPCIH feature extraction, the dissimilarities between the input image and the registered images must be computed for the retrieval task. There are many possible dissimilarity measures for image matching and retrieval. The chi-square statistic (χ²) can be used as an image histogram dissimilarity measure because of its good performance. The image histogram distance using the chi-square statistic is given in equation (2):

χ²(S, M) = Σ_i (S_i − M_i)² / (S_i + M_i)        (2)

Here S and M are the feature histograms to be compared, and the subscript i indexes the corresponding bin. This chi-square histogram distance can be simplified into the following form:

D(S, M) = Σ_i |S_i − M_i| / (S_i + M_i)        (3)

This simplification preserves the original ranking order of the χ² results and reduces the computational burden at the same time. We use D(S, M) (called SCD) as the image histogram distance metric for our retrieval method because it is more computationally efficient than, and as accurate as, other image histogram distances such as the L1 distance, the log-likelihood distance and histogram intersection.
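The two distances of equations (2) and (3) can be written down directly; a minimal sketch follows, in which the guard for empty bins (where S_i + M_i = 0) is our addition to avoid division by zero.

```python
import numpy as np

def chi_square(s, m):
    # equation (2): sum_i (S_i - M_i)^2 / (S_i + M_i)
    s, m = np.asarray(s, float), np.asarray(m, float)
    d = s + m
    ok = d > 0  # skip empty bins (our addition)
    return float((((s - m) ** 2)[ok] / d[ok]).sum())

def scd(s, m):
    # equation (3): sum_i |S_i - M_i| / (S_i + M_i)
    s, m = np.asarray(s, float), np.asarray(m, float)
    d = s + m
    ok = d > 0
    return float((np.abs(s - m)[ok] / d[ok]).sum())
```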

In our experiments, we adopted 9 different measures of retrieval effectiveness: two graphical measures (Precision vs. Recall and θ vs. Recall) and 7 single-value measures (p(10), r(10), p(20), r(20), p(30), r(30), and 11P-Precision). The θ vs. Recall curve is a variation of the Precision vs. Recall curve. θ is defined as the average of the precision values measured whenever a relevant image is retrieved. The main difference between θ and precision is that θ is cumulative: its computation considers not only the precision at a specific recall level but also the precision at all previous recall levels. This cumulative computation is more consistent with the ranking imposed by image retrieval methods. The measures p(10) and r(10) are the precision and recall after 10 images have been retrieved; p(20), r(20), p(30) and r(30) have analogous meanings. The 11P-Precision is computed by averaging the precisions taken at eleven predefined recall levels: 0%, 10%, ..., 90%, 100%. In Figure 2, we compare the LPCIH method with three existing image retrieval methods through Precision vs. Recall and θ vs. Recall curves using the SCD and L1 distances; for example, "GIH" denotes the GIH method using the SCD distance and "L1 GIH" the GIH method using the L1 distance. In Table 1, we evaluate the effectiveness of the LPCIH method with the 7 single-value measures. According to the results shown in Figure 2 and Table 1, the LPCIH method outperforms the other three methods.
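The single-value measures above can be sketched as follows. The interpolated form of 11P-Precision (taking the maximum precision at or beyond each recall level) is an assumption about the exact variant used; the function names are illustrative.

```python
def precision_recall_at(relevance, k, n_relevant):
    """relevance: booleans for the ranked results, best first.
    Returns (p(k), r(k)) after k images are retrieved."""
    hits = sum(relevance[:k])
    return hits / k, hits / n_relevant

def eleven_point_precision(relevance, n_relevant):
    # interpolated precision averaged at recall 0.0, 0.1, ..., 1.0
    pr, hits = [], 0
    for i, rel in enumerate(relevance, 1):
        hits += rel
        pr.append((hits / n_relevant, hits / i))  # (recall, precision)
    levels = [i / 10 for i in range(11)]
    return sum(max((p for r, p in pr if r >= lv), default=0.0)
               for lv in levels) / 11
```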

4. EXPERIMENTS

In our experiments, we adopted query-by-example as the way to submit queries to the retrieval system, and a 3×3-neighborhood was used. To evaluate the effectiveness of the proposed LPCIH method, we compared it with three other methods: the global image histogram (GIH), Border-Interior classification (BIC) [12], and the Block Edge Histograms of CVPIC [11] (called CVPIC in our experiments). For image distance measures, the SCD distance was compared with the L1 distance. We used the UCID database [16], which contains 1338 images (gray scaled in our experiments) together with a ground truth.
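Under the query-by-example protocol, retrieval reduces to ranking the registered images by their distance to the query feature. A minimal sketch, with illustrative function names and the equation (3) distance (plus our guard for empty bins):

```python
import numpy as np

def scd(s, m):
    # equation (3) distance, with a guard for empty bins
    s, m = np.asarray(s, float), np.asarray(m, float)
    d = s + m
    ok = d > 0
    return float((np.abs(s - m)[ok] / d[ok]).sum())

def query_by_example(query_feat, db_feats):
    # rank registered images by increasing SCD distance to the query
    dists = [scd(query_feat, f) for f in db_feats]
    return sorted(range(len(db_feats)), key=dists.__getitem__)
```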


Fig. 3. An example result of LPCIH retrieval in the experiment on rotated and damaged gray image retrieval.

Figure 3 shows an example result of LPCIH retrieval from our additional experiment on rotated and damaged gray image retrieval. It can be seen that the LPCIH method is effective

Table 1. Single-value effectiveness results for comparing image retrieval methods.

Methods    p(10)  r(10)  p(20)  r(20)  p(30)  r(30)  11P-Precision
LPCIH      0.147  0.529  0.105  0.462  0.079  0.439  0.215
L1 LPCIH   0.144  0.456  0.089  0.423  0.071  0.393  0.191
BIC        0.139  0.430  0.092  0.388  0.070  0.362  0.176
L1 BIC     0.102  0.355  0.067  0.332  0.051  0.318  0.122
CVPIC      0.065  0.178  0.053  0.166  0.044  0.159  0.064
L1 CVPIC   0.047  0.121  0.044  0.124  0.042  0.116  0.064
GIH        0.086  0.295  0.061  0.243  0.049  0.237  0.100
L1 GIH     0.082  0.302  0.052  0.295  0.045  0.254  0.096

for the task of rotated and damaged gray image retrieval.

[7] Timo Ahonen, Abdenour Hadid, and Matti Pietikainen, “Face description with local binary patterns: Application to face recognition,” IEEE Transactions on PAMI, vol. 28, pp. 2037–2041, 2006.

5. CONCLUSIONS

This paper presented the LPCIH method, an efficient retrieval method for broad image domains. By combining local texture patterns with a global image histogram, LPCIH achieves a flexible image segmentation process and an effective feature representation. In our experiments, the LPCIH method was consistently more efficient and more effective than the three other image retrieval methods. In particular, LPCIH handles rotated and damaged gray image retrieval efficiently, and it can be applied in real-time online applications for which SIFT-based image retrieval systems [14][15] are impractical.

[8] J. Mao and A. Jain, “Texture classification and segmentation using multiresolution simultaneous autoregressive models,” Pattern Recognition, vol. 25, pp. 173–188, 1992.

[9] J.M. Francos, A.Z. Meiri, and B. Porat, “On a Wold-like decomposition of 2-d discrete random fields,” in Proceedings of ICASSP, 1990, pp. 2695–2698.

6. REFERENCES

[10] H. Tamura, S. Mori, and T. Yamawaki, “Textural features corresponding to visual perception,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 8, pp. 460–473, 1978.

[1] M.S. Lew, N. Sebe, C. Djeraba, and R. Jain, “Content-based multimedia information retrieval: State of the art and challenges,” ACM Transactions on Multimedia Computing, vol. 2, pp. 1–19, 2006.

[11] G. Schaefer, S. Lieutaud, and G. Qiu, “CVPIC image retrieval based on block colour co-occurrence matrix and pattern histogram,” in Proceedings of ICIP, 2004, pp. 413–416.

[2] M. Swain and D. Ballard, “Color indexing,” in Proceedings of ICCV. IEEE, 1990, pp. 11–32.

[12] R.O. Stehling, M.A. Nascimento, and A.X. Falcao, “A compact and efficient image retrieval approach based on border/interior pixel classification,” in Proceedings of CIKM, 2002, pp. 102–109.

[3] J. Huang, S.R. Kumar, M. Mitra, W.-J. Zhu, and R. Zabih, “Image indexing using color correlograms,” in Proceedings of IEEE CVPR, 1997, pp. 762–768. [4] M. Stricker and M. Orengo, “Similarity of color images,” in Proceedings of SPIE Conference on Storage and Retrieval for Image and Video Databases, 1995, pp. 381–392. [5] B.S. Manjunath, J.-R. Ohm, V. Vasudevan, and A. Yamada, “Color and texture descriptors,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, pp. 703–715, 2001. [6] S.J. Park, D.K. Park, and C.S. Won, “Core experiments on mpeg-7 edge histogram descriptor,” ISO/IEC JTC1/SC29/WG11-MPEG2000/M5984, 2000.


[13] Krystian Mikolajczyk and Cordelia Schmid, “A performance evaluation of local descriptors,” IEEE Transactions on PAMI, vol. 27, no. 10, pp. 1615–1630, 2005.

[14] Y. Jing and S. Baluja, “PageRank for product image search,” in Proceedings of WWW-2008, 2008, pp. 307–315.

[15] R. Fergus, L. Fei-Fei, P. Perona, and A. Zisserman, “Learning object categories from Google’s image search,” in Proceedings of ICCV, 2005.

[16] G. Schaefer and M. Stich, “UCID - an uncompressed colour image database,” in Proceedings of SPIE Storage and Retrieval Methods and Applications for Multimedia, 2004, pp. 472–480.
