
ARTRIEVAL: PAINTING RETRIEVAL WITHOUT EXPERT KNOWLEDGE

Namil Kim, Yukyung Choi, Soonmin Hwang and In So Kweon
Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea

ABSTRACT

As people become more interested in paintings, various user-interactive search systems have been presented in recent years. Many of these systems expect users to search with prior knowledge about paintings. We identify a limitation of existing methods in how well the user can represent the query, and propose a simple yet effective way to search for a painting by using color to express human visual memory. To this end, we suggest color clustering based on human color perception, and hierarchical metric learning to accommodate the locality of colors. By drawing interactively with the learned colors, the user completes an abstract image that resembles the visual memory. We show that our system is easy to use, fast to process, accurate in search, and fully extensible to cover deviation among users.

Index Terms: painting retrieval, query-by-memory, color clustering, hierarchical metric learning

1. INTRODUCTION

As popular art is experiencing a renaissance, public interest in paintings keeps growing. To keep up with this demand, museums have been turning their collections into digital heritage. The Google Art Project has become a big success, giving the public access to high-resolution artworks on the web. World-class museums provide various search systems so that users can get information regardless of time and place. However, most existing systems are essentially unorganized and rely on expert-knowledge tags such as author, era and painting style. Therefore, these systems are not suitable for directly searching for what users have in mind. To overcome this limitation, there are two major streams of image search methods: query-by-example and query-by-memory.
Query-by-example typically uses a similar image as the query and finds relevant images, but a similar image of the painting is not easy to obtain, even though no expert knowledge is needed. Recently, sketch-based retrieval systems have been introduced as query-by-memory. Mindfinder [1] lets users sketch the major curves of the target image and finds similar images according to the users' intentions. Furthermore, Sun et al. [2] proposed an advanced sketch retrieval that adds color information to the sketch. However, the memory of shape shows more distortion than that of colors. As time goes by, people can remember the main colors of a visual memory [3] but can rarely express the contour, because the higher-frequency content of shape (detail) is easily and naturally lost [4]. Therefore, we expect that color can serve as a bridge between memory and paintings.

We focus on the task of color-based painting retrieval with human visual memory. We define the abstract image as the image drawn from visual memory. To guide the drawing of the abstract image, we present unsupervised color clustering that adopts a color distance based on human perception to select representative colors. Then, we suggest a new hierarchical metric learning method that considers the locality of colors. Through the hierarchical metric, we can relieve color ambiguities, i.e. colors that are not clearly separated and named by color perception. As another contribution, we introduce a neighboring-region search strategy to deal with aspect ratio distortion, which is the difference between what the user draws and what the user remembers.

In this paper, we propose a novel painting retrieval system, namely Artrieval, based on perceived color and visual memory. Our system poses search as user-interactive drawing of abstract images, with representative colors filling primitive shapes. With this simple visual evidence, we show that our system performs well in painting retrieval.

[Figure 1 labels: Abstract input image for searching (Artist ?, Year ?, Style ?); Painting Database; Searching Results (Artist: Vincent van Gogh; Year: 1889; Catalogue: F612; JH1731; Type: Oil on canvas; Location: New York City)]

Fig. 1. Artrieval is a painting retrieval system for users who are not experts. Our system helps the user search for a painting from a vague memory in the user's mind. After user interaction, similar paintings from a database are returned in real time.

[Figure 2 panel labels: (a) (1) patch extraction, (2) clustering, (3) metric learning; (b) (1) memory, (2) drawing, (3) encoding, (4) transformation, (5) ranking. Other labels: $d = (x - y)^T M (x - y)$, M*, Interactive Searching, Searching Results]

Fig. 2. Overview of Artrieval. (a) Learning process; (b) Retrieval process.

2. SYSTEM OVERVIEW

Our retrieval system is made up of five processes: (1) memory retrieval; (2) drawing; (3) encoding; (4) transformation; and (5) ranking. The retrieval ranking is updated continuously through the user interaction shown in Fig. 2 (b). The system uses as its query an abstract image, which the user recalls from visual memory in the memory retrieval process and then sketches with the drawing tools. The query is encoded in an augmented form that combines the RGB and La*b* color spaces, and the encoded data is transformed with the learned hierarchical metric. Finally, the system displays the top-ranked images according to the similarity between the encoded vectors of the query and those of the database.

Using the paintings, our system also performs a learning process to obtain the transformation matrices used in the transformation step of the retrieval process. Fig. 2 (a) shows that the learning procedure consists of three steps: (1) patch extraction; (2) clustering; and (3) metric learning. To extract color patches, we oversegment the paintings into homogeneous regions using a superpixel method [5]. The color patches are used to select representative colors with a new clustering algorithm. After that, we make the color patches more discriminative through the proposed hierarchical metric learning method. The proposed algorithms are described in detail in Section 3.

3. OUR APPROACH

3.1. Unsupervised Color Clustering Algorithm

We need to decide on representative colors and to label color patches for the metric learning in Section 3.2. In general, the labels of color patches are decided by color clustering, and the mode of each cluster indicates a representative color. Unsupervised color clustering is strongly influenced by the initial centers. With the kmeans family [6, 7], similar colors can end up being assigned as separate representative colors. Recently, supervised color clustering algorithms [8, 9, 10] have been introduced. However, they require manually labeled colors as pre-assignment.

To overcome these limitations, we propose a specific way of choosing the initial centers. The proposed method employs the National Bureau of Standards (NBS) distance [11]. When we select initial centers, we choose colors that are clearly different from one another in terms of human color perception. The relationship between the NBS distance and human color perception is shown in Fig. 3 (a). From these values, we know that if the NBS color distance is over 12.0, humans regard the two colors as clearly different colors.

The accuracy of color memorization degrades first in the perception stage and then in the visual memory stage. The authors of [3] measured how accurately humans memorize colors. Observers were shown real-world objects in random colors and were asked to recall the colors after a delay. According to this work, in the perception stage the fidelity is extremely precise (±6° in hue angle), but in the visual memory stage the fidelity becomes significantly lower (±20°). Therefore, we set the threshold of the NBS distance to α = 13, which experimentally reflects the color distortion in the visual memory stage. The proposed NBS-kmeans algorithm is shown in Algorithm 1.

Algorithm 1 NBS-Kmeans algorithm
1: Choose an initial color $c_1$, sampled randomly from the color patches $X$.
2: Take a new color $c_i$ if $D_{nbs}(x_p, x_q)$ is more than α = 13. $D_{nbs}(x_p, x_q)$ denotes the NBS distance from a color point $x_p$ to the closest existing color $x_q \in \{c_1, ..., c_{i-1}\}$. The NBS distance $D_{nbs}(x_p, x_q)$ between two colors is calculated as follows:
   $$\|x_p - x_q\|_{nbs} = 1.2 \sqrt{2 S_p S_q \left(1 - \cos\left(\frac{2\pi}{K} \Delta H\right)\right) + (\Delta S)^2 + (\Delta V)^2}, \quad x_i = (H_i, S_i, V_i)$$
3: Repeat the above steps until $k$ centers have been taken.
4: Proceed with the standard kmeans algorithm.
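As a rough illustration of steps 1 to 3 of Algorithm 1, the sketch below seeds centers with an NBS-style distance. It is not the authors' implementation: the function names, the `hue_period` parameter (the number of hue units in a full circle), the Munsell-like component scales in the usage example, the fixed random seed, and the choice to scan a shuffled pool once are all assumptions made for the example.

```python
import math
import random

def nbs_distance(p, q, hue_period=100.0):
    """NBS-style distance between two (H, S, V) colors.

    hue_period is an assumed scale: the number of hue units in a full circle.
    """
    (hp, sp, vp), (hq, sq, vq) = p, q
    hue_angle = 2.0 * math.pi * (hp - hq) / hue_period
    hue_term = 2.0 * sp * sq * (1.0 - math.cos(hue_angle))
    return 1.2 * math.sqrt(hue_term + (sp - sq) ** 2 + (vp - vq) ** 2)

def nbs_seed_centers(patches, k, alpha=13.0, rng=None):
    """Pick up to k initial centers that are pairwise farther apart than alpha."""
    rng = rng or random.Random(0)  # fixed seed: an assumption, for repeatability
    pool = list(patches)
    rng.shuffle(pool)
    centers = [pool[0]]            # step 1: random initial color
    for candidate in pool[1:]:
        if len(centers) == k:      # step 3: stop once k centers are taken
            break
        nearest = min(nbs_distance(candidate, c) for c in centers)
        if nearest > alpha:        # step 2: accept only clearly different colors
            centers.append(candidate)
    return centers

# Hypothetical patches on Munsell-like scales (H in [0, 100), S and V around 0..10).
patches = [(0, 10, 5), (1, 10, 5), (25, 10, 5), (50, 10, 5)]
centers = nbs_seed_centers(patches, k=3)
```

The returned centers would then be handed to any standard kmeans routine (step 4). If the pool contains fewer than k mutually distant colors, the sketch simply returns fewer centers.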

To test the effect of the initial clusters on color clustering, we compare three methods: kmeans [6], kmeans++ [7] and ours. As shown in Fig. 3 (b), the proposed method gives relatively better results than the other methods in that the diversity of the selected colors increases. Therefore, these colors are suitable as representative colors and can be used for hierarchical metric learning.

NBS Value      Human Perception
0.0 – 1.5      Almost the same
1.5 – 3.0      Slightly different
3.0 – 6.0      Remarkably different
6.0 – 12.0     Very different
12.0 –         Different color

[Figure 3 (b) panel labels: Proposed, Kmeans, Kmeans++]

Fig. 3. (a) NBS distance table with the human perception [11]; (b) Example of color clustering according to initial centers.

3.2. Hierarchical Metric Learning

The Large Margin Nearest Neighbor (LMNN) method [12] is widely used for global metric learning. It optimizes a discriminative function that decreases when the distances between similar colors are made small and those between dissimilar colors are increased by a large margin. Generally, color metric learning relies on a single color space such as La*b*, Luv or RGB. Each color space organizes colors in a specific way for its main purpose. Using a single color space may be too restrictive for representing a variety of color characteristics, since it is low-dimensional. To overcome this limitation, we aggregate color spaces into "La*b*+RGB". The primary reason is that aggregating color spaces with low correlation provides complementary information. From this perspective, we employ the augmented color feature to make the most of each color space's characteristics and to enhance the discriminative power of colors.

In addition, let us consider Fig. 4 (a). The color circles are the representative colors obtained by the color clustering. The representative colors consist of the primary colors (C1, ..., C8) and the secondary colors (C5-1, ..., C7-3). Secondary colors are finer subdivisions of primary colors; for example, C5-1 and C5-2 belong to the same primary color. In Fig. 4 (b), there are three representative colors (yellow, blue and green), each with a different local structure. The local structure is affected by the color distribution of the painting database. In this case, the global metric has strong limitations, because it uses a single linear metric to compare all of the input data, which is inappropriate for handling the local structure of each color. Therefore, we propose the Hierarchical-LMNN (HLMNN), which iteratively learns metrics among colors of the same branch in a tree structure. The proposed HLMNN algorithm is shown in Algorithm 2. With this method, we obtain more discriminative color features for painting retrieval. The algorithm can be extended to deeper color hierarchies.

Algorithm 2 Hierarchical LMNN
Input: color patches $X$ with hierarchical labels, $X = \{x_1, ..., x_n\}$, $x_i = [\lambda L, a^*, b^*, R, G, B]^T$, $\lambda = 0.1$
Output: global metric $M_g$, local metrics $\{M_{l_1}, ..., M_{l_m}\}$, where $m$ is the number of clusters in the secondary colors.
1: Compute the global metric $M_g$ with the primary colors (8 colors). The LMNN method is used to learn the following transformed distance:
   $$D(x_i, x_j) = (x_i - x_j)^T M_g^T M_g (x_i - x_j)$$
2: Transform the primary color space $X$ into a new color space $X'$ with the global metric:
   $$X' = M_g X$$
3: Estimate a set of local metrics $M_l$ with the secondary colors (green, blue and yellow), $M_l = \{M_{l_1}, M_{l_2}, M_{l_3}\}$:
   $$D(x'_i, x'_j) = (x'_i - x'_j)^T M_{l_k}^T M_{l_k} (x'_i - x'_j)$$
4: Transform the secondary color space $X'$ into a new color space $X''$ with each local metric:
   $$X'' = M_{l_k} (M_g X)$$
5: Repeat steps 3 and 4 for $k = 1, ..., m$.

Fig. 4. (a) Representative colors; (b) Limitation of global metric learning;

To test the performance of the proposed algorithm, we evaluate the color classification error as follows:

[Error: %]     Train    Test
Raw            5.93     5.42
LMNN           2.75     2.78
ours           1.29     1.39

The number of color patches is about 300,000. Half of the patches are used for training, and the remainder for testing. This experiment shows that the proposed algorithm is more discriminative than LMNN (global metric) and the raw data (before metric learning).

3.3. Correction of spatial distortion

In this section, we explain how to correct the aspect ratio distortion between the user's drawing and the abstract image, and how to measure the similarity. To align the primitive shape drawn by the user, we examine the regions surrounding the query, as in Fig. 5 (a). The gray box is the target region and the brown boxes (P1 to P9) are the searching regions. These searching regions are generated in proportion to both the major and minor axes of the primitive shape. After comparing all regions, the query is moved to the region with the highest similarity. Fig. 5 (b) shows results of detecting the optimal candidate region. The blue shape is the target region and the red shape is the optimal region with the highest similarity. The primitive shape moves to the region whose colors are similar to those of the query.
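The neighboring-region search described above can be sketched as follows. This is only an illustration under stated assumptions: the box representation `(cx, cy, w, h)`, the 3x3 grid of offsets (with the drawn position as the central candidate), the `step` fraction of 0.5, and the function names are hypothetical, and `score` stands in for whatever similarity the system computes between the query colors and a region.

```python
def candidate_regions(box, step=0.5):
    """Return nine boxes: the drawn box and its eight shifted neighbors.

    box is (cx, cy, w, h); shifts are proportional to the box's own
    width and height, mirroring the "proportional to both axes" rule.
    """
    cx, cy, w, h = box
    return [(cx + i * step * w, cy + j * step * h, w, h)
            for j in (-1, 0, 1) for i in (-1, 0, 1)]

def best_region(box, score, step=0.5):
    """Move the query to the candidate region with the highest score."""
    return max(candidate_regions(box, step), key=score)

# Hypothetical score: similarity peaks when the box is centered at (15, 10).
def toy_score(region):
    return -((region[0] - 15) ** 2 + (region[1] - 10) ** 2)

drawn = (10, 10, 10, 4)
moved = best_region(drawn, toy_score)
```

In this toy run the drawn box is shifted half a width to the right, because that candidate scores highest; with a real similarity function the same loop would pick the region whose colors best match the query.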

[Figure 5 (a) labels: searching regions P1 to P9]

Fig. 5. (a) Generated searching regions for detecting; (b) Results of detecting the optimal region.

[Figure 6 panel labels: Drawing Canvas, Primitive Shape, Primitive Color, Average Image, Searching Results]

Fig. 6. The user interface of the Artrieval system.

[Figure 7 column labels: Target, Memory, Rank-1 to Rank-5]

In the proposed system, similarity and ranking are recalculated whenever the user draws a primitive shape. The similarity score between the query image (Q) and a database image (I) is defined as:

$$\mathrm{sim}(Q, I) = \frac{\sum_{\vec{x} \in Q,\, \vec{y} \in I} \exp\left(-\frac{D(\vec{x}, \vec{y})}{\sigma^2}\right)}{\|Q\|^2 \|I\|^2} \qquad (1)$$

where σ is an experimental parameter, $\vec{x}$ and $\vec{y}$ are color patches, and $D(\vec{x}, \vec{y})$ is the transformed distance in Algorithm 2. When the user draws an additional primitive shape, our system re-runs the ranking procedure on only the top 50 percent of the previous results. This re-visiting process reduces the computational complexity and lets the system run in real time.
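A minimal sketch of the similarity score and the top-50-percent re-ranking is given below. Several points are assumptions rather than details from the paper: the norms of Q and I are read as patch counts, `M` stands for whichever learned transform applies to a pair of patches (an identity matrix is used in the example), and the function names, the `keep` parameter and the dictionary layout of the candidate pool are hypothetical.

```python
import math

def transformed_distance(x, y, M):
    """D(x, y) = (x - y)^T M^T M (x - y), the squared length of M (x - y)."""
    diff = [a - b for a, b in zip(x, y)]
    projected = [sum(m * d for m, d in zip(row, diff)) for row in M]
    return sum(p * p for p in projected)

def similarity(Q, I, M, sigma=1.0):
    """Similarity score; the norms of Q and I are taken as patch counts (assumption)."""
    total = sum(math.exp(-transformed_distance(x, y, M) / sigma ** 2)
                for x in Q for y in I)
    return total / (len(Q) ** 2 * len(I) ** 2)

def rerank(query, pool, M, sigma=1.0, keep=0.5):
    """Rank the images in pool and keep only the top `keep` fraction.

    pool maps an image name to its list of color patches; the returned
    names would form the pool for the next user interaction.
    """
    ranked = sorted(pool, key=lambda name: similarity(query, pool[name], M, sigma),
                    reverse=True)
    return ranked[:max(1, math.ceil(len(ranked) * keep))]

# Toy example with 2-D "color" patches and an identity transform.
identity = [[1.0, 0.0], [0.0, 1.0]]
pool = {"a": [(0, 0)], "b": [(3, 4)], "c": [(1, 0)], "d": [(10, 10)]}
survivors = rerank([(0, 0)], pool, identity)
```

Calling `rerank` again after each new primitive shape, with the pool restricted to the previous survivors, halves the number of images scored per interaction, which is the source of the speed-up described above.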

[Figure 7 panel label: Average Image of abstract images from User study]

Fig. 7. (a) The target and abstract image in user's memory; (b) The corresponding top results after final searching; (c) Average image and some examples of user's drawing.

4. EXPERIMENTAL RESULTS

In this section we briefly introduce the experimental setup and evaluation results. The performance of our system is evaluated qualitatively, since quantitative evaluation is difficult where user interface aspects are concerned.

Database. To establish the database, we use the painting masterpieces from the Yorck Project [13], which consists of 10,000 publicly available paintings. This project provides high-resolution images with an average height/width of 1,800 pixels and rich metadata such as the filename, the artist, the date of origin, the painting technique, the place of exhibition and a rough painting style. In this experiment, we randomly select a database from all paintings and organize it in accordance with preferences from the user study.

Setup. A user study was performed for the qualitative evaluation. First, we let users view the masterpieces. After a delay, users try to find some target paintings with our interface. Our interface in Fig. 6 consists of three parts: a drawing canvas, primitive toolbars (primitive shape, primitive color) and display panels (searching results, average image). The users choose a primitive shape and color from the primitive toolbars and then draw what they have in mind on the drawing canvas.

Results. Fig. 7 shows some results of the user study. The first column in Fig. 7 (a) shows the target images in the users' minds and the second shows the average images combining all users' abstract images. Fig. 7 (b) shows the corresponding top results, and some examples of users' drawings are shown in Fig. 7 (c). Target paintings are generally ranked at the top after about 3 to 4 interactions. Therefore, we show that our system is robust and reliable regardless of the color and spatial perception of users. Further results can be found in the supplementary materials (footnote 1).

Footnote 1: https://sites.google.com/site/artrieval/

5. CONCLUSIONS

We have presented a painting retrieval system that requires no expert knowledge. The main contributions of the proposed system are summarized as follows:
1. Artrieval is a spatial-color based retrieval system for paintings.
2. Our system provides a convenient interface for users to freely express what they have in mind, and enables real-time interaction to find the desired paintings more efficiently.
3. It is the first query-by-memory solution that models the visual memory of the human brain for ordinary users rather than experts.
Our system was evaluated on painting masterpieces; in the user study, target paintings were generally ranked at the top after about 3 to 4 interactions.

6. ACKNOWLEDGEMENT

This work was supported by the Technology Innovation Program (No. 10048320), funded by the Ministry of Trade, Industry & Energy (MI, Korea).

7. REFERENCES

[1] Y. Cao, H. Wang, C. Wang, Z. Li, L. Zhang, and L. Zhang, "Mindfinder: Interactive sketch-based image search on millions of images," in ACM-MM, 2010.
[2] X. Sun, C. Wang, A. Sud, C. Xu, and L. Zhang, "Magicbrush: Image search by color sketch," in ACM-MM, 2013.
[3] T. F. Brady, T. Konkle, J. Gill, A. Oliva, and G. A. Alvarez, "Visual long-term memory has the same limit on fidelity as visual working memory," Psychological Science, pp. 981–990, 2013.
[4] T. Osugi and Y. Takeda, "The precision of visual memory for a complex contour shape measured by a freehand drawing task," Vision Research, pp. 17–26, 2013.
[5] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Susstrunk, "Slic superpixels compared to state-of-the-art superpixel methods," IEEE TPAMI, vol. 34, no. 11, pp. 2274–2282, 2012.
[6] S. P. Lloyd, "Least squares quantization in pcm," IEEE TIT, vol. 28, pp. 129–137, 1982.
[7] D. Arthur and S. Vassilvitskii, "k-means++: the advantages of careful seeding," in ACM-SIAM symposium on Discrete algorithms, 2007.
[8] J. van de Weijer, C. Schmid, J. Verbeek, and D. Larlus, "Learning color names for real-world applications," IEEE TIP, vol. 18, no. 7, pp. 1512–1523, 2009.
[9] Y. Liu, Y. Liang, Z. Yuan, and N. Zheng, "Learning to describe color composition of visual objects," in ICPR, 2012.
[10] Y. Yang, J. Yang, J. Yan, S. Liao, D. Yi, and S. Z. Li, "Salient color names for person re-identification," in ECCV, 2014.
[11] M. Miyahara and Y. Yoshida, "Mathematical transform of (r, g, b) color data to munsell (h, v, c) color data," IEEE VCIP, vol. 1001, pp. 650–657, 1988.
[12] K. Q. Weinberger and L. K. Saul, "Distance metric learning for large margin nearest neighbor classification," JMLR, vol. 10, pp. 207–244, June 2009.
[13] "Yorck project," http://commons.wikimedia.org/wiki/Commons:10,000_paintings_from_Directmedia.
