Using the Knowledge of Object Colors to Segment Images and Improve Web Image Search

Christophe Millet (1,2), Isabelle Bloch (2) and Adrian Popescu (1)

(1) CEA/LIST/LIC2M, 18 Route du Panorama, 92265 Fontenay aux Roses, France
[email protected], [email protected]
(2) GET-ENST - Dept TSI - CNRS UMR 5141 LTCI, Paris, France
[email protected]

Abstract

With web image search engines, we face a situation where the results are very noisy: when we ask for a specific object, there is no guarantee that this object is contained in all the images returned by the search engine, and about 50% of the returned images are off-topic. In this paper, we explain how knowing the color of an object can help locate that object in images, and we also propose methods to automatically find the color of an object, so that the whole process can be fully automatic. Results reveal that this method allows us to reduce the noise in the returned images while providing an automatic segmentation, so that it can be used for clustering or object learning.

1. Introduction

The Internet contains many images that we would like to exploit, for example for object learning. Images can be accessed and searched through web image search engines: the user enters a list of keywords and gets, as feedback, a list of images whose titles match these keywords, or whose accompanying text on the web page contains them. However, the raw set of images obtained that way cannot be used directly for object learning, because it is too noisy. With a quick evaluation of four web image search engines (Google, Yahoo!, Ask and Exalead) on 50 queries, we found that, in the first 50 images, about 50% of the images are noise: they are not related to the query. One of the reasons is that the indexing and retrieval of these images is based only on text, and does not consider the content of the images. In this paper, we focus on the colors of the objects we are looking for to analyze the content of images and improve the results returned by web queries. There is also no database describing the colors of objects, so, even though it is not impossible to construct such a database by hand, some ideas to obtain that information automatically are proposed in this paper. This method is applied to the following problems:
– automatic segmentation of the main object in images, and especially segmentation of objects with more than one color,
– reducing noise from web search engines,
– re-ranking images from web search engines.

1.1 Automatic segmentation

When retrieving images for a keyword from the Internet, we would like to automatically isolate the object corresponding to that keyword in the image. This kind of automatic segmentation of objects is a difficult problem: most images contain more than one object, so, when we apply classical automatic segmentation, it is difficult to know what object is the one we are looking for. One way would be to use the focus of cameras to determine the difference

between the object in focus and the background, which is blurred, as Swain and Chen (1995) did for video image segmentation. However, in the case of irrelevant images, there can be an object in focus that is not the object we have queried. To automatically locate the object in images from the Web, Ben-Haim et al. (2006) first segmented the images into several regions, and then clustered these regions in order to find similar regions in different images. The largest cluster, called the "significant cluster", is considered as the one about the object. A similar approach is presented in (Russell, 2006), where the authors make use of multiple segmentations of each image to discover objects in an image collection, in order to improve the segmentation quality over techniques that use only one segmentation. This is an interesting approach; however, our own experiments with queries about outdoor animals showed that the largest cluster is often about context (sky, grass, ...) or is a cluster of dark regions, such as shadows or black backgrounds. Therefore, what we propose in this paper is to use the semantics of objects. Since we know what object the picture is about, we can use knowledge about that object, here its color, to guide the automatic segmentation. Another motivation for this method is the possibility of segmenting objects, like zebras or pandas, which are made of several colors. Segmentation of such objects is not possible with classical algorithms because of the strong edges between the colors. In (Liapis, 2004), the authors were able to segment a zebra using a luminance histogram, where each pixel is classified and the classes are then propagated. The drawback of this method is that the number of classes must be specified prior to the segmentation, so that it would not work for intricate images where the background can be divided into several regions.

1.2 Cleaning and re-ranking web images

Even though it is an acknowledged issue that the results of image search engines contain a lot of noise and are not sorted, there has not been much work in this area yet. Lin et al. (2003) used a relevance model on the text of the HTML web pages containing the images to re-rank web image search results. They reported an increase of precision from 30% to 50% when considering only the first 50 documents, but without using the content of the images. Cai et al. (2004) proposed to cluster web image search engine results using both the content of the image and the texts and links from the web page. Their idea was that some queries are polysemous and may return results related to different topics. For example, the query pluto mixes images about the dwarf planet and about the Disney character, and the query dog returns images of various species of dogs; by clustering, one can separate those different topics. However, they did not try to clean or rank the images. A way to clean images is to try to detect similarities between the different images, for example using interest points. This method is very difficult to apply to web images because, as we said, not all images contain the object of interest: the noise is on average 50% and can be up to 85% for some queries. Furthermore, there are many variations in picture quality, so that it can be hard to find a repetitive pattern. In Ben-Haim et al. (2006), images are re-ranked using, as the distance for a given image, the smallest distance between one of its blobs and the mean of the "significant cluster" described in Section 1.1. Fergus et al. (2004, 2005) published an approach giving promising results for cleaning, re-ranking and learning from Google's Image Search results. They first apply several kinds of circular region detectors based on interest points, and then compute the SIFT descriptor (Lowe, 2004) on these regions. These descriptors are used to train a translation and scale invariant probabilistic latent

semantic analysis (TSI-pLSA) model for object classification. This model can then be applied to the raw data to re-rank images, and to clean them if the lowest-ranked ones are discarded. Fergus et al. reported an improvement of about 20% precision on average at 15% recall (that is, discarding 85% of the images). In (Millet, 2006), we explained how it is possible to clean and re-rank images using clustering techniques. Images were first automatically segmented with a waterfall segmentation into twenty regions; all regions that had no contact with the border of the image were merged and considered to be the object to study. This merged region was then indexed with texture and color features, and the indexes were used by a shared nearest neighbor (SNN) clustering algorithm. Images which were not clustered were removed and, for each cluster, the probable colors of the objects were used to sort the clusters, putting on top the clusters containing the most objects of the expected colors. This gave promising results, but the segmentation was not always able to identify the right object. In this paper, instead, we use the color information at the segmentation stage, so that all the obtained segmented regions have the correct colors. Therefore, the color itself cannot be used to sort and rank images, but the size and position of the segmented regions can serve that purpose. We first explain how we associate color names with the HSV values of pixels in Section 2. Then, we discuss in Section 3 the possibility of knowing the most frequent colors of a given object, and how to recognize and deal with objects which do not have a particular color. In Section 4, we detail the algorithm we use to segment a picture given one or more colors we want to focus on. Finally, we explain and evaluate in Section 5 the applications listed above, to bring out the interest of the proposed method for segmentation, and then conclude with future work.

2. The color of a pixel

In this section, we develop a model to match pixel values with color names. Naming colors is not a trivial issue, firstly because the number of colors to consider has to be defined, and secondly because the separations between colors are unclear. Berk et al. (1982) compared several systems for naming colors with respect to how easy it is for a user to name the colors, and found that "the load on the casual user's memory is too great for this system to be easily used". They then proposed a color-naming system (CNS) consisting of 10 basic colors: gray, black, white, red, orange, brown, yellow, green, blue, purple. Several adjectives can then be used, such as dark, medium, light, grayish, vivid, ..., and they also add the -ish forms, such as reddish brown, which allows them to build a complex system of 627 color names. However, when naming a color, the first reaction of most people is to use only one of the 10 basic colors listed above. In our opinion, pink is lacking in the CNS system, since it is often used for naming objects: a lot of clothes, often for girls, are of that color, and animals such as the domestic pig or the lesser flamingo are usually described by people as pink. Therefore, we have decided to consider 11 colors, introducing the pink color into the CNS. In order to map these 11 colors to pixel values, the HSV (hue, saturation, value) color space is used, since it is more semantic than RGB, and therefore makes it easier to deduce the color name of a pixel. In particular, the hue value is close to the concept of a color name.
In our HSV color space, each component has been scaled between 0 and 255. Pixels with a low saturation (S < 20) are considered achromatic and are assigned a negative hue. Our first idea was to separate the different colors clearly in the HSV space, that is, to associate one and only one color name to each (h, s, v) triplet. However, we noticed some pixels for

which we would name different colors in different contexts, and we decided to take the ambiguity between colors into account and to sometimes associate more than one color name with a given (h, s, v) triplet. So, instead of associating a single color name to any pixel value, we defined for each color name the range of (h, s, v) values that it occupies. The complete correspondence is detailed in Table 1.

Color    Hue         Saturation   Value
black    < 0         any          0 – 85
black    0 – 255     any          0 – 40
gray     < 0         any          80 – 180
white    < 0         any          175 – 255
rose     0 – 30      0 – 90       200 – 255
rose     235 – 245   any          any
red      0 – 15      any          any
red      240 – 255   any          any
orange   14 – 30     any          any
yellow   20 – 50     any          190 – 255
green    50 – 125    any          any
blue     110 – 200   any          any
purple   200 – 235   any          any
brown    20 – 50     any          0 – 200
brown    0 – 40      25 – 140     75 – 200
brown    230 – 255   25 – 135     55 – 190
brown    < 0         10 – 30      60 – 165

Table 1: Translation of color names into the HSV color space.

The values have been chosen from manual experiments and observations on color palettes and various images. They still need to be validated by several users to obtain a more objective model. Other methods, such as fuzzy logic, could be used to model the fuzzy borderlines between two colors, but the method we developed here proved sufficient for our application. Also, in this method, all color definitions are independent, so that any additional color name (cyan, crimson, ...) can easily be added.
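As an illustration, the lookup of Table 1 can be written down directly. The sketch below assumes the scaling described above (all components in 0–255, achromatic pixels assigned a negative hue); the dictionary encoding and the helper name are ours and not part of the original system.

```python
# Color-name lookup for HSV pixels, following the ranges of Table 1.
# All components are assumed to be scaled to 0-255; pixels with
# saturation < 20 are treated as achromatic and given hue = -1.

# (hue, saturation, value) ranges per color; "any" is encoded as (0, 255).
# A hue range of (-1, -1) only matches achromatic pixels.
COLOR_RANGES = {
    "black":  [((-1, -1), (0, 255), (0, 85)), ((0, 255), (0, 255), (0, 40))],
    "gray":   [((-1, -1), (0, 255), (80, 180))],
    "white":  [((-1, -1), (0, 255), (175, 255))],
    "rose":   [((0, 30), (0, 90), (200, 255)), ((235, 245), (0, 255), (0, 255))],
    "red":    [((0, 15), (0, 255), (0, 255)), ((240, 255), (0, 255), (0, 255))],
    "orange": [((14, 30), (0, 255), (0, 255))],
    "yellow": [((20, 50), (0, 255), (190, 255))],
    "green":  [((50, 125), (0, 255), (0, 255))],
    "blue":   [((110, 200), (0, 255), (0, 255))],
    "purple": [((200, 235), (0, 255), (0, 255))],
    "brown":  [((20, 50), (0, 255), (0, 200)),
               ((0, 40), (25, 140), (75, 200)),
               ((230, 255), (25, 135), (55, 190)),
               ((-1, -1), (10, 30), (60, 165))],
}

def color_names(h, s, v):
    """Return every color name whose (h, s, v) range contains the pixel.

    Ranges may overlap, so a pixel can receive more than one name,
    as allowed by the model described in Section 2.
    """
    if s < 20:          # achromatic pixel
        h = -1
    names = []
    for name, ranges in COLOR_RANGES.items():
        for (h0, h1), (s0, s1), (v0, v1) in ranges:
            if h0 <= h <= h1 and s0 <= s <= s1 and v0 <= v <= v1:
                names.append(name)
                break
    return names

print(color_names(10, 200, 230))   # -> ['red'] (bright saturated red)
print(color_names(25, 200, 100))   # -> ['orange', 'brown'] (overlapping ranges)
```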

3. The color of an object

Getting the color of objects is not an easy problem to solve automatically, firstly because a change in illumination changes the perceived color of an object. Furthermore, not all objects have a unique color. Some objects have two or more colors that are well defined (zebra, panda, ...), while for some other objects, their color is not part of their definition, and they can be of different colors (chair, house, ...). Generally, objects with well-defined colors are natural objects, encompassing mineral objects, animals, or plants, whereas objects with no specific color are mainly man-made objects. For animals, since subspecies have often been named according to their colors, the use of an ontology, and particularly the hyponymy relationship, can help us know whether the animal can be found in different colors. For example, using the WordNet ontology (Fellbaum, 1998), we can learn that a "wolf" can be at least a "grey wolf", a "red wolf" or a "white wolf", and that a "bear" can be a "brown bear", a "black bear" or a "white bear" (synonym of "polar bear"). For that reason, we will only consider objects that are leaves in the WordNet ontology, that is, objects which do not have hyponyms, so that the object's color does not vary too much. Since man-made objects can have different colors, the color of the object is usually specified in the web page or in the image name, and the query "red car" returns several thousand results. For natural objects, "lion" for example, the queries "brown lion" or "tan lion" return few results (respectively 58 and 38 on ask.com), many of which are teddy bears, which are man-made objects and therefore not brown per se. There is only a small proportion of animals in the results because, on the one hand, web search engines use image names and the text near the image in the web page to annotate and retrieve the images, and, on the other hand, people usually do not state the obvious and do not specify that the lion they took a picture of is brown. On the contrary, uncommon colors for animals are much more likely to be annotated, so that a query like "white lion" returns more images (3820 on ask.com) than "brown lion" or "tan lion". Therefore, when grabbing images from the Internet, we specify the color in the query for man-made objects, and not for natural objects. We believe that if a user is looking for "chair", we can run the queries with all colors ("red chair", "green chair", ...), then process each query and finally merge the results together. Our algorithms are tested on the following objects:
– animals with one predetermined color: fire ant, beaver, cougar, crab, crocodile,
– animals with two colors: blue-and-yellow macaw, ladybug, leopard, panda, zebra,
– man-made objects: camera (white, black), cell phone (black, blue, green, red, white), chair (black, blue, green, red, white), cup (black, green, red, white), Porsche (red).
In this section, we experiment with two methods to automatically determine the color of an object. The first method is based on words, using a text corpus, and the second method is based on the content of images.
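As a small illustration of the hyponymy check described above, NLTK's WordNet interface can list the color-named hyponyms of an animal name. The snippet below is only a sketch of the idea; it assumes the WordNet data has been installed with nltk.download('wordnet'), and the color vocabulary is the one used later in Section 3.1.

```python
from nltk.corpus import wordnet as wn

COLOR_WORDS = {"black", "blue", "brown", "gray", "grey", "green", "orange",
               "pink", "purple", "red", "rose", "tan", "white", "yellow"}

def color_named_hyponyms(word):
    """Return hyponym names of `word` (as a noun) that start with a color word."""
    found = set()
    for synset in wn.synsets(word, pos=wn.NOUN):
        # closure() walks the hyponymy relation transitively.
        for hypo in synset.closure(lambda s: s.hyponyms()):
            for name in hypo.lemma_names():
                if name.split("_")[0].lower() in COLOR_WORDS:
                    found.add(name.replace("_", " "))
    return sorted(found)

print(color_named_hyponyms("wolf"))   # e.g. ['white wolf', ...]
print(color_named_hyponyms("bear"))   # e.g. ['black bear', 'brown bear', ...]
```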

3.1 Text based method

The colors of objects can be extracted automatically from a huge text corpus, and we propose to use the web to do so. The idea is to study whether or not the object name often appears near a color name in texts. We have experimented with two variants to get the color of an object. For example, let us imagine that we want to get the color of a beaver. The first variant is to submit the web query ''brown beaver'', where brown can be replaced by any other color, and to get the number of pages returned. The quotation marks here are needed and guarantee that the word for the color is directly before the word for the object: other color words in the same page may not be related to the object we are considering. The second variant uses the query ''beavers are brown''. The category of the object can be used to reduce the noise, so instead of the examples given above, we can query for ''brown beaver'' animal and ''beavers are brown'' animal. We have a vocabulary of 14 color words for web querying: black, blue, brown, gray, green, grey, orange, pink, purple, red, rose, tan, white, yellow. This is more than the 11 colors used in the image color description, but some colors are merged together: gray and grey are synonyms, and brown/tan and rose/pink are also considered as synonyms. For these colors, the corresponding

number of results are summed up, giving the number of occurrences N(C | object) of a color C for a given object. In Table 2, we list the top five colors returned for beaver using Yahoo! Search, with the number of results in parentheses. Brown and black (in that order) are the two main colors we expect to get, and this is what is returned by the second variant.

''color beaver''     ''color beaver'' animal   ''beaver is color''   ''beaver is color'' animal
brown (43 200)       green (10 800)            brown (98)            brown (26)
black (28 100)       brown (7 550)             red (6)               black (1)
green (20 400)       black (2 800)             black (3)             -
gray (11 400)        red (1 050)               blue (1)              -
red (9 610)          gray (783)                orange (1)            -

Table 2: Querying the web to extract the colors of an object: the first variant (the first two columns) returns more results, but with the second variant (the last two columns), the real color of a beaver, brown, is better differentiated from the other colors. The expected color is underlined.

The beaver example is representative of what we observed in general for other objects: the second variant provides more accurate results, but returns fewer answers than the first. However, the first variant can be disturbed by proper nouns and phrases. For example, Green Beaver is a company and is also the name of a cocktail. Also, "white house" returns a lot of results, but that does not mean that all houses are white. Phrases have the same influence: the existence of the animal "blue whale" causes this method to return blue as the paramount color for whales, and "white chocolate" has more hits than "black chocolate" or "brown chocolate". This issue does not arise with the second variant. However, this variant sometimes does not return any color, as for example with the word "passerine" (a type of bird), and in that case the first variant can be used.
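A minimal sketch of the second variant is given below. The function hit_count is a placeholder for whichever web search API is used to obtain the number of matching pages (it is not a real library call), and the synonym merging follows the 14-word vocabulary described above.

```python
# Text-based estimation of an object's color (second variant of Section 3.1).
# `hit_count(query)` is a hypothetical callable returning the number of pages
# matching an exact-phrase web query; any search API could be plugged in.

COLOR_SYNONYMS = {
    "black": ["black"], "blue": ["blue"], "brown": ["brown", "tan"],
    "gray": ["gray", "grey"], "green": ["green"], "orange": ["orange"],
    "pink": ["pink", "rose"], "purple": ["purple"], "red": ["red"],
    "white": ["white"], "yellow": ["yellow"],
}

def color_occurrences(object_plural, hit_count, category=None):
    """Return {color: N(C | object)} using queries like ''beavers are brown''."""
    counts = {}
    for color, words in COLOR_SYNONYMS.items():
        total = 0
        for w in words:
            query = f'"{object_plural} are {w}"'
            if category:                      # e.g. append "animal" to reduce noise
                query += f" {category}"
            total += hit_count(query)
        counts[color] = total
    # Sort colors by decreasing number of occurrences.
    return dict(sorted(counts.items(), key=lambda kv: kv[1], reverse=True))

# Example call (with some hit_count implementation available):
# color_occurrences("beavers", my_search_api, category="animal")
```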

3.2 Image based method

Instead of using text to determine the color of objects, another method consists in using the content of images. The method we propose is to build statistics on all the images retrieved from the web using the object name as the query. Under the hypothesis that most objects are centered in the image and surrounded by a background, we take into account only the pixels in a window around the center whose width and height are half of the image width and height, as in Figure 1.

Figure 1: Window of pixels considered to find the color of an object.
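Combined with the color-naming function of Section 2, the statistics over the central window can be accumulated as in the sketch below; the conversion details (OpenCV stores hue in 0–179, rescaled here to 0–255) and the per-pixel loop are implementation choices of ours, not prescriptions of the method.

```python
import cv2
import numpy as np
from collections import Counter

def central_window_colors(image_paths, color_names):
    """Accumulate color-name frequencies over the central window of each image.

    The window is centered and has half the width and height of the image,
    as in Figure 1. `color_names(h, s, v)` is the lookup of Section 2.
    Percentages are computed over color assignments (a pixel may receive
    several names because of overlapping ranges).
    """
    counter = Counter()
    for path in image_paths:
        bgr = cv2.imread(path)
        if bgr is None:
            continue
        hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)   # H in 0-179, S and V in 0-255
        h, w = hsv.shape[:2]
        window = hsv[h // 4: 3 * h // 4, w // 4: 3 * w // 4]
        for hue, sat, val in window.reshape(-1, 3):
            hue = int(hue) * 255 // 179               # rescale hue to 0-255
            for name in color_names(hue, int(sat), int(val)):
                counter[name] += 1
    total = sum(counter.values()) or 1
    return {name: 100.0 * n / total for name, n in counter.most_common()}
```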

Of course, some pixels from the environment are contained in this window, but we make the hypothesis that, if we take an average over several images, the color of the animal predominates over the colors of the environment, since the color of the animal will be the

same for all images, whereas the colors of the environment can vary. Table 3 reports the five main colors obtained for several objects.

Object                  1st            2nd            3rd            4th             5th
red Porsche             red (44.5)     white (16.9)   blue (11.8)    black (11.0)    brown (8.7)
beaver                  brown (24.0)   white (14.7)   red (13.0)     black (12.1)    blue (11.9)
zebra                   brown (20.7)   white (20.6)   black (17.3)   blue (10.9)     red (9.4)
ladybug                 red (22.8)     white (20.7)   brown (16.8)   green (9.5)     black (8.7)
blue-and-yellow macaw   blue (21.3)    brown (14.7)   green (13.9)   yellow (11.6)   black (8.7)
crab                    red (23.1)     brown (23.0)   blue (13.0)    black (8.1)     white (6.6)

Table 3: Five dominant colors for each object. The number in parentheses is the percentage of the color among all the pixels considered. The expected colors are underlined.

For "red Porsche" (first row of Table 3), and for any other man-made object where we specify the color, it works quite well: the dominant color is the color we specified in the text query. It is more interesting, though, to try it on objects for which the color has not been specified when querying for images. For these objects, it also leads to interesting results, considering that the amount of noise on the Internet is on average about 50%: the first colors for beaver, crab, ladybug and macaw are correct. However, we also get some unwanted colors. The presence of white in beaver and ladybug is due to the fact that many of these images are in fact cliparts, with a black drawing on a white background. The green for the blue-and-yellow macaw is only partly wrong, since the top of its head is green; however, the main contribution comes from the trees that are the most common environment of macaws, and the brown for zebra also comes from its environment, which is mostly brown grass. The two methods we propose provide interesting results: in most cases, the correct color of the object comes first. However, this color is not well separated from the other colors, because of proper nouns and of people not stating the obvious for the text-based method, and because of the color of the background for the image-based method. One possible improvement would be to combine both methods. Another idea would be, for the text-based method, to use a more reliable source than the web as a whole, such as Wikipedia, to obtain the color automatically. This would need linguistic processing specific to Wikipedia. For the following experiments, we consider that we have a manually built database giving the color of objects. In fact, creating such a database by hand, even for ten thousand objects, would not be unrealistic anyway.

4. Segmenting with colors

Given an image and the color(s) of the object to search for, the segmentation of the image is done in several steps (a code sketch is given below):
– Classify each pixel as belonging to the colors of interest or not, as explained in Section 2. This builds a binary image. We call pixels of these colors ''object pixels'' and the other pixels ''background pixels''.
– Remove noise and small thin objects: this is done with an opening by a structuring element of size 1.
– Apply a closing by a structuring element of size 5 to merge close object regions together.

– Select the largest region.
– Remove holes, defined as background pixels entirely surrounded by object pixels, based on the assumption that the object has no holes, which is the case for most objects.
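A rough sketch of these five steps, using OpenCV and SciPy morphology, could look as follows; the structuring-element shapes (3x3 for size 1, 11x11 for size 5) and the exact library calls are our own interpretation, not the authors' implementation.

```python
import cv2
import numpy as np
from scipy import ndimage

def segment_with_colors(bgr_image, wanted_colors, color_names):
    """Color-driven segmentation following the five steps of Section 4.

    `color_names(h, s, v)` is the lookup of Section 2 and `wanted_colors`
    the set of color names expected for the object (e.g. {"black", "white"}).
    """
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    h, w = hsv.shape[:2]

    # 1. Classify each pixel as "object" (one of the wanted colors) or "background".
    mask = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            hue, sat, val = hsv[y, x]
            hue = int(hue) * 255 // 179          # rescale OpenCV hue (0-179) to 0-255
            if wanted_colors & set(color_names(hue, int(sat), int(val))):
                mask[y, x] = 1

    # 2. Opening with a small structuring element to remove noise and thin objects.
    open_se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, open_se)

    # 3. Closing with a larger structuring element to merge close object regions.
    close_se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (11, 11))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, close_se)

    # 4. Keep only the largest connected region.
    labels, n = ndimage.label(mask)
    if n == 0:
        return mask
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    mask = (labels == (np.argmax(sizes) + 1)).astype(np.uint8)

    # 5. Fill holes (background pixels entirely surrounded by object pixels).
    mask = ndimage.binary_fill_holes(mask).astype(np.uint8)
    return mask
```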

Figure 2: Segmentation of a beaver using the color brown. The panels show the pipeline: original image, classification of each pixel, noise removal with an opening, closing to merge close regions together, largest region kept, holes removed, and the final result.

The step that uses the color information is the first one, which builds a binary image classifying pixels as object or background according to their color. It is possible to use more than one color in this step, thus making it possible to segment animals like the zebra using both black and white as object colors. The four other steps keep only one object region and make that region cleaner in terms of shape.

Removing noise with an opening (second step) is useful so that the noise is not amplified by the closing (third step), as illustrated in Figure 3 (a): the pixels below the beaver get connected together, so that some water is included in the final segmentation after removing holes. The closing merges close regions together, to give a better shape. If we compare the top of the beaver without closing (Figure 3 (b)) and with closing (Figure 2), we observe that the closing leads to the hole on the top being disconnected from the background, so that it can be identified as a real hole and filled in the last step. We provide some more results in Section 5.1.

Figure 3: Same segmentation as in Figure 2 but (a) left: without removing the noise, omitting step 2; (b) right: without performing the closing, omitting step 3.

5. Applications

Several applications can benefit from this work:
– Automatic segmentation, in order to locate the object in the image and remove the context, so that a cleaner image can be used, for example, for object learning. This method also allows us to automatically segment objects with many colors, such as a zebra.
– Removing irrelevant images from the web: if the image does not contain the color(s), or contains them in too small a quantity, it is discarded.
– Re-ranking images in web image search.

5.1 Automatic segmentation

Humans recognize a zebra and use that knowledge to associate the stripes together and consider the animal as a whole, but automatic segmentation algorithms based on gradients are expected to consider the stripes as separate objects, since the gradient between the white and the black stripes is strong. If we do not add any knowledge about what a zebra is to the system, it has no way to know how humans see that animal. We propose to use the known color of an object to segment images supposed to contain that object, as explained in Section 4. Figures 4 to 9 report some segmentation results.

Figure 4: Segmentation of a cougar with color “brown”. The tongue is included when holes are filled.

Figure 5: Segmentation of a red Porsche with color “red”.

Figure 6: Segmentation of a zebra with colors “black” and “white”.

Figure 7: Segmentation of a blue-and-yellow macaw with colors “blue” and “yellow”.

Figure 8: Frames cause wrong segmentations with the color black: because of the "remove holes" step, the segmented image is the same as the original image. On the left, the black cell phone is not segmented; on the right, the segmentation will not find the small ladybug at the center.

Figure 9: Leopard segmented with yellow and black colors. Animals hidden in their environment are hard to segment with this method.

With these results, we understand better the purpose of removing holes: it is used to correctly segment objects that are mainly, but not entirely, composed of the given colors. For example, the cougar (Figure 4) is mainly brown, but its mouth is red, white and black. However, this region is entirely contained in a brown region, so that it can be associated with the object, whereas the black shadow below the mouth is not contained in a brown region and is classified as background. The same happens with the red Porsche (Figure 5), where the right light and window are considered as being in the object (but not the windshield, the thin red border on its left being considered as noise). The drawback concerns images with frames: if the frame color is one of the colors used for the segmentation, the whole image will be considered as a hole in the frame and filled as being the object, as shown in Figure 8. It would be possible to automatically remove frames from pictures, but since the number of images with frames of the "wrong" color is low, it does not have much consequence for the following applications on web image filtering. Our segmentation algorithm is also well suited to the segmentation of objects with two or more colors, which is usually not possible with classical algorithms. See Figures 6 and 7 for examples. Some non-semantic algorithms can segment the zebra using a texture criterion (Liapis, 2004), but the macaw is clearly composed of two distinct blue and yellow regions where no texture information can help. Segmenting with colors only, as we do here, however, leads to poor results for objects hidden in their environment, as are some animals: the leopard in Figure 9 is not distinguished from the tree leaves. Therefore, we could probably improve our algorithm by coupling it with a texture-based segmentation algorithm.

5.2 Filtering images from the web

As we know, for any query, about 50% of the images returned by current web image search engines are irrelevant to the query (see Table 4). Here, we propose to remove irrelevant images using the color of objects. The idea is to analyze what the region obtained by the automatic segmentation with the color looks like. In a perfect situation, the object is centered in the image, totally contained in it, but not too small. Therefore, we can use the following criteria to remove irrelevant images (a code sketch is given below): the image will be discarded if the resulting segmented object
– is too small, occupying less than 20% of the image surface,
– touches more than 80% of the border pixels,
– is such that the distance from the barycenter of the region (x_R, y_R) to the center of the image (x_I, y_I) is greater than 40% of the distance from a corner of the image to its center:

\[
\sqrt{(x_R - x_I)^2 + (y_R - y_I)^2} > 0.4 \cdot \sqrt{x_I^2 + y_I^2}
\]
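A minimal sketch of these three tests, operating on the binary object mask produced by the segmentation of Section 4, is given below; the way border pixels are counted is one possible reading of the second criterion.

```python
import numpy as np

def keep_image(mask):
    """Return False if the segmented region should be discarded (Section 5.2).

    `mask` is the binary object mask produced by the segmentation of Section 4.
    """
    h, w = mask.shape
    area = mask.sum()

    # Reject objects occupying less than 20% of the image surface.
    if area < 0.20 * h * w:
        return False

    # Reject objects touching more than 80% of the border pixels.
    border = np.concatenate([mask[0, :], mask[-1, :], mask[:, 0], mask[:, -1]])
    if border.mean() > 0.80:
        return False

    # Reject objects whose barycenter is far from the image center
    # (more than 40% of the corner-to-center distance).
    ys, xs = np.nonzero(mask)
    yc, xc = h / 2.0, w / 2.0
    dist = np.hypot(ys.mean() - yc, xs.mean() - xc)
    if dist > 0.40 * np.hypot(yc, xc):
        return False

    return True
```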

Our results are given in Table 4 for each class.

Object                      Precision from raw web query   Precision after filtering
beaver                      40.2%                           72.7%
cougar                      72.0%                           93.0%
crab                        55.8%                           70.8%
crocodile                   79.5%                           85.7%
fire ant                    61.0%                           50.0%
blue-and-yellow macaw       79.4%                           82.6%
ladybug                     72.8%                           73.3%
leopard                     84.9%                           89.7%
panda                       80.0%                           81.5%
zebra                       76.7%                           89.3%
black camera                69.1%                           79.1%
white camera                18.6%                           50.0%
black cell phone            43.0%                           87.0%
blue cell phone             28.6%                           42.4%
green cell phone            17.9%                           46.2%
red cell phone              22.5%                           37.5%
white cell phone            12.9%                           30.0%
black chair                 66.7%                           83.3%
blue chair                  50.9%                           58.8%
green chair                 22.2%                           50.0%
red chair                   70.6%                           88.5%
white chair                 62.0%                           83.3%
black cup                   35.2%                           66.7%
green cup                   34.5%                           80.0%
red cup                     36.5%                           50.0%
white cup                   33.9%                           66.7%
red Porsche                 85.0%                           85.7%
mean precision on classes   52.0%                           69.4%
mean precision on images    53.5%                           72.0%

Table 4: Results on web image filtering. The precision increases from 53.5% to 72% on average, while 25% of the images are kept. Only "fire ant" suffers a loss in precision: this class contains images of irritated skin resulting from fire ant bites, which are kept after filtering because of their red color.

Almost all class precisions increase, except for "fire ant", as explained in the table caption. Four classes still have less than 50% precision after filtering (instead of 12 classes before filtering); they are all cell phone classes, mostly due to images depicting objects that are not the objects we looked for but have the right color. Such noise cannot be removed with our method, but could be with, for example, cluster-based methods. Keeping only 25% of the images is not an issue when considering Internet images: since there are many images on the Internet, a user querying for images will only look at the first images, not all of them, so they are interested in quality (precision) rather than quantity (recall). For a possible use in object learning, precision also matters, since it is better to have a clean set of images for learning than a large set with 50% noise. For Table 4, the parameters have been chosen to maximize the precision while keeping at least 25% of the images. We also tried to introduce a criterion rejecting objects bigger than some size, but it appeared that the criterion on the border pixels gave better results. We can study the influence of each criterion on precision independently with the graphs of Figure 10.

Figure 10: Influence of each parameter individually on the precision. The border and barycenter graphs have been computed for objects whose size is between 10% and 50% of the image, corresponding to the best images according to the size criterion.

Considering the graph of the variation of the precision with the "minimum size of the object", we can conclude that the highest proportion of relevant objects is found in the set of objects whose size is between 10% and 50% of the image size. The two other graphs have been computed only for images in this range of surface, because otherwise the noise generated by (very) small objects and large objects prevents us from perceiving the influence of these parameters. We observe that an object has more chances of being relevant if it does not touch the border of the image, and if it is centered. It is possible to increase the precision further by making the parameters more restrictive:
– increasing the minimum size of the object region,
– reducing the proportion of pixels allowed to touch the border of the image,
– reducing the maximal distance between the barycenter of the region and the center of the image.
This allows the precision to be increased up to 86%, as represented in Figure 11, but at the expense of the number of images left after filtering, which then drops to 5% (we can even reach 87.5%, but by keeping only 1.2% of the images, which is very few).

Figure 11: Relation between precision and number of images left after filtering for various parameters. Increasing the precision means discarding a lot of images.

Therefore, if we choose to keep 15% of the images, as in (Fergus, 2004), we obtain an increase in precision of 24% (from 53% to 77%), which is comparable to the increase of 20% they reported, even though the two algorithms have not been evaluated on the same database.

5.3 Re-ranking images from web search engines

Another remark about web image search engines is that the returned images are not visually well sorted. Here, since the color has already been used to segment the objects, it cannot be used to re-rank the images as we proposed in (Millet, 2006). Therefore, we propose another method, based on the criteria developed above to reject irrelevant images. To each image, we assign a score that is smaller if the object touches the edge of the picture, or if its surface is too small or too big. For a given segmented region in an image, we define S as the ratio of the surface of the region to the surface of the image, and B as the percentage of pixels on the border of the image that are contained in the object. We have seen in Section 5.2 that the proportion of relevant objects is higher when B is close to 0 and when S is between 0.2 and 0.4. Therefore, we propose the following score:

\[
\Sigma = (1 - B) \cdot f(S), \quad \text{with}
\]

\[
f(S) =
\begin{cases}
S / 0.2 & \text{if } S < 0.2 \\
1 & \text{if } 0.2 \le S \le 0.4 \\
(1 - S) / 0.6 & \text{if } S > 0.4
\end{cases}
\]
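Written out, the score and the resulting ranking are straightforward; the file names in the usage example below are of course only illustrative.

```python
def f(S):
    """Surface term of the score (S = object surface / image surface)."""
    if S < 0.2:
        return S / 0.2
    if S <= 0.4:
        return 1.0
    return (1.0 - S) / 0.6

def score(B, S):
    """Re-ranking score: B = fraction of border pixels covered by the object."""
    return (1.0 - B) * f(S)

# Example: rank (B, S) measurements for a list of images, best first.
images = [("img1.jpg", 0.05, 0.30), ("img2.jpg", 0.60, 0.10), ("img3.jpg", 0.00, 0.55)]
ranked = sorted(images, key=lambda t: score(t[1], t[2]), reverse=True)
print([name for name, _, _ in ranked])   # -> ['img1.jpg', 'img3.jpg', 'img2.jpg']
```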

Then, images are sorted in descending order of score. Some results are illustrated in Figures 12 and 13.

Figure 12: Left: first 20 images returned by ask.com for the query “beaver”. Right: first 20 images after re-ranking the first 100 images returned by ask.com.

Figure 13: Left: first 20 images returned by ask.com for the query “green cup”. Right: first 20 images after re-ranking the first 100 images returned by ask.com.

As expected, after re-ranking, we have mostly images with a central object of the expected color. In the "beaver" example, this has the effect of discarding black and white drawings, which were labeled as relevant images; but, as we stated above, we do not want to keep all relevant images, we ideally want to keep only relevant images. For the "green cup" query, we clearly notice the improvement: image web search engines return a lot of noise for queries like "cup" or "green cup". With our system, if the color is given, we are able to sort the images and greatly increase the precision. If the color is not given, we could automatically add various colors to the query and display the results as clusters, one for each color; or we could ask the user to specify which color he is interested in, with the possibility of querying for a "blue and white cup".

6. Conclusion

In this paper, we described a possible use of semantics, namely color, for processing uncertain images from the Web. We are able to segment the object in the image, with better results than classical algorithms for objects with more than one color. We also demonstrated how we can remove noise and re-rank images with good results. The method we proposed to remove noise is independent of the cluster-based methods developed in the literature. Therefore, we could probably use a clustering of these segmented images as a post-processing step to remove further noise. Future work will consist in extending this approach to texture: vocabulary such as ''stripe'' or ''spot'' can be extracted from Wikipedia and could be used in image processing. We will also try to use the cleaned and segmented images to learn concepts, which is what originally motivated this work.

References

Ben-Haim N. & Babenko B. & Belongie S. (2006). Improving Web-based Image Search via Content Based Clustering, Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), pages 106–111.

Berk T. & Brownston L. & Kaufman A. (1982). A New Color-Naming System for Graphics Languages, IEEE Computer Graphics and Applications, Volume 2, No. 3, pages 37–44.

Cai D. & He X. & Li Z. & Ma W.-Y. & Wen J.-R. (2004). Hierarchical Clustering of WWW Image Search Results Using Visual, Textual and Link Information, Proceedings of the 12th annual ACM international conference, New York, USA, pages 952–959.

Fellbaum C. (1998). WordNet: An Electronic Lexical Database, MIT Press. ISBN: 0-262-06197-X.

Fergus R. & Perona P. & Zisserman A. (2004). A Visual Category Filter for Google Images, Proceedings of ECCV, Springer-Verlag.

Fergus R. & Fei-Fei L. & Perona P. & Zisserman A. (2005). Learning Object Categories from Google's Image Search, Tenth IEEE International Conference on Computer Vision (ICCV), Volume 2, pages 1816–1823.

Lin W.-H. & Jin R. & Hauptmann A. (2003). Web Image Retrieval Re-Ranking with Relevance Model, Proceedings of the IEEE/WIC International Conference on Web Intelligence, pages 242–248.

Lowe D. G. (2004). Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, Volume 60, Number 2, pages 91–110.

Liapis S. & Sifakis E. & Tziritas G. (2004). Colour and Texture Segmentation using Wavelet Frame Analysis, Deterministic Relaxation, and Fast Marching Algorithms, IEEE Transactions on Multimedia, Volume 6, Issue 5, pages 676–686, October 2004.

Millet C. & Grefenstette G. & Bloch I. & Moëllic P.-A. & Hède P. (2006). Automatically Populating an Image Ontology and Semantic Color Filtering, International Workshop OntoImage'2006: Language Resources for Content-Based Image Retrieval, Genoa, Italy, pages 34–39.

Russell B. C. & Efros A. A. & Sivic J. & Freeman W. T. & Zisserman A. (2006). Using Multiple Segmentations to Discover Objects and their Extent in Image Collections, IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Volume 2, pages 1605–1614.

Swain C. & Chen T. (1995). Defocus-Based Image Segmentation, Proceedings ICASSP-95, Volume 4, pages 2403–2406, Detroit, MI, May 1995.
