Clustering and semantically filtering web images to create a large-scale image ontology

S. Zinger¹, C. Millet, B. Mathieu, G. Grefenstette, P. Hède, P.-A. Moëllic
Commissariat à l'Energie Atomique-LIST/Atomic Energy Agency of France-LIST
LIC2M (Multilingual Multimedia Knowledge Engineering Laboratory), 18 Route du Panorama 92265, Fontenay aux Roses, France
[email protected], {milletc, mathieub, grefenstetteg, hedep, moellicp}@zoe.cea.fr

ABSTRACT
In our effort to contribute to closing the "semantic gap" between images and their semantic description, we are building a large-scale ontology of images of objects. This visual catalog will contain a large number of images of objects, structured hierarchically, allowing image processing researchers to derive signatures for wide classes of objects. We are building this ontology using images found on the web. In this article we describe our approach for finding coherent sets of object images. We first perform two semantic filtering steps. The first involves deciding which words correspond to objects and using these words to query databases that index the text associated with an image (e.g. Google Image search) to find a set of candidate images. The second involves using face recognition technology to remove images of people from the candidate set (we have found that requests for objects often return images of people). After these two steps, we have a cleaner set of candidate images for each object. We then index and cluster the remaining images using our system VIKA (VIsual KAtaloguer) to find coherent sets of objects.

Keywords: web image retrieval, image indexing, clustering, image ontology, semantics.

1. INTRODUCTION
The constantly increasing amount of information on the Internet requires the development of effective image retrieval strategies. While text processing methods have been successfully applied in information search engines, much work remains to be done in the area of web image retrieval [2]. The goal is to locate the images most relevant to a query submitted to a search engine. Currently, most work considers image retrieval based on the surrounding text found on web pages, using textual and link information. Many strategies exist for analyzing the text around images and for segmenting web pages into blocks in order to better locate the information [1]. The images are then clustered using the textual and link data, and low-level image features are applied to reorganize the clustering results [1].

In our work, we are mostly interested in using low-level features to index web image search results. Our motivation comes from the fact that existing image search engines use text processing for image retrieval, and their results leave room for improvement [2]. Images on the Internet can be divided into large semantic groups [4]. In this article we consider one semantic feature: the presence of human faces. Our choice is based on an observation about image search engines: when the query is the name of an object, the resulting images often include photographs of people. Excluding these images of people provides better input for clustering and therefore better image retrieval results. Our approach is developed to improve the image search results of online engines and thereby contribute to web-based data mining.

¹ This work was carried out during the tenure of a MUSCLE Internal fellowship.

2. IMAGE INDEXING AND CLUSTERING
The algorithm used for indexing is inspired by the approach based on border/interior pixel classification [9]. The idea of that approach is to build two histograms for an image: one takes into account only border pixels, the other only interior pixels. The first step of the algorithm is therefore to classify each pixel as interior or border. This indexing algorithm is fast and simple, and it provides information not only on the colors of an image but also on the sizes of its constant-color areas. The indexing we use is designed for broad image domains, which is an important property because WWW images are very diverse. The indexing method described above is implemented in PIRIA (Program for the Indexing and Research of Images by Affinity) [6]. The indexing yields a vector of 128 elements for each image. We use the Riemann distance as a similarity measure between images, and we calculate an array containing the distances between all pairs of the considered images.

The next step is to cluster the retrieved images in order to find prototype images that illustrate the query words. To achieve this, we used a k-SNN (Shared Nearest Neighbor) clustering algorithm. This algorithm is based on ideas from [5]; a complete description can be found in [3]. For each image, the algorithm considers only the k most similar neighbor images. The main idea is that the more neighbors two images have in common, the more similar they are. Images that are most similar to their neighbors are considered topic images. Topic images are used to create clusters, and images that are strongly linked to topic images are added to those clusters. The other images remain unclustered; the number of unclustered images depends on the parameters chosen.

In our application, the image collection retrieved from the Internet is noisy. We expect the clustering algorithm to ignore off-topic images, which are supposed to be isolated, and to extract some highly coherent clusters that will carry prototype images for the query words. In this approach, the SNN clustering algorithm has two main advantages. First, it does not assume a predefined number of clusters. Second, it does not force images to belong to a cluster. By changing the parameters, we can adjust the focus of the extracted clusters and the number of unclustered images. This clustering algorithm is linear in time and in space in the number of images. The main bottleneck of the system is the computation of image similarities.
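The shared-nearest-neighbor idea described above can be illustrated with a minimal sketch. It is a simplified, Jarvis-Patrick style variant written for illustration, not the implementation used in VIKA: the function name `snn_cluster` and the parameters `k`, `min_shared` and `min_links` are hypothetical, and the pairwise loop is quadratic rather than linear. It takes a precomputed distance matrix, links two images when they are in each other's k-nearest-neighbor lists and share enough neighbors, treats strongly linked images as topic images, and leaves isolated images unclustered (label -1).

```python
from itertools import combinations


def snn_cluster(dist, k=3, min_shared=2, min_links=2):
    """Simplified shared-nearest-neighbor clustering on a distance matrix.

    Returns one label per item; -1 marks an unclustered item.
    All parameter names are illustrative assumptions.
    """
    n = len(dist)

    # k nearest neighbors of each item (the item itself is excluded).
    knn = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        knn.append(set(others[:k]))

    # Strong link: mutual neighbors that share at least min_shared neighbors.
    links = [set() for _ in range(n)]
    for i, j in combinations(range(n), 2):
        mutual = i in knn[j] and j in knn[i]
        if mutual and len(knn[i] & knn[j]) >= min_shared:
            links[i].add(j)
            links[j].add(i)

    # Topic items are those with enough strong links.
    topics = {i for i in range(n) if len(links[i]) >= min_links}

    # Grow one cluster per connected group of topic items; non-topic items
    # strongly linked to a topic item join its cluster but do not extend it.
    labels = [-1] * n
    next_label = 0
    for seed in sorted(topics):
        if labels[seed] != -1:
            continue
        labels[seed] = next_label
        stack = [seed]
        while stack:
            cur = stack.pop()
            for nb in links[cur]:
                if labels[nb] == -1:
                    labels[nb] = next_label
                    if nb in topics:
                        stack.append(nb)
        next_label += 1
    return labels


# Toy data: two tight groups and one isolated point, on a line.
points = [0, 1, 2, 3, 100, 101, 102, 103, 50]
dist = [[abs(a - b) for b in points] for a in points]
print(snn_cluster(dist))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Note that the number of clusters is not given in advance and the isolated point is left out of every cluster, which are the two properties the text relies on.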

3. REMOVING IMAGES WITH FACES
Faces are a main subject of the pictures that can be found on the Internet, and usually these pictures are not annotated as “face” but with other keywords unrelated to faces. Thus, when a query is submitted to an image retrieval system based on surrounding text, most of the time (if not always) some of the images found are faces that are not relevant to the query. Table 1 shows statistics on the proportion of faces obtained when querying the Alltheweb picture finder. Except for glasses, these objects are not directly related to faces, yet the average proportion of faces is 10% (glasses excepted). This proportion is too high to be ignored, so removing faces is a necessary step in our process.

We expect to remove noise from an image set by deleting the images with faces. Otherwise, some clusters may partly consist of images containing faces, which worsens the results since our goal is to create clusters of images that are representative of the given query. We therefore delete the images containing faces first and only then perform clustering. The algorithm used is the one based on a multi-stage AdaBoost proposed by Paul Viola and Michael Jones in 2001 [10], with the improvements of [8]. This algorithm can process images very rapidly, searching at different scales for a specific object (faces in our application). For color pictures, we validate the results of this algorithm with skin color detection: a detected face is validated if more than 30% of its pixels are skin colored.
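The skin-color validation step can be sketched as follows. The 30% threshold comes from the text; everything else is an assumption made for illustration: the function names `is_skin` and `validate_face` are hypothetical, the per-pixel test is a common RGB rule of thumb rather than the skin model actually used, and the face box is assumed to come from some external detector as `(x, y, width, height)`.

```python
def is_skin(r, g, b):
    """Rule-of-thumb RGB skin test (an assumed heuristic, not the paper's model)."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def validate_face(pixels, box, min_skin_ratio=0.30):
    """Accept a detected face box if enough of its pixels are skin colored.

    pixels: 2-D list of (r, g, b) tuples, indexed as pixels[y][x].
    box: (x, y, width, height) reported by a face detector (assumed format).
    """
    x0, y0, w, h = box
    region = [pixels[y][x] for y in range(y0, y0 + h) for x in range(x0, x0 + w)]
    if not region:
        return False
    skin = sum(1 for r, g, b in region if is_skin(r, g, b))
    # Validated only if MORE than 30% of the box is skin colored.
    return skin / len(region) > min_skin_ratio


# Toy 4x4 image: a skin-toned patch in the top-left 2x2 corner, blue elsewhere.
SKIN, BLUE = (200, 120, 90), (20, 40, 200)
image = [[SKIN if x < 2 and y < 2 else BLUE for x in range(4)] for y in range(4)]
print(validate_face(image, (0, 0, 2, 2)))  # True: the whole box is skin colored
print(validate_face(image, (0, 0, 4, 4)))  # False: 4 of 16 pixels (25%) are skin
```

A detection that fails this check would be treated as a false alarm, so the image would be kept in the candidate set.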

Query        Number of images retrieved   Proportion of faces
Armchair     459                          6%
Boat         399                          23%
Glasses      415                          52%
Knife        440                          10%
Mug          472                          8%
Tree         412                          11%
Wristwatch   418                          2%

Table 1: Proportion of faces on the Internet for seven queries.

False alarms do not have a strong influence: it is better to remove good images than to keep bad ones, since the number of images on the Internet is vast and quickly increasing. Table 2 presents the evaluation of our face detector's performance.

Query        Precision (%)   Recall (%)
Armchair     19              38
Boat         51              40
Glasses      81              56
Knife        28              44
Mug          19              48
Tree         23              37
Wristwatch   14              55

Table 2: Performance of the face detector on web image search results.

The difficulty of face detection on web images lies in their high complexity. For example, faces are often turned, and our face detector does not recognise profiles.

4. EXPERIMENTAL RESULTS
Our VIKA system (Fig. 1) uses a web image search engine to acquire images (Fig. 2) for a given query and then indexes and clusters these images using the algorithms described above. The number of images to be acquired can be specified, and images with faces can optionally be detected and removed before indexing. We evaluated the VIKA system on the seven sets of images presented in Table 1. The images of each set were manually sorted twice. First, we marked the images containing faces.

Figure 1: Functional scheme of VIKA system.

Figure 2: VIKA (VIsual KAtaloguer) system

Second, we selected the images representative of a given object. The criteria applied for the manual selection of relevant images are the following:
- the whole object is present in the image,
- the object occupies the largest part of the image,
- the object is pictured in its “habitual” form (for example, a chair seen from the side and not from above).

An example of VIKA's output is shown in Figure 3.

Figure 3: The clustered results of web image search for the query “chair”; images of chairs form clusters, representative for the object “chair”.

The left image in Figure 4 shows the web image search results currently available, and the right image shows the results we would like to obtain. The images in the right image form clusters in the VIKA system. Tables 3 and 4 present our experimental evaluation of the VIKA system compared to Alltheweb. We consider the images placed in clusters to be the retrieved images. We calculate precision and recall for VIKA before and after removing images with people in order to estimate the usefulness of the face detector, and we compare them to the precision and recall of Alltheweb.
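As a reminder of how these two measures are computed, here is a minimal sketch using the standard definitions. The function name `precision_recall` and the image identifiers are hypothetical; for VIKA, the retrieved set would be the images assigned to some cluster and the relevant set would be the manually selected images.

```python
def precision_recall(retrieved, relevant):
    """Standard precision and recall over two collections of image identifiers."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four clustered images, three manually relevant ones.
clustered = ["img1", "img2", "img3", "img4"]
relevant = ["img1", "img2", "img5"]
p, r = precision_recall(clustered, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Leaving more images unclustered shrinks the retrieved set, which tends to raise precision at the cost of recall.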

We are primarily concerned with precision, because it corresponds to the proportion of relevant images inside clusters. Precision shows how successful the system is in constructing sets of coherent images, which we need for building a large-scale image ontology. While VIKA clearly outperforms Alltheweb for the query armchair (an improvement of 18%), the improvement is much smaller for the other queries. One reason is the manual sorting of images into relevant and irrelevant ones: VIKA performs worst on the queries for which manual sorting was difficult. For example, only images containing entire trees were considered relevant, while images showing only some branches were marked as irrelevant.

Figure 4: Left image: web image search results in the original order returned by the web search engine; right image: desirable results (clusters from VIKA).

Query        Precision (%)        Recall (%)
             Alltheweb   VIKA     Alltheweb   VIKA
armchair     58          77       44          58
boat         14          16       42          48
glasses      10          10       69          63
knife        26          24       61          55
mug          69          74       54          58
tree         21          20       53          51
wristwatch   64          66       43          44

Table 3: Alltheweb and VIKA performances before removing images with faces.

5. CONCLUSIONS
The presented work is oriented toward content-based web image retrieval. We use image indexing and clustering to reorganize and improve web image search results. The indexing algorithm applied is based on the colors of an image and partially on the shape of the objects in it. The advantages of the clustering method include its ability to work with multidimensional data and to form clusters of different sizes and shapes. The clustering also identifies unclustered images. This is an important property, since web image search results always contain irrelevant images, and we are interested in isolating such images by classifying them as unclustered.

Query        Precision (%)        Recall (%)
             Alltheweb   VIKA     Alltheweb   VIKA
armchair     59          77       37          49
boat         14          18       38          48
glasses      11          12       43          46
knife        32          32       54          54
mug          75          78       45          47
tree         22          22       52          53
wristwatch   64          66       44          45

Table 4: Alltheweb and VIKA performances after removing images with faces.

During the experiments we noticed that search results often contain pictures of people. We therefore introduced a semantic feature, the presence of a human face, to improve the results. Both the indexing and the clustering methods have been implemented and tested on images retrieved from the web. Clustering the images from the web provides results that are better organized. The next step is to keep only the relevant images; the problem of classifying the clusters remains to be resolved in future work. We also have to compile a list of queries for which removing photographs of people is not appropriate. For example, the query “fireman” should return photographs of people, and therefore images with faces must be kept.

More experiments remain to be done in the framework presented in this article. Other indexing methods as well as different clustering parameters can be explored. We are interested in exploring how to choose a proper indexing method depending on the query. For example, we use a color correlogram to index images with man-made objects, but for indexing images with trees we prefer a method based on texture. Our future research will concentrate on building a large-scale image ontology. This work includes web-based data mining and text and image processing. We would like to explore the possibility of building a visual dictionary using web image retrieval.

REFERENCES
[1] Cai, D., He, X., Li, Z., Ma, W.-Y., and Wen, J.-R. Hierarchical Clustering of WWW Image Search Results Using Visual, Textual and Link Information. In Proceedings of the 12th annual ACM international conference on Multimedia, New York, NY, USA, 2004, 952-959.
[2] Deselaers, T., Keysers, D., and Ney, H. Clustering Visually Similar Images to Improve Image Search Engines. In Informatiktage 2003 der Gesellschaft für Informatik, Bad Schussenried, Germany, November 2003.
[3] Ertöz, L., Steinbach, M., and Kumar, V. Finding Topics in Collections of Documents: A Shared Nearest Neighbor Approach. In Proceedings of Text Mine'01, Workshop on Text Mining, First SIAM International Conference on Data Mining, 2001.
[4] Frankel, C., Swain, M. J., and Athitsos, V. WebSeer: An Image Search Engine for the World Wide Web. Technical Report TR-96-14, University of Chicago, IL, USA, 1996.
[5] Jarvis, R. A., and Patrick, E. A. Clustering Using a Similarity Measure Based on Shared Nearest Neighbors. IEEE Transactions on Computers, vol. C-22, no. 11, 1973.
[6] Joint, M., Moëllic, P.-A., Hède, P., and Adam, P. PIRIA: a general tool for indexing, search, and retrieval of multimedia content. Journal of Image Processing: Algorithms and Systems III (Proceedings of the SPIE, Volume 5298), 2004, 116-125.
[7] Kherfi, M. L., Ziou, D., and Bernardi, A. What is Behind Image Retrieval from the World Wide Web? In Proceedings of the International Conference on Web-Based Communities, Lisbon, Portugal, March 2004.
[8] Lienhart, R., Kuranov, A., and Pisarevsky, V. Empirical Analysis of Detection Cascades of Boosted Classifiers for Rapid Object Detection. Microprocessor Research Lab Technical Report, May 2002.
[9] Stehling, R. O., Nascimento, M. A., and Falcão, A. X. A Compact and Efficient Image Retrieval Approach Based on Border/Interior Pixel Classification. In Proceedings of the eleventh international conference on Information and knowledge management (McLean, Virginia, USA, 2002). ACM Press, New York, NY, 2002, 102-109.
[10] Viola, P., and Jones, M. Robust Real-time Object Detection. Second Int’l Workshop on Statistical and Computational Theories of Vision – Modeling, Learning, Computing and Sampling. Vancouver, Canada, 2001.
