5D WORLD PICMAP: IMAGINATION-BASED IMAGE SEARCH SYSTEM WITH SPATIOTEMPORAL ANALYZERS

Nguyen Thi Ngoc Diep
Faculty of Environment and Information Studies
Keio University
5322 Endo, Fujisawa, Kanagawa 252-8520, JAPAN
Submitted in partial fulfillment of the requirements for the degree of Bachelor

Supervisor: Yasushi Kiyoki


Abstract of Bachelor’s Thesis

5D WORLD PICMAP: IMAGINATION-BASED IMAGE SEARCH SYSTEM WITH SPATIOTEMPORAL ANALYZERS

An imagination-based image search system with spatiotemporal analyzers, "5D World PicMap," is a new computation environment for discovering knowledge through a user's imagination and the spatiotemporal information in images. The 5D World PicMap system provides users with a five-dimensional information overview: the four spatiotemporal dimensions and the degenerated color dimension. The main feature of this system is to dynamically create various context-dependent patterns of pictorial stories according to a user's viewpoints and imagination processes.

The dynamic query creation method is a novel approach to representing a user's imagination process. This method enables a user to dynamically create a query that reflects the user's intention, impression and memory (imagination), as his/her own context existing only in his/her mind, through color-based combinations of existing images in the real world. The main feature of this query creation method is to extend the analytical functions for image search, not only to retrieval processing but also to query manipulation, according to the color-based combination of images with common features. The proposed method consists of five operations for creating an image-query vector from combinations of images: "plus", "intersection" and "accumulation" as local operations, and "minus" and "difference" as global operations.

Spatiotemporal visualization functions for image data with time-series multi-geographical views are proposed as new approaches to discovering latent knowledge in image data. The spatial analyzer functions evaluate the geolocation information of all the target image data from Exif data and map retrieved images onto the world map. The temporal analyzer functions allow users to overview the world map as a time series and to easily view the changes of a place through temporal granularity control.

The 5D World PicMap system enables general users to retrieve scenery images and information about places they have never been to according to their imagination, and enables researchers to analyze the synchronism and the time-series variation of image data with spatiotemporal information in research fields such as geoscience, environmental analysis and cross-cultural art comparison. In this thesis, several qualitative and quantitative experimental results are also presented to examine the feasibility and applicability of the 5D World PicMap system.

Keywords: Multimedia system, Imagination-based image retrieval, Exif metadata, Image-query creation, Spatiotemporal analyzer, Data visualization

Nguyen Thi Ngoc Diep
Faculty of Environment and Information Studies, Keio University


Acknowledgements

This thesis would not have been possible without the guidance and help of the professors, advisors and friends who, in one way or another, contributed and extended their valuable assistance in the preparation and completion of this dissertation. First and foremost, I would like to show my gratitude to my supervisor, Professor Yasushi Kiyoki, for his professional and technical advice, comments and encouragement. I am heartily thankful to Assistant Professor Shiori Sasaki and Lecturer Shuichi Kurabayashi, who have taken care of me a lot since I joined the Multimedia Database Laboratory (MDBL); their great encouragement, guidance, advice and support from the initial to the final stage enabled me to complete this thesis. I would also like to thank Mr. Nguyen Dinh Toan, Mr. Boris Anrnoux, Mr. Ali Ridho Barakbah and Ms. Nguyen Thi Nhung for their valuable and technical comments on this thesis; they have always encouraged and helped me. I would like to thank my "Vietnamese People at Keio" friends for their sharing and encouragement during my two years at the Keio SFC campus. I also want to show my gratitude to JICE, and especially to Ms. Hiroko Matsukura, for her care and support during my life in Japan. Last but not least, I am grateful to my beloved parents for their continuous support of my daily life, to my older sister for her support and for checking this thesis's spelling, and to my special friend BULL for his care, support and great encouragement.

January 11, 2011 Nguyen Thi Ngoc Diep


Table of contents

1 Introduction
  1.1 Background and Motivation
  1.2 Proposed 5D World PicMap System and Key Features
  1.3 Advantages
  1.4 Related Works
2 Method of Query Creation and Spatiotemporal Analysis
  2.1 Query Creation Method
    2.1.1 Main Features and Operations
    2.1.2 Color-based Combination Settings for Image Data
  2.2 Spatiotemporal Analyzers
3 Implementation of 5D World PicMap System
4 Experiments
  4.1 Exp.1: Dynamic Query Creation Method
    4.1.1 Examination of the Feasibility of Query Creation Method
    4.1.2 Qualitative and Quantitative Experiments by User Comparison
  4.2 Exp.2: Dynamic Query Creation and Spatiotemporal Visualization
    4.2.1 Dynamic Query Creation and Spatial Visualization
    4.2.2 Temporal Visualization With Time Granularity Control
    4.2.3 Overview of the Database by Spatiotemporal Visualization
  4.3 Examination of the Performance of 5D World PicMap System
5 Discussions
References


List of tables

1. Table 1. Operations of our method
2. Table 2. Results by Plus operation: Images for query creation, image histograms, a combined query histogram and the search results
3. Table 3. Results by Intersection operation: Images for query creation, image histograms, a combined query histogram and the search results
4. Table 4. Results by Accumulation operation: Images for query creation, image histograms, a combined query histogram and the search results
5. Table 5. Results by Minus operation: Images for query creation, image histograms, a combined query histogram and the search results
6. Table 6. Results by Difference operation: Images for query creation, image histograms, a combined query histogram and the search results
7. Table 7. Computation time for collecting and analyzing image data
8. Table 8. Comparison of kd-tree and k-means algorithms for searching the K nearest neighbors with the same precision
9. Table 9. Computation time to execute PCA on whole image data (52627 images)


List of figures

1. Figure 1. Color-Image set Matrix C
2. Figure 2. Relations between dataset and operations
3. Figure 3. System architecture of the 5D World PicMap system
4. Figure 4. Tabs and main functions in the web interface of the 5D World PicMap system
5. Figure 5. F-measure evaluation on Experiment 4.1.1
6. Figure 6. User Comparison: Performance Evaluation
7. Figure 7. 5D World PicMap interface and the spatiotemporal visualization results of image search by a query created by three images "sunset with purple cloud on dark sky" using the "plus" operation
8. Figure 8. The spatial visualization results of image search by a query created by two images "sunset without blue sea" using the "minus" operation
9. Figure 9. The time-series changes at Burgh Island observed by time granularity control
10. Figure 10. Overview of painting image collection by timeline (European area)


Chapter 1 Introduction

1.1 Background and Motivation

The adage "a picture is worth a thousand words" refers to the idea that a complex idea can be conveyed with just a single still image. It can also be said that image data have a latent potential for discovering new knowledge. Especially in research fields where images play an important role in data analysis, such as geoscience, environmental analysis, history, cultural anthropology and cross-cultural studies, retrieving appropriate images efficiently and analyzing them along spatiotemporal aspects are very important. In the field of information retrieval, Sasaki et al. [9,10] have proposed a document analysis system with semantic and spatiotemporal analyzers. In this thesis, we extend this system to image analysis and propose an imagination-based image search system with dynamic query creation and spatiotemporal analyzers, 5D World PicMap, which enables users to retrieve images through their imagination and to acquire an overview of the whole target image data on a 4D world map.

As multimedia technology grows rapidly, a large number of multimedia data of various types, such as images, audio and motion pictures, are created and distributed widely on the WWW. Image databases in particular are growing in quality thanks to the development of image capture devices and of storage and sharing services such as the community-based multimedia sharing sites Flickr [22], Picasa [29], Webshots [30] and Photobucket [31]. To increase opportunities to access enormous amounts of image data efficiently and appropriately, there are two major approaches: text-based image retrieval and content-based image retrieval (CBIR).

The text-based approach implements image retrieval by attaching keywords to images or by retrieving the text around images. Image search systems such as Imagery [32], Google Images Search [33] and Yahoo! Image Search [34] adopt this approach. The other approach, content-based image retrieval, is spreading over very wide domains [6], and many systems have been developed in academia and industry. This approach has two significant query techniques: query-by-an-image and query-by-sketch. Query-by-an-image search systems such as PicToSeek [14], SIMPLIcity [15], TinEye [19] and GazoPa [25] extract low-level visual features such as color histograms, shapes, textures and structure to calculate "similar" images. Query-by-sketch systems let the user express his/her intentions by sketching features of the image that he/she wants to retrieve [16]; some recent online systems such as Multicolr Search Lab [23] and GazoPa [25] demonstrate this approach. However, even with these conventional query methods, a system that expresses the user's intentions directly has not been established. From the point of view of expressing intentions with images, query-by-an-image is limited to the context shown by one sample input, and query-by-sketch requires users to produce detailed drawings to express their precise intentions. Thus, an intelligent multimedia search system that responds to a human's imagination process as adequately as possible leads to a new computational environment.

Each captured image also carries latent knowledge about itself, such as the place, the date and the type of camera. Analyzing image data to discover new knowledge without considering this information would miss an opportunity. A convenient way to work with this information is the exchangeable image file format (Exif). Exif is a specification for image data with the addition of specific metadata tags. The metadata tags defined in Exif cover a broad spectrum, but the tags most used in recent academic research and online applications are the geolocation tags. For example, a method that uses geotagged images to estimate geographic information from a single image [7] and a method that analyzes images by text tags and image data [5] have been proposed. Some popular Web applications using geotagged images, such as Panoramio from Google [24], MyPicsMap [20] and the map on Flickr [21], give users a new, more visual way to explore image data by mapping images onto world maps. These studies and applications motivate us to use not only geolocation tags but also time tags, because of the requirement of changing the viewpoint on image data along a time series.

1.2 Proposed 5D World PicMap System and Key Features

In this thesis, we present the 5D World PicMap system with imagination-based image search and spatiotemporal analyzers. The 5D World PicMap system provides users with a five-dimensional information overview: the four spatiotemporal dimensions and the degenerated color dimension. The system consists of two kinds of main functions: (1) color-based image analytical functions based on the dynamic query creation method, and (2) spatiotemporal visualization functions for image data with time-series multi-geographical views. The main feature of this system is to dynamically create various context-dependent patterns of pictorial stories according to a user's viewpoints and imagination processes.

First, we present a novel method to create an image-based context query by color-based combination of images in the real world. Our system extends the analytical functions for image search not only to retrieval processing but also to query manipulation. In this system, the user's intention, impression and memory are represented as a context query by combinations of colors. Although a color palette is available for generating simple color combinations [23], image data are more useful for creating complex color combinations. Moreover, image data in the real world, such as photos of scenery, are more effective for representing a user's context because they are highly connected to human impressions and memories. A user of this system creates a context query that represents his/her own inner intention, impression and memory for specific things or places, and retrieves unknown but desired images with their associated information according to his/her own imagination by using the operations of our method.

In this system, two steps define the user's imagination process. The first step is to select multiple images: a user selects a set of images that include desired colors (the "With" image data set) and another set of images that include undesired colors (the "Without" image data set) to represent his/her own intentions, impressions and memories. The second step is to create the combinations of colors: the user creates a query by using the operations equipped in this system. To create an image-query, our system provides several operations: plus, intersection, accumulation, difference and minus. The plus, intersection and accumulation operations can be used for the "With" image data set, and the difference and minus operations can be used for the "Without" image data set. By combining these operations and image data sets, a user can dynamically create a query that represents his/her complex context.

Second, to manage images visually and effectively for analysis in a broad range of research fields, we propose spatiotemporal analyzers that utilize the exchangeable image file format (Exif). Compared to the studies and applications mentioned above, our system uses not only the geolocation information but also the time tags to construct spatiotemporal analyzers that visualize the image data and text tags by mapping images onto a set of chronologically-ordered world maps. In our system, the spatial analyzer functions map retrieved images onto the world map, and the temporal analyzer functions allow users to easily view the changes of a place through temporal granularity control.

1.3 Advantages

We have implemented our system as an easy-to-use Web application. This application can be used to support general users in finding, through their imagination, places they have never been to for travelling; to support academic fields such as geoscience learning and research by observing how image data change over space and time; to support cross-cultural comparison in fields such as painting arts and aboriginal people's customs; and to support the analysis of environmental issues. In this thesis, we also present several experimental results using data of over two million images (scenery photos of nature and painting arts in an online museum) to examine the feasibility, effectiveness and applicability of the system as a real application.

1.4 Related Works

As related work, we refer to several studies on the relationships between color combinations and impressions. A psychological study on color combinations and human impressions [4], a semantic image retrieval method using knowledge on colors and impressions [2], and an impression metadata extraction method for image data [17] have already been proposed. Based on these previous studies, we also select color features as an important element for representing the user's complex imagination. Based on the culture-based image-query creation method [1] and the combined image-query creation method using shape and color features [2], we developed our system as a Web application and implemented it to execute almost automatically. Referring to studies on spatiotemporal analysis such as the 4D World Map system of Sasaki et al. [9,10], we analyze image data with the query creation method for image retrieval and utilize Exif metadata tags for spatiotemporal visualization.


Chapter 2 Methods of query creation and spatiotemporal analysis

2.1 Query creation method

Our image-query creation method is based on the following observations: (1) a user's context is too complex to be represented easily by a single image; (2) a part of the user's context can be represented by a set of multiple images in the real world, such as photos of scenery, because those images are highly connected to human impressions and memories; and (3) the color features of images are effective for creating the user's context query because they evoke human imagination.

2.1.1 Main features and operations

The main feature of our method is to dynamically create a query that reflects the user's intention, impression and memory, as his/her own context existing only in his/her mind, through color-based combinations of existing images in the real world. First, the system extracts color histograms as perceptual features of a set of images that the user inputs as his/her own context. Second, the system generates an image-query according to the color-based combination of images with common features, which cannot be extracted from a single image. Third, the system retrieves target images from the collected image database according to the user's complex context, and provides associated information such as desired images and information about places where he/she has never been.

After calculating the area ratio of the color distribution of each sample image, the image-query is created by combining multiple image sets with the following operations: Operation 1 creates the vector Qplus as the sum of each color bin over all the sample images in a set, to increase color features; Operation 2 creates the vector Qintersection from the colors commonly used by all the sample images in a set; Operation 3 creates the vector Qaccumulation by taking the dominant colors among all images in a set; Operation 4 creates the vector Qminus by decreasing the color features of a single sample image by those of the other sample images in a set; and Operation 5 creates Qdifference from the colors of a single sample image that are less frequently used in the other sample images of a set.

For n given sample images (sl1, sl2, …, sln, where l is a set identifier) representing p sets of images (l1, l2, …, lp), color histograms are generated. Each generated m-bin histogram consists of m basic colors (c1, c2, …, cm). The process can be described as a function fcolor_extractor:

fcolor_extractor(image) → {c1, c2, …, cm}    ……(F1)

The m-bin histogram of each sample image representing an image set is defined as a color-image set vector sk = {slk1, slk2, …, slkm: l is a set identifier}, and the n-by-m matrix whose row vectors are the color-image set vectors is defined as the color-image set matrix C, shown in Figure 1. In other words, the color-image set matrix C represents the color features of each image set as the numerical values (q11, q12, …, qnm) of the color histograms of the n sample images. The histogram combination process can be described as a function fcombinator:

fcombinator(color-image set matrix) → {q1, q2, …, qm}    ……(F2)

Figure 1. Color-Image set Matrix C

After calculating the area ratio of the color distribution of each sample image, the image-query is created by combining multiple image sets with the following five operations.

Operation 1: creation of Qplus by the sum of each color bin from all the sample images in a set, to increase color features:

Qplus = (q11+q21+…+qn1, q12+q22+…+qn2, …, q1m+q2m+…+qnm)    ……(1)

Operation 2: creation of Qintersection by the commonly-used colors from all the sample images in a set:

Qintersection = (min(q11,…,qn1), …, min(q1m,…,qnm))    ……(2)

Operation 3: creation of Qaccumulation by taking the dominant colors among all images in a set:

Qaccumulation = (max(q11,…,qn1), …, max(q1m,…,qnm))    ……(3)

Operation 4: creation of Qminus by decreasing the color features of a single sample image by those of the other sample images in a set:

Qminus = (q11-q21-…-qn1, q12-q22-…-qn2, …, q1m-q2m-…-qnm),
where if (q1k-q2k-…-qnk < 0) then q1k-q2k-…-qnk = 0 (k = 1..m)    ……(4)

Operation 5: creation of Qdifference by the colors of a single sample image that are less frequently used in the other sample images of a set:

Qdifference = (q1, q2, …, qm),
where qk = q1k if (q1k > Hdif · max(q2k,…,qnk)), else qk = 0 (k = 1..m), and Hdif is a constant (Hdif >= 1)    ……(5)

In this method, we set two kinds of sample image data sets: a "With" set composed of desired images input by the user, and a "Without" set composed of undesired images input by the user. After calculating the area ratio of the color distribution of each sample image, a query vector Q is generated by combinations of Operations 1-5. After the query histogram (q1, q2, …, qm) has been created, a normalization process is applied to it to create the normalized query histogram, and the correlation between the query vector and each target image vector is calculated in the color space:


qj = qj / (q1 + q2 + … + qm)    (j = 1..m)
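As an illustration only (not the thesis's actual implementation), the five operations, the normalization step and the correlation calculation could be sketched in Python with NumPy as follows. The sketch assumes each image set is given as an n-by-m matrix C of color histograms (n >= 2 for the Minus and Difference operations, with C[0] as the single sample image), and H_DIF is a hypothetical value for the constant Hdif.

```python
import numpy as np

H_DIF = 1.5  # example value for the constant Hdif (Hdif >= 1)

def q_plus(C):
    # Operation 1: sum each color bin over all sample images in the set.
    return C.sum(axis=0)

def q_intersection(C):
    # Operation 2: keep only the commonly-used amount of each color.
    return C.min(axis=0)

def q_accumulation(C):
    # Operation 3: take the dominant (maximum) value of each color bin.
    return C.max(axis=0)

def q_minus(C):
    # Operation 4: subtract the other images' bins from the first image,
    # clipping negative values to zero.
    return np.clip(C[0] - C[1:].sum(axis=0), 0, None)

def q_difference(C, h_dif=H_DIF):
    # Operation 5: keep a bin of the first image only if it is clearly
    # larger (by the factor Hdif) than in every other image of the set.
    others_max = C[1:].max(axis=0)
    return np.where(C[0] > h_dif * others_max, C[0], 0.0)

def normalize(q):
    # Normalization: q_j <- q_j / (q_1 + q_2 + ... + q_m).
    total = q.sum()
    return q / total if total > 0 else q

def cosine_correlation(query, target):
    # Correlation between the normalized query vector and a target image vector.
    denom = np.linalg.norm(query) * np.linalg.norm(target)
    return float(query @ target / denom) if denom > 0 else 0.0
```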

2.1.2 Color-based combination settings for image data

In our system, we set two kinds of image data sets: the "With" set and the "Without" set. After calculating the area ratio of the color distribution of each sample image, an image-query is generated by the combination of two kinds of operations: "Local" operations and "Global" operations. The "With" image set is composed of desired images input by the user; its colors represent what the user wants to use to create his/her imagination about things or places. The "Without" image set is composed of undesired images input by the user; its colors represent what the user wants to remove when creating his/her imagination. Each image set consists of multiple images. For each of the "With" and "Without" image sets, a "Local" operation generates a "sub-query". Then, a "Global" operation generates an integrated query from the two sub-queries. The relations between the data sets and the operations are represented in Figure 2, and the proposed operations are listed in Table 1.

Figure 2. Relations between dataset and operations


Table 1. Operations of our method
"Local" operation        "Global" operation
Qplus(Set)               Qminus(Sub-query1, Sub-query2)
Qintersection(Set)       Qdifference(Sub-query1, Sub-query2)
Qaccumulation(Set)       Qintersection(Sub-query1, Sub-query2)

A "Local" operation is applied to the color-image set matrix C to create a sub-query vector. A "Global" operation can only be applied to two sub-query vectors s1 = (q11, q12, …, q1m) and s2 = (q21, q22, …, q2m) to create the image-query histogram (q1, q2, …, qm), as illustrated in the sketch below.
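Continuing the sketch above (its functions and the NumPy import are assumed to be in scope, and the histograms here are random placeholders), a local operation on each set followed by a global Minus operation might look like this:

```python
# Hypothetical n-by-m histogram matrices for the two image sets;
# in the real system they come from the color extractor (F1).
C_with = np.random.rand(3, 284)      # three desired sample images
C_without = np.random.rand(2, 284)   # two undesired sample images

sub_query1 = q_plus(C_with)          # "Local" operation on the "With" set
sub_query2 = q_plus(C_without)       # "Local" operation on the "Without" set

# "Global" Minus operation on the two sub-queries (see Table 1),
# followed by normalization of the resulting image-query histogram.
query = normalize(q_minus(np.vstack([sub_query1, sub_query2])))
```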

2.2 Spatiotemporal analyzers

Our spatiotemporal analyzers are based on the spatiotemporal analysis functions that have already been proposed for the 4D World Map system (Sasaki et al. 2009, Sasaki et al. 2010); the details are described in the published papers [9,10]. When a set of image data is selected as the target data to analyze, and an image-query by color combinations together with conditions on time and space are given by a user, the spatiotemporal analyzers evaluate all the target images against the temporal and spatial conditions respectively, and then integrate the results into the output.

(1) Spatial analyzer

By this function, the geolocation information in the Exif data is converted to latitude and longitude information to be mapped onto a set of 4D world maps. A Reverse Geocoder function converts the latitude and longitude information to addresses and stores them in the relational database. These functions can be described as follows:

fspatiotemporal_extractor(Exif) → {latitude, longitude, datetaken}    ……(F3)

freverse_geocoder(latitude, longitude) → {address}    ……(F4)
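A rough sketch of the spatiotemporal extractor (F3) is shown below; it assumes JPEG images read with a recent version of Pillow, whose GPS rationals convert with float(). Reverse geocoding (F4) would be handled by a separate geocoding service and is not shown.

```python
from PIL import Image
from PIL.ExifTags import TAGS, GPSTAGS

def extract_spatiotemporal(path):
    # Read the Exif block and decode numeric tag ids into tag names.
    exif = Image.open(path)._getexif() or {}
    tags = {TAGS.get(k, k): v for k, v in exif.items()}
    date_taken = tags.get("DateTimeOriginal") or tags.get("DateTime")

    # Decode the GPS sub-dictionary and convert degrees/minutes/seconds
    # to signed decimal degrees.
    gps = {GPSTAGS.get(k, k): v for k, v in (tags.get("GPSInfo") or {}).items()}
    lat = lng = None
    if "GPSLatitude" in gps and "GPSLongitude" in gps:
        def to_degrees(dms):
            d, m, s = (float(x) for x in dms)
            return d + m / 60.0 + s / 3600.0
        lat = to_degrees(gps["GPSLatitude"])
        lng = to_degrees(gps["GPSLongitude"])
        if gps.get("GPSLatitudeRef") == "S":
            lat = -lat
        if gps.get("GPSLongitudeRef") == "W":
            lng = -lng
    return lat, lng, date_taken
```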

For the spatial comparison, a function fnear_address selects and fixes the position of each image on the map. The function can be represented as follows, where gc_dist is the Great Circle distance and D is the longest allowed distance (km):

fnear_address(address, D) → {addr | gc_dist(address, addr) <= D}    ……(F5)

fnear_address(latitude, longitude, D) → {(lat, lng) | gc_dist(latitude, longitude, lat, lng) <= D}    ……(F6)
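The great-circle distance gc_dist used by fnear_address can be computed with the haversine formula. The sketch below is an illustration over raw coordinates (the address-based variant of (F5) would first geocode the addresses):

```python
import math

def gc_dist(lat1, lng1, lat2, lng2):
    # Great Circle distance in kilometers (haversine formula).
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def near_position(lat, lng, candidates, d_km):
    # Rough equivalent of (F6): keep the candidate positions within D km.
    return [(la, ln) for (la, ln) in candidates
            if gc_dist(lat, lng, la, ln) <= d_km]
```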

(2) Temporal analyzer

The temporal information, also extracted by (F3) as dates and times in the format mm/dd/yyyy hour:minute:second or as a Unix timestamp, is stored in the relational database. For the temporal comparison, a function fnear_time can be described as follows, where time, T and time_duration have the same time unit:

fnear_time(time, T) → {time_duration = (time-T, time+T)}    ……(F7)

We designed a Temporal Granularity Controller, which enables users to switch the temporal granularity, such as day/month/year, as the temporal scale of the analysis. Users can explore the image data of one specific place as a time series.

fday(address) → {images listed by each day}    ……(F8)
fmonth(address) → {images listed by each month}    ……(F9)
fyear(address) → {images listed by each year}    ……(F10)
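A minimal sketch of the temporal granularity controller (F8-F10), assuming each image record carries a date_taken datetime, could group the images of one place like this:

```python
from collections import defaultdict
from datetime import datetime

def group_by_granularity(images, granularity="month"):
    # Group image records by the day, month or year of their capture date.
    fmt = {"day": "%Y-%m-%d", "month": "%Y-%m", "year": "%Y"}[granularity]
    buckets = defaultdict(list)
    for img in images:
        buckets[img["date_taken"].strftime(fmt)].append(img)
    return dict(buckets)

# Example: two hypothetical photos of one place, listed by year.
photos = [{"id": 1, "date_taken": datetime(2009, 7, 14)},
          {"id": 2, "date_taken": datetime(2010, 1, 3)}]
by_year = group_by_granularity(photos, "year")  # {"2009": [...], "2010": [...]}
```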

(3) Spatiotemporal projector

This projection function selects the visualization results as a global evaluator:


fprojector(spatio-temporal context,spatio-temporal metadata) →{sub-spatiotemporal metadata} …… (F11)


Chapter 3 Implementation of the 5D World PicMap System

We have implemented a prototype of the 5D World PicMap as an easy-to-use Web application. The system architecture is shown in Figure 3.


Figure 3: System architecture of the 5D World PicMap system

The 5D World PicMap is implemented as a Web application with a three-tier Client/Server/Database architecture.

(1) The Client side: On the client side, to provide a flexible user interface, we implemented several interface tabs: the main "Try it" tab, the "Expert Area" tab and the "Overview" tab (Figure 4).


Figure 4: Tabs and main functions in the web interface of the 5D World PicMap system

The "Try it" tab is the basic interface of our system for general users. Users can input images, specify the spatiotemporal context and select an operation for querying the targets. The system maps each result onto the map and allows users to view the results on a timeline with the granularity controller. The "Expert Area" tab extends the "Try it" interface: it allows expert users to execute more complex query combinations and to control the trade-off between precision and computation time when querying the targets. The "Overview" tab allows users to view the whole image database by spatiotemporal context or by other image attributes stored in the relational database.

(2) The Server side: There are four important components on the server side: the Metadata Extractor, the User's Context Interpreter, the Global Analyzer and the Visualizer.

Metadata Extractor:
The Image Crawler collects images with Exif data from the Internet. For each collected image, the Exif data extractor function extracts the text metadata, then indexes and stores the image and this text information in a relational database (DB1). This function also resizes each image before storing it, to reduce the database storage size. After the images have been collected, two functions, the color extractor and the spatiotemporal analyzer, are applied to create the meta-level database (DB2). The color extractor extracts the color features of each image and generates its color histogram; in this function, we use a method that extracts a color histogram with 284 bins (252 chromatic colors and 32 monochrome colors) [13]. The spatiotemporal analyzer geocodes/reverse-geocodes the geographic information of each image indexed in DB1 using functions (F3) and (F4) in Section 2.2.
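The actual color extractor follows the 284-bin perceptually smooth HSV histogram of [13]. Purely as a simplified illustration, a uniform HSV quantization over a resized image could look like the following sketch:

```python
import numpy as np
from PIL import Image

def color_histogram(path, h_bins=18, s_bins=4, v_bins=4):
    # Simplified stand-in for the color extractor (F1): a uniform HSV
    # histogram over a 32x32 thumbnail, returned as area ratios per bin.
    img = Image.open(path).convert("RGB").convert("HSV").resize((32, 32))
    hsv = np.asarray(img, dtype=np.float64) / 255.0
    h = np.minimum((hsv[..., 0] * h_bins).astype(int), h_bins - 1)
    s = np.minimum((hsv[..., 1] * s_bins).astype(int), s_bins - 1)
    v = np.minimum((hsv[..., 2] * v_bins).astype(int), v_bins - 1)
    idx = (h * s_bins + s) * v_bins + v
    hist = np.bincount(idx.ravel(), minlength=h_bins * s_bins * v_bins)
    return hist / hist.sum()
```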

User's Context Interpreter:
This component consists of two functions: the Context Interpreter and the Spatiotemporal Context Evaluator. The Context Interpreter is the query processing function that specifies the user's semantic context and spatiotemporal context. The semantic context is represented by the input images and the selected operations. For each input image, the Color Extractor extracts the color histogram and transfers it to the Combined Query Creator. The Spatiotemporal Context Evaluator evaluates the user's spatial information by (F4), (F5) and (F6) and the temporal information by (F7). This information specifies the "space" and "time" in which the user wants to retrieve images and is transferred to the Spatiotemporal Projector function.

Global Analyzer:
This component combines the color histograms with the selected operation and projects the spatiotemporal information onto the database (Spatiotemporal Projector, by (F11)) to generate a sub-space of the database from which result images are retrieved. The image retrieval process is applied after Principal Component Analysis; it finds the nearest neighbors to the query histogram and returns the results to the Results Visualizer. The principal component analyzer is implemented using the Modular toolkit for Data Processing, MDP [11]; it reduces the tree building time and the memory needed to store the query tree. The Nearest Neighbors finder is implemented using the Fast Library for Approximate Nearest Neighbors, FLANN [12].
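The thesis uses MDP [11] for PCA and FLANN [12] for approximate nearest-neighbor search; purely as an illustration of the same pipeline, scikit-learn equivalents could be sketched as follows (the 284-bin histograms here are random placeholders):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

# N x 284 matrix of color histograms from the meta-level database (placeholder data).
histograms = np.random.rand(1000, 284)

pca = PCA(n_components=150)              # keep 150 principal components
reduced = pca.fit_transform(histograms)

index = NearestNeighbors(n_neighbors=40, algorithm="kd_tree").fit(reduced)

def search(query_histogram):
    # Project the combined query histogram into the reduced color space
    # and return the indices of the 40 nearest database images.
    q = pca.transform(query_histogram.reshape(1, -1))
    _, indices = index.kneighbors(q)
    return indices[0]
```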

Visualizer:

This component provides users with a more visual view of the database and of the retrieved results through their spatial and temporal information. Each result image and its associated information are mapped onto the map. The spatial information provides the address or latitude/longitude for placing an image on the map; the temporal information provides the date and time the image was taken, which is used to control the time granularity through functions (F8), (F9) and (F10).


Chapter 4 Experiments

4.1 Experiment 1: Dynamic query creation method

4.1.1 Examination of the feasibility of the query creation method

In this experiment, to verify the feasibility of our query creation method and to compare the meaning of each operation, we performed qualitative experiments with scenery and flower images collected from the Internet. The number of target images is 225. The image types and their perceptual colors are: flower (yellow, orange, pink), sky (blue), sea (blue), sunset (orange), field (yellow), and mountain and forest (green). With queries created by each operation, we performed image retrieval and evaluated the correctness of the results by visual judgment. We calculated the precision rate, recall rate and F-measure over the top ten result images. For this experiment, we selected the cosine distance for the distance calculation between color histograms. The search results for each operation are shown in Tables 2 to 6.

Table 2 shows the image search results using the Plus operation for query creation. The input images are an image of blue sky with white clouds and an image of a yellow flower with a green background as "With" (desired) images. The created and combined image-query is shown as a color histogram. The histogram is created by arithmetic addition, that is, by combining the values of the color bins of the two histograms. The search results show that nine of the top 10 images are correct, because these images include blue, white, yellow and green in reasonable proportions.


Table 2. Results by Plus operation: Images for query creation, image histograms, a combined query histogram and the search results
Operation: Plus
Combined query histogram: blue and yellow are added for "yellow flower on blue sky"
Result: precision = 9/10 (90%), recall = 9/27 (33%), F-measure = 0.49

Table 3 shows the image search results using the Intersection operation for query creation. The input images are an image of a yellow flower with blue sky and an image of a yellow flower with a green background as "With" (desired) images. The created and combined image-query is shown as a color histogram. The histogram is created by the set intersection operation, that is, by combining the common color bins of the two histograms. The search results show that seven of the top 10 images are correct, because these images include yellow and a little green in reasonable proportions.


Table 3. Results by Intersection operation: Images for query creation, image histograms, a combined query histogram and the search results
Operation: Intersection
Combined query histogram: yellow is extracted for "only yellow flower"
Result: precision = 7/10, recall = 7/14, F-measure = 0.58

Table 4 shows the image search results using the Accumulation operation for query creation. The input images are an image of a sunset sky and an image of red leaves against blue sky as "With" (desired) images. The created and combined image-query is shown as a color histogram. The histogram is created by the set union operation, that is, by combining the dominant values of the color bins of the two histograms. The search results show that all ten of the top 10 images are correct, because these images include red, orange and dark brown in reasonable proportions, and show red leaves and sunsets in meaning.


Table 4. Results by Accumulation operation: Images for query creation, image histograms, a combined query histogram and the search results
Operation: Accumulation
Combined query histogram: much red-orange is mixed in for "sunset sky and red leaves"
Result: precision = 10/10 (100%), recall = 10/21 (48%), F-measure = 0.65

Table 5 shows the image search results using the Minus operation for query creation. The input images are an image of a yellow flower with a green leaf background as the "With" (desired) image and an image of green leaves as the "Without" (undesired) image. The created and combined image-query is shown as a color histogram. The histogram is created by arithmetic subtraction, that is, by subtracting the color bin values of the "Without" histogram from the "With" histogram and setting negative values to 0. The search results show that eight of the top 10 images are correct, because these images include yellow and green in reasonable proportions.

Table 5. Results by Minus operation: Images for query creation, image histograms, a combined query histogram and the search results

Operation: Minus
Combined query histogram: green is subtracted for "less green"
Result: precision = 8/10 (80%), recall = 8/14 (57%), F-measure = 0.67

Table 6 shows the image search results using the Difference operation for query creation. The input images are an image of a sunset over the sea with blue sky as the "With" (desired) image and an image of a sunset sky as the "Without" (undesired) image. The created and combined image-query is shown as a color histogram. The histogram is created by the set complement operation, that is, by keeping only the color bins that exist in the "With" image histogram. The search results show that eight of the top 10 images are reasonable, because these images include blue in a reasonable proportion.

Table 6. Results by Difference operation: Images for query creation, image histograms, a combined query histogram and the search results
Operation: Difference
Combined query histogram: only blue remains for "blue sky or sea"
Result: precision = 8/10 (80%), recall = 8/25 (32%), F-measure = 0.46

The F-measure evaluation for this experiment is shown in Figure 5. The figure shows that the mean F-measure is 0.57, and that the queries created by the Accumulation and Minus operations led to good results, at least in this experiment.


Figure 5. F-measure evaluation on Experiment 4.1.1

4.1.2 Qualitative and Quantitative Experiments by User Comparison

In this experiment, we applied performance evaluation by users [8]. This evaluation method, User Comparison, is known as an interactive method: the users judge the success of a query directly after issuing it. We asked the users to evaluate the search results of 50 queries and to report the "precision", i.e. the success of a query, over the top 20 result images. The users were free to use any query images and to apply any operations that represented their imagination. As shown in Figure 6, the mean precision was 72%, which indicates that our search system with the proposed dynamic query creation method is reasonable.


Figure 6. User Comparison: Performance Evaluation

4.2 Experiment 2: Dynamic query creation and spatiotemporal visualization

To examine the feasibility and applicability of our 5D World PicMap system, we performed several experiments with scenery photos with Exif data and art images with spatiotemporal information from a Web gallery. Using the Image Crawler, we collected 53627 nature images from Flickr [22] and 26082 art images from the Web Gallery of ART [26] as the image database. The nature images were collected from Flickr with a best effort to remove unrelated images by adding negative tags such as "-people -architecture -animal -outdoors...".

4.2.1 Dynamic query creation and spatial visualization

In this case, we input three images collected from Flickr, (1) sunset, (2) purple cloud and (3) dark sky, as parts of the imagination and used the "plus" operation to create a combined query histogram. The forty nearest neighbors to the combined query histogram are returned and mapped onto the world map. The result is shown in Figure 7. The "With" box shows the three input images, and the operation selected by the user is "plus". The "Combination" area shows the combined query histogram, and the "Results" area lists the 40 result images. On the map, the spatial information of each retrieved image is shown with markers, whose numbers represent the ranking. A marker selected on the map shows an image together with the place and date information of that image. The "Time" area shows the time-series rendering for January 26th, 2003.

Figure 7: 5D World PicMap interface and the spatiotemporal visualization results of an image search with a query created from three images, "sunset with purple cloud on dark sky", using the "plus" operation

Figure 7 shows that the system returned result images and visualized them spatially on the map. The results generally carry the impression of the given imagination, and the map brings a visual way to explore the result images. Figure 8 shows another example of dynamic query creation with spatiotemporal visualization. We input two images, (1) blue sea with sunset and (2) blue sea with blue sky, as parts of the imagination and used the "minus" operation to create a combined query histogram without blue colors. Figure 8 shows images of sunsets in the search results. This means that, in our system, the spatial mapping of the image search results varies reasonably with different imaginations.

Figure 8: The spatial visualization results of an image search with a query created from two images, "sunset without blue sea", using the "minus" operation

4.2.2 Temporal visualization with time granularity control

From the experimental results of Section 4.2.1 shown in Figure 7, we selected the image ranked 6th as an example on which to perform temporal visualization with the time granularity control function. The place where the image was taken is Burgh Island. Figure 9 shows the temporal visualization results by "time", "day" and "year".


Figure 9: The time-series changes at Burgh Island observed by time granularity control

As shown in this figure, a user can analyze the time-series change of scenery images of a specific place if the user sets the scenery images as target data for image retrieval. These results indicate that our system can also be applied to data analysis in research fields such as geoscience or environmental analysis.

4.2.3 Overview of the database by spatiotemporal visualization

To examine the advantages of our overview function using spatiotemporal visualization, we performed several experiments with art images, specifying a timeline to view the whole database mapped onto the map. We selected "painting" as the type of art image and set the 26082 art images from the Web Gallery of ART [26] as the target image data.


Figure 10: Overview of the painting image collection by timeline (European area)

Figure 10 shows the mapping results for four timeline periods: 1351-1400, 1401-1450, 1451-1500 and 1501-1550. From this figure, we can observe that red colors were generally used from 1351 to 1450, while from 1451 to 1550 the artists seem to have used darker colors. These visualization results indicate the possibility that our system can be applied to study areas such as cross-cultural studies, art analysis, history and cultural anthropology.

4.3 Examination of the Performance of the 5D World PicMap System

In this section, we examine the performance of the system through the computation time of each process. These times are measured for the whole nature image database and per image. Table 7 shows the computation time for collecting and analyzing the Exif data. The collecting step took a long time, because duplicate images had to be removed and because it depends on the connection between the image crawler program and the Flickr server. The other computation times, for resizing images, analyzing Exif data and extracting color histograms, are very short. It is therefore possible to say that our system can be used on a general personal computer: users can collect and analyze the data by themselves, and the required storage and processing time are acceptable.

Table 7: Computation time for collecting and analyzing image data
Time cost                     Whole image data set (53,627 images)   Per one image
Collecting                    10 hours                               -
Extracting Exif metadata      3.7 seconds                            0.000069 seconds
Resizing image to 32x32       466.3 seconds                          0.0087 seconds
Generating color histogram    44.2 seconds                           0.00082 seconds

Table 8 shows the tree building time and querying time of FLANN with and without PCA. With PCA reducing the color histogram to 150 colors, the tree building time is reduced by more than half and the querying time is shorter than without PCA.

Table 8: Comparison of kd-tree and k-means algorithms for searching the K nearest neighbors with the same precision (K=40)
Algorithm   Time                With PCA (remaining 150 important colors)   Without PCA
kd-tree     Tree building time  0.34 seconds                                0.82 seconds
kd-tree     Querying time       0.00060 seconds                             0.00082 seconds
k-means     Tree building time  2.9 seconds                                 8.68 seconds
k-means     Querying time       0.00052 seconds                             0.00067 seconds


Table 9 shows the computation time for executing PCA on the whole image data. From this result and the results shown in Table 8, we can see that PCA does not consume much time but brings the merits of reducing the computation time and the tree building memory.

Table 9: Computation time to execute PCA on whole image data (52627 images)
Training PCA node with 1000 images    0.026 seconds
Executing PCA                         0.70 seconds

The results in Tables 7, 8 and 9 show that our system's execution time is reasonable.


Chapter 5 Discussions

In this thesis, we have presented 5D World PicMap, an imagination-based image search system with spatiotemporal analyzers. The main feature of this system is to dynamically create various context-dependent patterns of pictorial stories according to a user's viewpoints and imagination processes. The 5D World PicMap system provides users with a five-dimensional information overview: the four spatiotemporal dimensions and the degenerated color dimension. The applicability of this system ranges from general image search or travel information search to analytical tools for various research fields, such as geoscience, environmental analysis and cross-cultural art comparison.

The presented dynamic image-query creation method for imagination-based image search is a novel approach to representing a user's imagination process. This method extends the analytical functions for image search not only to retrieval processing but also to query manipulation, according to the color-based combination of images with common features. The dynamic image-query creation method enables a user to dynamically create a query that reflects the user's intention, impression and memory, as his/her own context existing only in his/her mind, through color-based combinations of existing images in the real world. The proposed method consists of five operations for creating an image-query vector from combinations of images: "plus", "intersection" and "accumulation" as local operations, and "minus" and "difference" as global operations.

We have implemented the Web application for this system with a large amount of image data crawled automatically from the Internet. We have shown the feasibility and effectiveness of the proposed method through the experimental results obtained with this system. The experimental results showed that our proposed method is reasonable and that spatiotemporal visualization provides users with a novel way of discovering new knowledge from image data.


As future work, we will use more image features, such as shape or structure, to allow users to represent their imagination contexts more sufficiently and easily. We also plan to develop the 5D World PicMap system into a production application and to analyze the effectiveness and usability of the system with users on the Internet. We believe that the 5D World PicMap, with imagination-based image search and spatiotemporal analyzers, is a novel system that leads to a new computation environment for discovering knowledge through a user's imagination and the spatiotemporal information in images. While our results look quite promising, much work remains to be done. We hope that this work might jump-start a new direction of research in imagination-based spatiotemporal data mining.

This thesis develops the research presented in the conference paper [35] at the IASTED SEA 2010 conference.


References

[1] Shiori Sasaki, Yoshiko Itabashi, Yasushi Kiyoki, Xing Chen, An Image-Query Creation Method for Representing Impression by Color-based Combination of Multiple Images, Frontiers in Artificial Intelligence and Applications, Vol. 190, Proceedings of the 2009 conference on Information Modelling and Knowledge Bases XX, pp. 105-112, 2009.
[2] Yasuhiro Hayashi, Yasushi Kiyoki, Xing Chen, A Combined Image-Query Creation Method for Expressing User's Intentions with Shape and Color Features in Multiple Digital Images, The 20th European-Japanese Conference on Information Modelling and Knowledge Bases, June 2010.
[3] A. Torralba, R. Fergus, W. T. Freeman, 80 million tiny images: a large dataset for non-parametric object and scene recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30(11), pp. 1958-1970, 2008.
[4] Shigenobu Kobayashi, Color Image Scale, The Nippon Color & Design Research Institute ed., translated by Louella Matsunaga, Kodansha International, 1992.
[5] David J. Crandall, Lars Backstrom, Daniel Huttenlocher, Jon Kleinberg, Mapping the world's photos, Proceedings of the 18th International Conference on World Wide Web (WWW '09), ACM, New York, NY, USA, pp. 761-770, 2009.
[6] Ritendra Datta, Dhiraj Joshi, Jia Li, James Z. Wang, Image Retrieval: Ideas, Influences, and Trends of the New Age, ACM Computing Surveys, vol. 40, no. 2, 2008.
[7] James Hays, Alexei A. Efros, IM2GPS: estimating geographic information from a single image, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
[8] Henning Müller, Wolfgang Müller, David McG. Squire, Stéphane Marchand-Maillet, Thierry Pun, Performance evaluation in content-based image retrieval: overview and proposals, April 2001.


[9] Sasaki, S., Takahashi, Y. and Kiyoki, Y., The 4D World Map System with Semantic and Spatio-temporal Analyzers, Information Modelling and Knowledge Bases, Vol. XXI, IOS Press, pp. 1-18, May 2010.
[10] Sasaki, S., Takahashi, Y. and Kiyoki, Y., The 4D World Map System with Semantic and Spatio-temporal Analyzers, the 19th European-Japanese Conference on Information Modelling and Knowledge Bases, pp. 11-24, June 1-5, Maribor, Slovenia, 2009.
[11] Zito, T., Wilbert, N., Wiskott, L., Berkes, P., Modular toolkit for Data Processing (MDP): a Python data processing framework, Frontiers in Neuroinformatics, 2:8, 2008. Homepage: http://mdp-toolkit.sourceforge.net
[12] Marius Muja and David G. Lowe, Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration, International Conference on Computer Vision Theory and Applications (VISAPP'09), 2009.
[13] A. Vadivel, A.K. Majumdar and Shamik Sural, Perceptually Smooth Histogram Generation from the HSV Color Space for Content Based Image Retrieval, International Conference on Advances in Pattern Recognition (ICAPR), Calcutta, India, pp. 248-251, 2003.
[14] T. Gevers and A.W.M. Smeulders, PicToSeek: Combining color and shape invariant features for image retrieval, IEEE Transactions on Image Processing, 9 (2000), pp. 102-119.
[15] J.Z. Wang, J. Li and G. Wiederhold, SIMPLIcity: Semantics-sensitive integrated matching for picture libraries, IEEE Transactions on Pattern Analysis and Machine Intelligence, 23 (2001), pp. 947-963.
[16] W.H. Leung, T. Chen, Trademark retrieval using contour-skeleton stroke classification, IEEE International Conference on Multimedia and Expo, vol. 2, pp. 517-520, 2002.
[17] Yasushi Kiyoki, Takashi Kitagawa, Takanari Hayama, A metadatabase system for semantic image search by a mathematical model of meaning, ACM SIGMOD Record, Volume 23, Issue 4, December 1994.
[18] T. Kitagawa, T. Nakanishi, Y. Kiyoki, An Implementation Method of Automatic Metadata Extraction Method for Image Data and Its Application to Semantic Associative Search, Information Processing Society of Japan Transactions on Databases, Vol. 43, No. SIG12 (TOD16), pp. 38-51, 2002.

[19] TinEye Reverse Image Search, Idee, 2008, http://www.tineye.com/
[20] MyPicsMap - Photos of the World by Webzardry, http://www.mypicsmap.com/, October 2009
[21] Flickr: Explore everyone's photos on a Map, http://www.flickr.com/map/
[22] Flickr Photo Sharing, http://www.flickr.com
[23] Multicolr Search Lab, Idee, 2008, http://labs.ideeinc.com/multicolor/
[24] Panoramio from Google, http://www.panoramio.com/, July 2007
[25] GazoPa Similar Image Search, http://www.gazopa.com
[26] Web Gallery of ART, http://www.wga.hu/index1.html
[27] Online Travel Guides of Travel Destinations, http://www.destination360.com
[28] Google Maps API, http://code.google.com/apis/maps/documentation/javascript/
[29] Picasa Web Albums, http://picasaweb.google.com
[30] Webshots, http://www.webshots.com/
[31] Photobucket, http://photobucket.com/
[32] Imagery, http://elzr.com/imagery
[33] Google Images Search, http://www.google.com/
[34] Yahoo! Image Search, http://images.search.yahoo.com/
[35] Diep Nguyen-Thi-Ngoc, Shiori Sasaki, Yasushi Kiyoki, Imagination-based Image Search System with Dynamic Query Creation and its Application, The IASTED International Conferences on Informatics 2010, Software Engineering and Applications, November 8-10, 2010, 725-044.

