CSE 484 Project: Near-Duplicate Image Retrieval

In this project, you will build a system for content-based image retrieval (CBIR). The materials used in the project, including the data set and the tools for large-scale data clustering, are available at http://www.cse.msu.edu/~tongwei/cse484project.tar.gz.
1. The data set

There are a total of 10,000 color images in this data set, under the folder './img'. For each image, we have extracted a number of key points to represent its visual content: each key point is a 128-dimensional vector that describes a local visual pattern of the image. The key points of all images are stored in a single file './feature/esp.feature': each line is a vector of 128 elements corresponding to one key point, with the elements separated by blank spaces. The name of each image file is stored in the file './imglist.txt'; these names will be useful for the final evaluation. The number of key points in each image can be found in the file './feature/esp.size'.

Here is an example of the three files. Suppose we have an image database with three images, imgB.jpg, imgA.jpg, and imgC.jpg, and that we extract 2 key points for imgA.jpg, 3 key points for imgB.jpg, and 2 key points for imgC.jpg. The content of the three files is given below:
imglist.txt:
imgB.jpg
imgC.jpg
imgA.jpg

esp.size:
3
2
2

esp.feature:
imgB-key point 1
imgB-key point 2
imgB-key point 3
imgC-key point 1
imgC-key point 2
imgA-key point 1
imgA-key point 2
2. Key point quantization

We have discussed the bag-of-words model for image retrieval, in which each key point is mapped to one of a set of visual words. Given the bag-of-words representation of images, we can view each image as a visual document and apply standard text retrieval techniques (i.e., an inverted index and the tf.idf model) for efficient image retrieval. The key steps in constructing the bag-of-words model for images are (a) constructing the visual vocabulary and (b) mapping each key point to one of the visual words. A typical approach for constructing the visual vocabulary is to cluster all the key points into a large number of clusters, with the center of each cluster corresponding to a visual word; each key point is then mapped to the visual word with the shortest distance. We refer to these two steps as key point quantization.

Here is an example, continuing the setting above. First, by applying a clustering algorithm, we cluster the 7 key points from the three images into three clusters. Denote by cnt1, cnt2, and cnt3 the centers of the three clusters; each center is a visual word, denoted by w1, w2, and w3, respectively. By mapping each key point to the visual word with the shortest distance, we obtain the following results:

imgA.jpg: 1st key point → w2, 2nd key point → w1
imgB.jpg: 1st key point → w3, 2nd key point → w3, 3rd key point → w2
imgC.jpg: 1st key point → w3, 2nd key point → w2

We thus have the following bag-of-words representations of the three images:

imgA.jpg: w2 w1
imgB.jpg: w3 w3 w2
imgC.jpg: w3 w2

Based on this bag-of-words model, we can view each image as a visual document and directly apply a standard text search engine to image retrieval.

In this project, you need to write a program, using the FLANN library, to: (a) cluster the key points of all the images in the data set (the key points in the file esp.feature) into 'visual words', and (b) map each key point to a visual word. Task (a) is accomplished using the FLANN routines for large-scale clustering, and task (b) using the FLANN routines for nearest neighbor search. The library can be found under the folder './flann'. You can refer to the manual './flann/manual.pdf' for detailed instructions on installation and usage.
For the details of the functions, you may need to refer to "flann.h" under "./flann/header". More specifically, in order to construct the bag-of-words model for images, your program needs to do the following:

1. Include "./flann/header/flann.h" and other necessary headers.
2. Read in the key points of all images from the file 'esp.feature'.
3. Read in any additional files that are necessary.
4. Prepare for calling the FLANN functions, e.g., memory allocation and parameter initialization.
5. Call the FLANN function flann_compute_cluster_centers() to cluster the key points into 150,000 cluster centers (i.e., generate a vocabulary of 150,000 visual words). Set the fields of the sixth parameter, IndexParameters, as follows:

   {
       algorithm = KMEANS;
       checks = 2048;
       cb_index = 0.6;
       branching = 10;
       iterations = 15;
       centers_init = CENTERS_GONZALES;
       target_precision = -1;
       build_weight = 0.01;
       memory_weight = 1;
   }

   Set the seventh parameter to NULL.
6. Call the FLANN function flann_build_index() to build the necessary index for the cluster centers. The first parameter of this function is the set of cluster centers obtained from flann_compute_cluster_centers(). Set the fields of the fifth parameter, IndexParameters, as follows:

   {
       algorithm = KDTREE;
       checks = 2048;
       trees = 8;
       target_precision = -1;
       build_weight = 0.01;
       memory_weight = 1;
   }

   The last parameter in the function call is set to NULL.
7. Call the FLANN function flann_find_nearest_neighbors_index() to find the nearest cluster center for each key point. The first parameter of this call is the index built by flann_build_index(); the second parameter is the data structure that stores all the key points of the images; the fifth parameter, nn, is set to 1; the sixth parameter, checks, is set to 1024; and the fields of the last parameter, FLANNParameters, are set as follows:

   {
       log_level = LOG_NONE;
       log_destination = NULL;
       random_seed = CENTERS_RANDOM;
   }

8. Write the cluster centers to a file. You can write the cluster centers in any format. These cluster centers will be used later to map the key points of a query image to visual words.
9. Write the bag-of-words representation of each image into a file in the TREC format. To separate the representations of the images, use the correspondence between the files "imglist.txt" and "esp.size". Put the name of each image into the "DOCNO" field of the TREC format. Here is an example of the format of the file you need to generate:

   <DOC>
   <DOCNO>imgB</DOCNO>
   w3 w3 w2
   </DOC>
   <DOC>
   <DOCNO>imgC</DOCNO>
   w3 w2
   </DOC>
   <DOC>
   <DOCNO>imgA</DOCNO>
   w2 w1
   </DOC>

10. Free the memory and quit the program.
3. Build the index using Lemur

Given the "textual" content of each image, we will apply Lemur to index all the images for image retrieval. We have discussed how to build a document index using Lemur in a previous homework.
4. Extract key points for a query

In content-based image retrieval, each query is an image. In order to apply a text search engine to image retrieval, we need to extract the key points of the query image and generate the same bag-of-words representation for queries as we did for the images in the database. In this step, you need to use the provided tool to extract key points from each query image. There are three sample queries under the folder './sample query/'; each query is a grayscale image in the PGM format. The tool can be found under the folder './sift tool/': on Windows, use "siftW32.exe"; under Linux/Unix, use "sift". The details of how to use this tool and the format of the output file can be found in the "readme" file in the same folder.
5. Generate a bag-of-words model for a query

After we have obtained the key points of a query, we need to generate the same bag-of-words representation for queries as we did for the images in the database. In particular, we need to map each key point of a given query to a visual word.
In this step, you need to write a program to compute the bag-of-words representation of a query and write the corresponding "textual" content into a file, as we did when indexing the images in the database. In your code, you need to do the following:

1. Include "./flann/header/flann.h" and other necessary headers.
2. Read in the cluster centers that were generated in step 2.
3. Call the FLANN function flann_build_index() to build an index for the cluster centers.
4. Read in the key points of the query image from the output file of step 4. Note that, for each key point in the file, you need to discard the location information (4 floating point numbers) and use only the remaining 128 integers.
5. Call the FLANN function flann_find_nearest_neighbors_index() to find the nearest cluster center for each key point.
6. Write the "textual" content of the query image in the following format:

   The mapped cluster ID for the 1st key point
   The mapped cluster ID for the 2nd key point
   …
   The mapped cluster ID for the last key point
6. Image Retrieval by Lemur

After obtaining the bag-of-words representation of a query image, we can apply Lemur for image retrieval. In particular, you will use the Lemur command 'RetEval' to retrieve images from the database by issuing the following command:

RetEval parameter_file

where parameter_file is the path to the parameter file used for your queries. We have discussed how to set up the values of the parameters in the parameter file in both the classes and a previous homework. Below is an example of the parameter file:

<parameters>
<index>/home/user1/myindex/myindex.key</index>
<retModel>tfidf</retModel>
<textQuery>/home/user1/query/q1.query</textQuery>
<resultFile>/home/user1/result/ret.result</resultFile>
<TRECResultFormat>1</TRECResultFormat>
<resultCount>10</resultCount>
</parameters>
The parameter file's structure and options are as follows:

1. index: the complete name of the table-of-contents file for the database index.
2. retModel: the retrieval model to use. In this project, we use the 'tfidf' model.
3. textQuery: the query text stream.
4. resultFile: the result file.
5. TRECResultFormat: whether the result is in the TREC format (i.e., six columns). In this project, set it to 1.
6. resultCount: the number of documents to return for each query. In this project, set it to 10.
7. Graphical User Interface for Image Retrieval

You need to build a GUI for your image retrieval system with the following functions:

1. Allow a user to browse the image database and select an image as the query to find visually similar images. To achieve this, you can simply extract the bag-of-words representation of the query from the results of step 2, write it into a file with the format specified in step 5, and run the "RetEval" command for retrieval.
2. Load an external query image and find the images in the database that are visually similar to it.

Your GUI can be written in any programming or scripting language.
8. Evaluation and Hand-in

To evaluate your project, every group will demo their program in class during the last week. During the demo, we will provide a number of test query images. You need to run your GUI, load each test query image, and display the ten most similar images from the database. It is important to note that steps 2 and 3 (i.e., generating the bag-of-words models for the images in the database and constructing the Lemur index for them) are finished offline, before the demo. In addition to retrieval accuracy, your system will also be evaluated by (a) its efficiency in retrieving visually similar images and (b) the graphical design of your GUI.

You need to hand in the following items by midnight, 12/17/2009, by sending all the files to the instructor and TA ([email protected], [email protected]):

1. Your source code used in step 2 for generating the bag-of-words representations of the images in the database.
2. Your source code used in step 5 for generating the bag-of-words representation of a query image.
3. Your source code for the GUI.
4. A brief report that summarizes the design of your image retrieval system and any creative components you introduced.