Scene Segmentation and Interpretation: Pascal Project
Chengjia Wang, Universitat de Girona, Spain, June 2010

Abstract— In this project we explore the problem of recognizing scene categories based on histograms of local features. SIFT is used as the descriptor of the local features. The method first builds a bag of words using K-means clustering and then uses the clusters to build a feature space for each image. The effect of the clustering parameters has been studied, and various classification methods have been applied to find the correct match. An extensive comparison of the results of the different classifiers is presented.

I. INTRODUCTION

The Pattern Analysis, Statistical Modelling and Computational Learning Visual Object Classes Challenge, usually known simply as the PASCAL challenge, is a competition in which teams from all around the world try to solve still unsolved image processing problems on a large, complicated, real-life dataset of images [1]. The Challenge has evolved over the years and has increased its number of competitions. In 2006, the two main competitions were classification (prediction of the presence or absence of an example of a class in an image) and detection (prediction of the bounding box and label of the object). The newest edition, 2010, adds further challenges such as the segmentation competition, the person layout taster competition, the action classification taster competition and the large scale visual recognition taster competition.

One of the most interesting applications of image processing is the ability to identify objects in images. Classification is a computational procedure that sorts images into groups, called classes, according to their similarities. Images can be similar in many ways, but in this context similarity is measured by the presence of a certain type of object in the image. Even though similar objects belong to the same type, they can present notable differences: a car can have two or four doors, can be sporty or luxurious, and can differ in colour, design, and so on. Human beings learn to distinguish between different types of objects over time, using processes that are still neither well understood nor computationally reproducible, both because of hardware restrictions and because of the lack of proper human-like artificial reasoning methods. Detecting the presence of an object in an image therefore remains a challenging task for an artificial system, and there is no general approach that works properly in all cases. Additional difficulties are posed by problems like occlusions in the image [2].

This project, named Pascal Project, is based on the Pascal 2006 challenge and concerns only the classification competition. The approaches taken to solve this problem are described in this work.

II. PROBLEM DEFINITION

As in the original Pascal Challenge, the problem consists of predicting the presence or absence of an example of a certain class in the test images. There are 10 classes: bicycle, bus, car, motorbike, cat, cow, dog, horse, sheep and person. The image set has been reduced with respect to the real challenge, excluding the complicated images, for instance those labelled as containing occlusions. A number of sub-problems appear when trying to classify images: What is the best approach for the detectors and descriptors of the image? Which images should be used to obtain better results? How should features be combined to form a robust input to a classifier? Which classifier should be used? There is no general answer to these questions; only experiments can answer them, and then only partially, for each particular class [2].

III. STRATEGY ANALYSIS

A. General description of the program

The program is divided into two main parts: training and testing. An additional step for testing is the computation of the ROC curve as a tool to evaluate the results [2].

1) Training: One of the best-known approaches to classification is to first compute a bag of words composed of feature descriptors, and then determine which words are present in each set of images belonging to a certain class. Knowing the words in the class images, a classifier can be trained on a training set of images to identify them among the other words. This process must be carried out carefully to avoid overfitting, otherwise the performance of the classifier will be poor.

2) Testing: In this step, a set of test images is presented to the previously trained classifier, and the output is a classification of whether the object exists in the image or not, together with the degree of certainty of that output. This certainty is a crucial quantity, since it is the basis for drawing the ROC curve used to evaluate the methods.

The following subsections provide details about the steps mentioned above.

B. Feature descriptors

Due to the variability of objects of the same type and the differences in viewpoint, scale, orientation, appearance, colour, etc., the object as an entity cannot be used directly to decide whether it is present in an image; instead, the information from the descriptors of the most salient points can be exploited. In this work, SIFT was used to obtain the features.

1) SIFT: The Difference of Gaussians (DoG) was used as the detector to obtain the features, and the Scale Invariant Feature Transform (SIFT) was used as the descriptor of those features; the resulting descriptors therefore belong to the R^128 space [3]. The Matlab implementation used was the one provided by the open source library VLFeat [4]. Figure 1 shows SIFT applied to one of the cats in the training set of the class cat. SIFT was also applied to the three colour channels separately, which can loosely be considered a colour SIFT.
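As an illustration of this feature extraction step, the following is a minimal Matlab sketch using the VLFeat functions mentioned above (vl_sift and vl_dsift); the image file name, the dense step and the bin size are arbitrary values chosen for the example, not settings taken from this report.

% Minimal sketch of sparse and dense SIFT extraction with VLFeat.
% Assumes VLFeat has been set up on the Matlab path (vl_setup).
im = imread('cat_001.jpg');              % hypothetical training image
I  = single(rgb2gray(im));               % VLFeat expects a single-precision grayscale image

% Sparse SIFT: DoG detector + SIFT descriptor (128-D columns of d1)
[f1, d1] = vl_sift(I);

% Dense SIFT: descriptors on a regular grid; 'Step' and 'Size' are example values
[f2, d2] = vl_dsift(I, 'Step', 8, 'Size', 4);

fprintf('sparse: %d features, dense: %d features\n', size(d1,2), size(d2,2));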

Fig. 1. SIFT applied to an image belonging to the class cat.

2) Dense SIFT: This consists of computing descriptors for densely sampled keypoints with identical size and orientation. Figure 2 shows the geometry of the dense SIFT descriptor. Keypoints are sampled in such a way that the centres of the spatial bins are at integer coordinates within the image boundaries. For instance, the top-left bin of the top-left descriptor is centred on the pixel (0,0) and the bin immediately to its right at (binSizeX,0).

Fig. 2. Dense SIFT descriptor geometry.

This approach yields the same number of features per image when the images have the same size. This can improve some of the results, since with sparse SIFT alone some images may have more features than others and would then be weighted more heavily; this effect is avoided with dense SIFT. Figure 3 shows the result of applying dense SIFT to the image of Figure 1; only the features of the top-left part of the window are plotted. As can be observed, the descriptors are uniformly distributed over the image and occupy the same positions for different images of the same size. This gives the advantage of not depending on detected keypoint positions, but it forces the extraction of features that might not be very strong.

Fig. 3. Dense SIFT features in the upper part of the image.

C. Bag of Words

The bag of words constitutes the dictionary of the features, which can be regarded as the words in a dictionary. The features obtained from the images are grouped using the K-means algorithm, and the centre of each cluster becomes a word. Figure 4 illustrates K-means clustering in 3D space; the actual features are 128-dimensional and the clustering is performed in that space, which obviously cannot be visualised directly.

Fig. 4. K-means in 3D space, clustering the space into regions.

Several approaches were taken to compute the bag of words; they differ in how the features obtained from the images are grouped before applying K-means, as well as in which images are used for the computation.
• In the first approach, every class is treated separately: for each class, K-means is computed over all the features of all the images in that class's training set. This is computationally expensive, and when more than 200 clusters are desired the process can easily take more than one hour per class. There is a different vocabulary for each class.
• The second approach samples every training set separately, picking only a randomly selected subset of positive and negative examples for every class. The process is still independent for every class, so there is again a different vocabulary per class, but it is faster since an arbitrarily smaller number of images can be considered.



• The third approach builds a single vocabulary for all the classes by picking a fixed number of randomly chosen positive examples from every class. For instance, 4 positive examples can be picked from every class, yielding 40 images whose features are used to build the vocabulary with K-means. In this approach the randomness condition can also be relaxed and a fixed set of images taken instead. A sketch of the vocabulary construction is given below.
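The following Matlab sketch illustrates the vocabulary construction with the VLFeat integer K-means functions used in this work, together with the mapping of an image's features to a word histogram described in the next subsection; vocabDescrs and imageDescrs are hypothetical names for matrices of SIFT descriptors gathered beforehand.

% Sketch of vocabulary construction and word-histogram mapping (VLFeat).
% vocabDescrs, imageDescrs: uint8 matrices whose columns are SIFT descriptors.
K = 100;                                      % example vocabulary size
[words, assign] = vl_ikmeans(vocabDescrs, K); % words: 128 x K cluster centres

idx = vl_ikmeanspush(imageDescrs, words);     % nearest word index for each descriptor
h   = accumarray(double(idx(:)), 1, [K 1]);   % K-bin histogram of word counts
h   = h / max(sum(h), 1);                     % optional normalization (see Results)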

D. Histograms

Once the bag of words has been computed using any of the methods described, each feature of every image is mapped onto the clustered space by finding the index of its nearest cluster. Then, supposing that there are k clusters, a histogram with k bins (ranging from 1 to k) is constructed from these cluster indices. The histograms are grouped into two sets: those corresponding to the positive examples and those corresponding to the negative examples. In the testing stage, histograms are computed in the same way for every test image from its features, so that the inputs to the classifier are consistent with the training histograms used to train it.

E. Classifiers

Classifiers take information from the training set and, based on that knowledge, determine whether a test element belongs to a class or not. The histograms are the inputs used to train the classifiers. The classifiers used in this project were nearest neighbour, SVM, PCA, LDA, nearest mean and a combination of classifiers. The PRTools pattern recognition library for Matlab was used for this purpose.

1) SVM: Support vector machines are supervised learning methods used for classification and regression. Given a set of training examples, each carrying a category label (+1, -1), an SVM builds a model that predicts whether a new example falls into one category or the other. More formally, SVMs construct a hyperplane, or a set of them, in a high- or infinite-dimensional space.

2) Nearest Neighbour: This is a method for classifying objects based on the closest training examples in the feature space. Even though it is amongst the simplest of all machine learning algorithms, it still performs well in this classification task. An object is classified by a majority vote of its neighbours and assigned to the class most common amongst its k nearest neighbours. The simplest case is k = 1, where the object is simply assigned to the class of its nearest neighbour.

3) Principal Component Analysis: Principal Component Analysis (PCA), also known as the Karhunen-Loeve (KL) transform, projects the features into a lower-dimensional subspace such that the first orthogonal dimension of this subspace captures the greatest amount of variance among the features and the last dimension captures the least. The method was originally popular for image compression due to its ability to represent the intrinsic structure of the data in a lower-dimensional space. PCA achieves this by removing redundant, correlated features in the training set. As discussed, the subspace projection is performed to reduce the dimension of the feature space and discard redundant data that does not contribute to the classification. The number of eigenvectors used for the projection defines the dimension of the subspace, which raises the question of what the appropriate number of eigenvectors is. The most conventional way of reducing the number of eigenvectors is to discard the last 40 eigenvectors, which are associated with the lowest eigenvalues. Alternatively, an energy criterion can be used for eigenvector selection: the minimum number of eigenvectors that guarantees a given energy dimension is selected. The energy dimension of the first i eigenvectors is computed as in Eq. (1):

e_i = \frac{\sum_{j=1}^{i} \lambda_j}{\sum_{j=1}^{k} \lambda_j}    (1)

Thus we select the number of eigenvectors such that the ratio between the sum of their associated eigenvalues and the sum of all eigenvalues reaches a certain threshold (a common threshold is 0.9). It should be mentioned that PCA is a powerful tool for data compression, but it does not guarantee maximum discriminative capability of the subspace. This is because subspaces built on PCA alone use only the variance of the training data to derive the projection matrix; the separability of the classes in the training data is not taken into account. The next subsection discusses Linear Discriminant Analysis, which attempts to address this issue.

4) Linear Discriminant Analysis (LDA): Linear Discriminant Analysis (LDA), also referred to as Fisher Discriminant Analysis (FDA), was proposed by D. Swets and J. Weng as a way to build a projection of the image data which is not only a lower-dimensional representation of the original image space but also provides maximum class separation, and thus higher classification quality. The LDC function in PRTools implements LDA classification. Swets et al. proposed applying a PCA projection first to ensure the non-singularity of the within-class scatter matrix Sw; it has been shown that performing two consecutive PCA and LDA projections yields the Most Discriminating Features (MDF), which outperform both PCA and conventional LDA. This classifier is implemented by the PCLDC function in PRTools.

In the test part of the program, the classifier receives as input the set of features of a test image and uses its previously acquired knowledge of the class to decide whether the desired object is present in that image or not. The confidence used to evaluate these classifiers is explained in the results section.

5) Nearest Mean Classifier: This classifier computes the linear discriminant for the classes in the dataset assuming zero covariances and equal class variances. In order to address the scale variance issue, we used the Nearest Mean Scaled Classifier, implemented in PRTools by the NMSC function.
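To make the energy criterion of Eq. (1) concrete, the following small Matlab sketch selects the number of eigenvectors needed to reach a 0.9 energy threshold; trainHist is a hypothetical matrix of training histograms (one example per row).

% Eigenvalue-energy criterion of Eq. (1): keep the smallest number of
% eigenvectors whose eigenvalues account for at least 'thr' of the total energy.
lambda = sort(eig(cov(trainHist)), 'descend');   % eigenvalues, largest first
thr    = 0.9;                                    % common energy threshold
e      = cumsum(lambda) / sum(lambda);           % e(i) as in Eq. (1)
nEig   = find(e >= thr, 1, 'first');             % number of eigenvectors to keep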

F. Combination of classifiers

Because classification depends on the distribution of positive and negative examples in each class, it is possible that different classifiers generate non-overlapping results: while one classifier classifies a test sample correctly, another classifier may misclassify the same sample. We then have a different best classifier for each class and do not know beforehand which one to choose. If the errors of the different classifiers are complementary, an appropriate combination of classifiers can provide a more robust classification; a good combination scheme minimizes the sum of the classification errors over all classes. Below we present the methods used to combine classifiers.

1) Sequential combination: First one classifier maps the data into a new space, and then a second classifier maps that representation to the space where the labels are assigned with a confidence. This is what is built into some classifiers, such as the PCLDC classifier described above.

2) Probability combination: In this method each classifier first produces its classification result as a posterior probability of belonging to each of the target classes; the results are then aggregated using one of the following rules: minc, maxc, medianc, meanc or prodc. A sketch of this combination is given below.
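As an illustration of the probability combination, the following Matlab sketch stacks two trained PRTools classifiers with a fixed combining rule; it is only a sketch under the assumption that the standard PRTools combining syntax applies, and trainHist, testHist and gt are hypothetical variables holding the histograms and their +1/-1 labels.

% Sketch of probability combination of classifiers with PRTools.
A   = dataset(trainHist, gt);        % training histograms and labels
w1  = A * nmsc;                      % nearest mean scaled classifier
w2  = A * pcldc([], 0.9);            % PCA+LDA with energy dimension 0.9
wc  = [w1, w2] * meanc;              % mean rule; prodc, maxc, minc, medianc also exist
out = dataset(testHist) * wc;        % combined posterior-like outputs for the test set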

G. ROC curve

The receiver operating characteristic (ROC) curve is a useful tool for organizing classifiers, visualizing their performance and, based on this, selecting the most suitable one. It is a graphical plot of the true positive rate (also called sensitivity) against the false positive rate, obtained by varying the discrimination threshold of a binary classifier. There are some important considerations to take into account when drawing the curve. Classifiers that yield only a binary label or a single numerical decision, like decision trees or rule sets, generate a single point in the ROC space. Other classifiers, such as naive Bayes and neural networks, produce probability values that represent the degree to which an element belongs to the class; in this case, setting a threshold fixes the number of true positives and false positives and defines a point in the ROC space. This is closely related to the confusion matrix, which can be regarded as the specification of a single point of the ROC curve for a certain threshold.

IV. DESIGN AND IMPLEMENTATION

The project was implemented on a computer with a 2 GHz processor and 4 GB of RAM, running Windows Vista. The programming language used was Matlab.

A. Flow Diagram of the program

The general flow of the program is shown graphically in Figure 5. For each of the steps there were different approaches, explained in the previous section. The general frame of the example program provided was used. For instance, in this program the computed features are stored in a folder named local; when the program needs them, it first checks whether they have been previously computed and stored, and if so it recovers the file, otherwise it computes and saves them. The advantage of this procedure is that it saves time, since computing all the descriptors is slow. The same approach was taken for the K-means results, another time-consuming computation: the K-means files were stored under different names according to their characteristics, such as the number of images used to compute them or the number of clusters, and were retrieved when necessary.

B. Implementation of descriptors and classifiers

For the computation of the SIFT features, as well as for K-means, the open library VLFeat was used. This is done very simply with the functions vl_sift and vl_dsift, for the sparse and dense SIFT respectively. For the dense version, the step of the grid on which the features are computed also has to be specified. Both functions output the features and their locations; in this program only the descriptors were used, regardless of their position in the image. For K-means, the function used was [words, i] = vl_ikmeans(Features, K), where the first argument contains the SIFT features and the second one is the number of clusters. The function vl_ikmeanspush(feature, words) was used to assign a cluster number to a given feature; its input words contains the cluster centres obtained with K-means.

The classifiers were not implemented directly in the program. Instead, the PRTools (pattern recognition tools) package for Matlab was used, because of the large number of classifiers it offers and its relative ease of use. Using this library, a dataset is first created with the dataset function, specifying the features as well as their ground truth: A = dataset(features, gt). The program orders the features in such a way that the first ones correspond to the positive examples and the last ones to the negative examples, that is, features = [features_positive; features_negative]. With this convention, gt reduces to a vector of +1 or -1 values with as many entries as there are features. The classifiers are then easily trained using knnc(A,1) for the nearest neighbour, pcldc(A,0.95) for PCA+LDA, or svc for SVMs. To test the data, the classification is obtained as TestData*classifier_model, where classifier_model was obtained in the training stage. For SVMs, the SVMlight library was used, since it returns a value between +1 and -1 (or outside this range), which lets the confidence of the classifier be computed in order to draw the ROC curve. All the ROC curves presented in this work were plotted and obtained using the VOCroc function provided with the images [4].

C. Evaluation and Parameter setting

Due to the number of parameters, extensive experiments are needed to evaluate the performance of the classification on the Pascal Challenge. Figure 6 illustrates the parameter structure that can be varied to evaluate the method. One of the parameters that needed to be set was the use of dense or normal SIFT descriptors; as we will see, for certain classes dense SIFT features provide a significantly higher discriminatory power, which cannot be achieved with normal SIFT [5]. The method of selecting the images used for building the clusters with K-means is also an important factor with a significant impact on performance in various cases. We have carried out experiments with the following settings: 1) use the first ten images in the training dataset; 2) use one random positive sample from each class (i.e. a total of 10 images); 3) use all training images; 4) use the positive examples of each class to build a class-specific bag of words. Another important factor is the number of words, i.e. clusters, which basically defines the dimension of the feature space. Various numbers of words have to be tried to find an appropriate feature dimension for the representation of the images; in this work we have used 10, 20, 30, 40, 50, 100, 150, 200 and 400 words. The classification stage also offers several choices that need to be explored. First there is the option of using a single classifier or multiple classifiers. For a single classifier we can choose one of the classifiers explained in Section III: SVM, KNN, NMSC or PCLDC. After choosing the classifier we may have to set its parameters; for example, choosing PCLDC requires experiments to find the optimal energy dimension. To explore the combination of classifiers, different combination methods have to be tried with various sets of selected classifiers; in this work we have carried out experiments using two to three classifiers and explored their probability-based combination.
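To make subsections A and B above concrete, the following condensed Matlab sketch shows the caching pattern and the PRTools training and testing calls described there; the file name, the computeHistograms helper and the variable names are purely illustrative.

% Condensed sketch of the caching pattern and the PRTools calls described above.
histFile = fullfile('local', 'hist_car_K100.mat');
if exist(histFile, 'file')
    load(histFile, 'trainHist', 'gt');          % reuse previously computed histograms
else
    [trainHist, gt] = computeHistograms();      % hypothetical bag-of-words helper
    save(histFile, 'trainHist', 'gt');
end

A = dataset(trainHist, gt);                     % PRTools dataset, gt is +1/-1
w = pcldc(A, 0.95);                             % PCA+LDA classifier, energy dimension 0.95
% w = knnc(A, 1);                               % alternative: 1-nearest neighbour
out = dataset(testHist) * w;                    % apply the trained mapping to the test histograms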

Fig. 5. General Flow Diagram for the program.

V. RESULTS

The results of the classification methods attempted and described in the previous sections were evaluated using the ROC curve, and the area under the curve was used as a measure of the performance of each classifier. The outputs of the classifiers are values that indicate the degree to which an image belongs to the class of interest; for each classifier used, the way of computing this confidence is described below. The vocabulary is generated using K-means, and it is important to point out that K-means depends on the random seed of its initialization: even when running the same program, recomputing K-means (the vocabulary) yields similar but not exactly the same results.
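For illustration, the area under the ROC curve can be computed from the per-image confidences roughly as follows; the report itself relies on the VOCroc function, so this is only a sketch with illustrative variable names (conf holds one confidence per test image, gt the +1/-1 ground truth).

% Sketch of the ROC / area-under-curve evaluation from classifier confidences.
[~, order] = sort(conf, 'descend');        % sweep the threshold from high to low
pos = gt(order) > 0;
tpr = cumsum(pos)  / max(sum(pos), 1);     % true positive rate at each threshold
fpr = cumsum(~pos) / max(sum(~pos), 1);    % false positive rate at each threshold
auc = trapz([0; fpr(:)], [0; tpr(:)]);     % area under the ROC curve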

A. Nearest Neighbor + SIFT

Using the bag of words created from local SIFT features, we carried out the following experiment. Table I shows the results obtained using the nearest neighbour classifier; it can be seen that increasing the number of words improves performance in most cases. The histograms are built from the words created by applying K-means to the SIFT features gathered from all images in the dataset. The 1-nearest neighbour classifier assigns the test image to the class of the closest feature representation (i.e. histogram). To compute the ROC curve, the shortest distance to the nearest negative and to the nearest positive example was calculated, and their ratio was taken as a measure of confidence. Table I shows the results for an increasing number of clusters K. Overall, it can be observed that this method does not provide a high accuracy.
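The confidence measure just described, the ratio between the distance to the nearest negative and the nearest positive training histogram, can be sketched in Matlab as follows; trainHistPos, trainHistNeg and testHist are illustrative names for the corresponding histogram matrices (one example per row).

% Sketch of the nearest-neighbour confidence used for the ROC curve.
dPos = min(pdist2(testHist, trainHistPos), [], 2);   % distance to nearest positive example
dNeg = min(pdist2(testHist, trainHistNeg), [], 2);   % distance to nearest negative example
conf = dNeg ./ max(dPos, eps);                       % larger value = more likely positive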

Fig. 6. The tree of parameters explored in the course of the experiments (the tree is symmetrical).

TABLE I. Nearest neighbour without normalization of the histograms.

We believe the low performance is due to a high overlap in the feature space, which the nearest neighbour method cannot resolve. In the previous table the computed histograms were not normalized. Normalizing the histograms yields Table II, where it can be seen that the performance does not improve, except for a few classes. Basically, normalization only helps when it reduces the overlap between positive and negative examples, which is not the case for all the classes. Furthermore, the test and training examples are not normalized using the same values, so the feature values may be distorted when computing distances.

Given the trend of the results in Table II, we cannot state a general rule for the impact of normalization, since we cannot estimate its effect on the data distribution; normalization is therefore avoided from here on. One general trend that can be seen for most classes in both previous tables is that performance increases as the number of clusters increases. However, the results are not very good when all the features of all the images are used to compute the vocabulary. Another approach is to use a random number n of positive samples from each class to generate a single vocabulary. The results of this approach using the nearest neighbour classifier are shown in Table III.

TABLE III. Nearest neighbour with a global vocabulary.

TABLE II. Nearest neighbour, normalizing the histograms.

Table III shows a considerable improvement of the results for most classes. We believe this is because building the vocabulary from positive examples gives the histograms a higher discriminatory power: the words generated from the positive examples have a higher probability of occurring in an image of the same class than in a negative example. Given this hypothesis, most of the following approaches use this way of generating the vocabulary, i.e. a single global vocabulary built from a sub-sample of the positive examples of the classes.

B. SIFT + SVM

As an alternative to the nearest neighbour, we applied an SVM classifier to the histograms of bags of words. This experiment was carried out to compare the previous classification method with SVM; here a class-specific vocabulary was used. For the ROC curve, the confidence reported in Table IV was computed as the ratio between the distance of the classifier output to -1 and its distance to +1, since the output of the classifier is a real number and the closer it is to +1 or -1 the more likely the example is to be positive or negative. Different parameter settings for the SVM were tested, and the ones providing the highest performance are presented in Table IV. It can be seen that, except for an increase in performance for the bus class, the nearest neighbour outperforms the SVM. The problem with the SVM is that it easily overfits the training data. On the other hand, it outperforms the KNN when a global bag of words and normalized histograms are used.
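The confidence just described can be sketched as follows; score is an illustrative name for the real-valued output of the SVM on a test image.

% Sketch of the confidence derived from the raw SVM output:
% ratio between the distance of the score to -1 and its distance to +1.
conf = abs(score - (-1)) ./ max(abs(score - 1), eps);   % larger = more likely positive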

TABLE IV. SIFT using SVM with different kernels.

C. SIFT + PCA-LDA

The PCA-LDA classifier with a class-specific vocabulary was used for all classes. In this experiment the vocabulary was built considering all the features in all the images. The features are normal SIFT features and the classifier uses the sequential combination of PCA and LDA with an energy dimension of 0.95. The energy dimension was derived through an extensive test where various energy dimensions were tried on a fixed parameter setting (i.e. K, image selection and type of SIFT).

TABLE V. SIFT using PCA+LDA and class-specific vocabularies.

Changing the vocabulary to a global one used for all the classes, and picking only the features of 10 training images, Table VI was obtained. The high performance of PCA+LDA compared to the previous classification methods motivates further investigation based on this classifier. We believe that the high performance provided by this method arises from the optimal class separation which is the nature of the PCA+LDA projection. This projection is very effective when the feature space of the test image has been visited in the training data; in other words, if there is a similar image in the training set there is a high probability of correct detection. We believe that the reason for the low performance of some classes is their high intra-class variation due to changes in the size of the object (person, cow, car, sheep, motorbike, etc.); we have tried to address this issue by using dense SIFT, which enforces feature extraction at fixed locations.

TABLE VI. SIFT using PCA+LDA and a global vocabulary.

Some of the best results of this classifier and this method are shown in Figure 7.

Fig. 7. Some good results obtained with SIFT + PCA-LDA.

D. Dense SIFT + PCA-LDA

Based on the hypothesis stated in the previous section, we used dense SIFT to investigate the possible performance gain. The results of this extensive experiment are shown in Table VII, where the classifier used was the combination of PCA and LDA. The PCA+LDA classifier with different parameters, various numbers of clusters and different sets of images was tested. All results were generated using one single vocabulary for all the classes, constructed from n positive examples of each class, so that in total features are extracted from 10n images, since there are 10 classes. The number of clusters used in K-means is specified by K, and e specifies the energy-dimension parameter of pcldc.

TABLE VII. Dense SIFT using PCA+LDA.

Figure 8 shows some of the best results obtained using dense SIFT and pcldc for the 10 classes; the parameters used for each image can be read from the previous table. This experiment supports the hypothesis that dense SIFT is necessary to improve the performance of several classes, because the locations of the SIFT features in the positive examples of these classes correspond to the locations of features in the test images. In addition, it can be observed that when the number of clusters is increased (i.e. the dimension of the feature vector grows), the energy dimension has to be lowered to achieve higher performance: more clusters mean more redundant data in the feature space, so the energy dimension must be reduced to remove more of that redundancy.

E. NMSC + SIFT (dense vs normal)

This section covers the experiments carried out using the nearest mean scaled classifier. Various numbers of words, image selection methods and both normal and dense SIFT were tested. Comparison of the two tables below shows that NMSC + dense SIFT is a much more appropriate feature extraction method, as 8 out of 10 classes outperform NMSC + normal SIFT. Comparing Table VIII and Table IX, it can also be observed that different parameter settings gave the best overall performance.

F. Combining the best classifiers (PCLDC + NMSC)

In this section we explore the classifier combination hypothesis: given the results of two different classifiers, if the errors are uncorrelated, the combination of their posterior probabilities can lead to higher performance.

Fig. 8. Some good results obtained with SIFT + PCA-LDA.

This is because the mistakes made by one classifier can be compensated by the other. In this section we combined the aforementioned classifiers and compared the results with the single classifiers presented in the previous sections. It can be seen that in some cases the combination outperforms the single best classifier, while in other cases it follows the single best classifier closely. Although a single classifier performs better in most cases, it is more robust to use the combination, because different classes achieve their best performance with different classifiers and we cannot choose the classifier beforehand.

TABLE VIII. Normal SIFT + NMSC, random positive image selection (global).

TABLE XI. Dense SIFT + NMSC, random positive image selection. Mean combination.

TABLE IX. Dense SIFT + NMSC, random positive image selection (global).

TABLE X. Dense SIFT + NMSC + PCLDC, random positive image selection (global). Product combination.

The combiners above were selected among the best combination methods; other combiners of this kind are max, mean, median, etc.

VI. ORGANIZATION AND DEVELOPMENT

This work was developed in several stages. The first stage was familiarization with the framework and running the example classifier using a limited number of training and testing images.

In the second stage we familiarized ourselves with SIFT feature extraction and studied the VLFeat library, which was later used for feature extraction. In the third stage the feature extraction was integrated into the main framework and the analysis of the problem began. In the fourth stage the bag of words and the image representation were analysed and different strategies were suggested and implemented. The familiarization with PRTools was done in the fifth stage. In the final stage the extensive experiments were carried out, the optimal parameters and strategies were found and presented, and the report was written. The whole process took about 84 hours.

VII. CONCLUSIONS

This work presents a solution to the problem of scene categorization. SIFT descriptors were used as local features, and different strategies for building a bag of words using K-means were investigated. Different parameters regarding the number of clusters, the type of SIFT and various classifiers were tested. The results have shown that in most cases a histogram of SIFT features, built from a bag of words generated from randomly selected positive examples, is a suitable representation of an image. Using this representation and a combination of classifiers, a reasonable scene classification performance can be achieved. The time complexity of the training and testing procedure is highly dependent on the number of clusters (i.e. the dimension of the features). The method described above achieves high performance with the lowest computation time: each training and testing run takes less than 3 minutes (for K=100, n=1). For some classes an area under the curve above 0.70 can be achieved in almost real time if the training data is saved on disk; for example, car, cat, cow and sheep achieve areas under the curve of 0.86, 0.72, 0.79 and 0.74 respectively with K=50 and n=1, which is almost real-time if the features are precomputed.

REFERENCES

[1] M. Everingham and J. Winn. The PASCAL Visual Object Classes Challenge 2010 (VOC2010) Development Kit, 2010.

[2] T. Fawcett. An introduction to ROC analysis. Pattern Recognition Letters, 27(8):861-874, 2006.
[3] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91-110, 2004.
[4] A. Vedaldi and B. Fulkerson. VLFeat: an open and portable library of computer vision algorithms. In Proceedings of the International Conference on Multimedia, pages 1469-1472. ACM, 2010.
[5] K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(10):1615-1630, 2005.
