2010 International Conference on Pattern Recognition
Enhanced Measurement Model for Subspace-based Tracking∗

Shimin Yin, Haan Joo Yoo, and Jin Young Choi
EECS Department, ASRI, Seoul National University, Seoul, Korea
{smyoon, hjyoo, jychoi}@neuro.snu.ac.kr

Abstract

We present an efficient and robust measurement model for visual tracking. This approach builds on and extends prior work on measurement models for subspace representations. Subspace-based tracking algorithms were introduced to the visual tracking literature a decade ago and show considerable tracking performance due to their robustness in matching. However, the measures used in their measurement models are not robust enough in cluttered backgrounds. We propose a novel measure of object matching, referred to as WDIFS, which aims to improve the discriminability of matching within the subspace. Our measurement model can distinguish the target from similar background clutter, which often causes erroneous drift with the conventional DFFS-based measure. Experiments demonstrate the effectiveness of the proposed tracking algorithm under cluttered backgrounds.
1. Introduction

Visual tracking has become an essential technology of great interest in video-based computer vision applications, ranging from augmented reality (AR) and human-computer interfaces (HCI) to visual surveillance. Numerous approaches have been pursued to improve its performance, achieving great progress over the last two decades. Some of these approaches showed noticeable improvements in performance and made a strong impact on the visual tracking community. For appearance models, subspace representation [1, 2] was first adopted by the EigenTracking algorithm [3], which offered insight into what constitutes a good feature for tracking. Even now, it remains one of the most effective appearance models for 2D images and has the advantage of learning variations in pose and illumination with the aid of incremental subspace learning (ISL) techniques [5, 9]. For the corresponding measurement model, Moghaddam and Pentland [4] proposed an eigenspace density estimation technique with the distance in feature space (DIFS) and the distance from feature space (DFFS). They also noted that the DFFS is important for the object detection task, which plays a similar role in the measurement model of a tracking algorithm.

As mentioned above, the DFFS has the potential to serve as a measure for subspace-based tracking. Assuming the target appearance subspace is well learned, i.e., the training images are well sampled from the target object, the DFFS measure shows considerable robustness. Lim et al. [5, 9] also adopted probabilistic principal component analysis (PPCA) [5] and the robust norm [3] to improve the robustness of the DFFS measure. In fact, the DFFS-based measurement model is attractive due to its efficiency and effectiveness, and most recent subspace-based tracking algorithms adopt a DFFS-based measure in their measurement models [5, 6, 7, 8].

Unfortunately, the DFFS is vulnerable to target-similar background clutter in incremental subspace learning (ISL)-based tracking for the following two reasons. First, the assumption of accurate sampling is impractical in the tracking case. Because of the trade-off between real-time performance and tracking accuracy, the estimated tracking box is usually more or less misaligned with the real target in the image. Bias from rotation, scaling, and translation can thus enter the sampling process that generates the training data. After training on these contaminated data, the representation of the target in the subspace is less compact, which implies that the DFFS measure has to tolerate some error in its accuracy. The second cause stems from the introduction of ISL itself. The purpose of ISL is to enable tracking of a dynamically changing object by learning pose and illumination changes as well as shape deformation. Thus, the training data are usually quite different from each other. Therefore, during the subspace learning process, these training data are prone to cause a considerable rotational effect in the eigenbases and significant changes in the corresponding eigenvalues. Because the DFFS is

∗ This research was supported by the Samsung Techwin Co., Ltd.

1051-4651/10 $26.00 © 2010 IEEE DOI 10.1109/ICPR.2010.852
[Figure 1 panels (frames 163-165): eight best-fit particle boxes with color legend (1st-8th); warped candidate images Wimg1 (DFFS: 5.8109e−015, WDIFS: 1.262e−008) and Wimg2 (DFFS: 4.5962e−017, WDIFS: 0.0068024). Figure 2 sketch elements: object classes OC1-OC4, background-clutter class BC, subspaces F1-F4, projections PC1, PC2, PRef.]
Figure 1. Example of tracking failure in background clutters by the DFFS measure.

the distance relative to the subspace, a change of the subspace directly affects the accuracy of the DFFS measure, which aggravates the structural defect of the DFFS measure in ISL-based tracking.

In this paper, we propose an efficient and effective measure within the subspace, called the weighted difference in feature space (WDIFS). It acts as a complementary measure to the DFFS and enhances the ability to distinguish the true object from object-similar background clutter. Considering real-time performance, the efficiency of WDIFS is another strength addressed in this paper. In the remainder of this paper, we first briefly illustrate the problem with the DFFS measure and then present the details of WDIFS in Section 2. Experimental results and conclusions are discussed in Section 3 and Section 4, respectively.
2. WDIFS for In-Subspace Measure

2.1 Problem of DFFS

First, we illustrate a failure example of the conventional DFFS measure. The tracking framework used in our paper is the same as IVT [5], which is composed of ISL and a particle filter (PF). In Figure 1, the top-row images show the eight best-fit tracking boxes, each corresponding to a particle in the PF; the top-down order in the color legend box (located in the bottom-right corner of each image) is exactly the descending order of confidence. The middle- and bottom-row images visualize the two particles with the highest DFFS-based confidence (Wimg1, Wimg2) at frame 164 and their corresponding DFFS and WDIFS; the values beneath the DFFS and WDIFS images are the corresponding confidence values. Wimg2 is clearly the more proper candidate to track. However, the confidence of the
Figure 2. Illustration of DFFS and inspiration for WDIFS

Wimg1 is larger than that of Wimg2 in the view of DFFS. On the other hand, the confidence calculated by the WDIFS measure gives more meaningful values, which can compensate for the defect of the DFFS. The following describes our solution using the WDIFS.
2.2 Our Solution to DFFS

In the visual tracking literature, a commonly used assumption is that the change of the object within one time step is trivial. It is also known that principal component analysis (PCA) preserves the relative distances between high-dimensional data points in the lower-dimensional space. Since the tracked object images are high-dimensional data, under the assumption that the object images from any two consecutive frames are similar, we deduce that their projections onto the subspace F are also similar, where F is learned by PCA or ISL. Therefore, if we treat the current object image as a reference, the projection of the next object image onto F should be close to that of the reference, and thus their coordinate vectors on F are also similar. A change of F alters the coordinate vectors, but whatever F is changed to under ISL, the two coordinate vectors should remain close to each other. This is the key idea of our approach.

Figure 2 illustrates the cause of DFFS failure. OC1, OC2, OC3, and OC4 are four object image classes, each representing a different aspect of the object under illumination and pose change or shape deformation. BC represents a class of background clutter. Here we assume that the object varies from OC1 to OC4 and the subspace F changes from F1 to F4 accordingly. Suppose two particles are drawn: C1 from BC and C2 from OC4 (both marked in yellow). After projecting them onto F4, the DFFS from
C1 is shorter than the DFFS from C2. But within the subspace, things are different. PC1, PC2, and PRef on F4 are the projections of C1, C2, and the previous object (marked as 'Ref'), respectively. On F4, PC2 is much closer to PRef than PC1 is. This inspired us: the distances from PRef to PC1 and to PC2 can act as a complementary measure to the DFFS within the subspace, completing our measurement model by incorporating it with the DFFS measure. This is also the main contribution of our measurement model. The following gives the detailed mathematical description of the WDIFS.
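Before the formal description, the distance-preservation claim above can be checked numerically. The following is a minimal NumPy sketch (not from the paper; the synthetic data and all variable names are illustrative): it learns a subspace from low-rank "image" vectors and verifies that two nearly identical images have nearly identical coordinate vectors on F.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "images": 1024-dim vectors lying on a 5-dim linear manifold.
latent = rng.standard_normal((1024, 5))
data = rng.standard_normal((100, 5)) @ latent.T

# Learn the subspace F (top-5 eigenbases of the centered data), as PCA/ISL would.
mean = data.mean(axis=0)
_, _, Vt = np.linalg.svd(data - mean, full_matrices=False)
Phi = Vt[:5].T  # columns are orthonormal eigenvectors

# Object images from two "consecutive frames": x2 is a small perturbation of x1.
x1 = data[0]
x2 = x1 + 0.01 * rng.standard_normal(1024)

# Their coordinate vectors on F stay as close as the images themselves.
c1, c2 = Phi.T @ (x1 - mean), Phi.T @ (x2 - mean)
print(np.linalg.norm(x1 - x2), np.linalg.norm(c1 - c2))
```

Because the columns of Phi are orthonormal, ‖c1 − c2‖ ≤ ‖x1 − x2‖ always holds, so similar consecutive frames keep similar coordinates no matter how F rotates under ISL.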
2.3 Mathematical Description of the WDIFS

First, the previously estimated object image is regarded as the reference image I_ref. The IVT-based tracking framework provides a candidate image set {I^i | i = 1, 2, ..., n}, where I^i is the i-th candidate image and n is the number of particles. It also maintains an appearance subspace that provides a mean, eigenvalues, and eigenbases, denoted Ī, Λ_M, and Φ_M respectively. Ī and I^i are image vectors of the same size as I_ref. Φ_M is a matrix whose columns are eigenvectors, and the subscript M denotes the number of principal eigenvectors spanning the subspace. Λ_M is the diagonal matrix whose diagonal entries are the eigenvalues of the corresponding eigenvectors in Φ_M.

Given a subspace, we calculate the difference from its mean to the i-th candidate and to the reference image as

    E^i = I^i − Ī,    E_ref = I_ref − Ī.    (1)

We then calculate each coordinate vector on the subspace as

    C^i = Φ_M^T · E^i,    C_ref = Φ_M^T · E_ref.    (2)

However, C^i and C_ref in (2) do not account for the importance of each principal eigenvector: some bear large variations and some bear small ones. To account for this, we multiply each coordinate by the corresponding eigenvalue, defining WC^i and WC_ref in (3):

    WC^i = Λ_M · C^i,    WC_ref = Λ_M · C_ref.    (3)

Based on our inference, if the i-th candidate is the current estimated target image, then WC^i ≈ WC_ref. Thus, our prototype measure in the subspace is defined as

    WD^i = WC^i − WC_ref.    (4)

To observe the difference WD^i in image space, we reflect it back through the subspace bases, obtaining the WDIFS:

    WDIFS^i = Φ_M · WD^i.    (5)

Assuming that the measure in the subspace is governed by a Gaussian distribution, the probability for the i-th candidate at time t under the WDIFS measure is

    P_WDIFS(I_t^i | I_{t−1}^ref) = (2πη)^{−1/2} exp(−‖WDIFS_t^i‖).    (6)

Here η is the Gaussian noise parameter. Our proposed measurement model is then designed as

    P(I_t^i | I_{t−1}^ref) = P_DFFS(I_t^i) · P_WDIFS(I_t^i | I_{t−1}^ref),
    I_t^ref = I_t^{k*},  k* = arg max_k {P(I_t^k | I_{t−1}^ref)}.    (7)
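The steps in Eqs. (1)-(6) can be sketched as a short NumPy routine. This is an illustrative implementation, not the authors' code; the function name and the noise parameter `eta` are our assumptions.

```python
import numpy as np

def wdifs_likelihood(I_i, I_ref, I_mean, Phi_M, Lambda_M, eta=1.0):
    """WDIFS likelihood of candidate I_i against reference I_ref (Eqs. 1-6).

    I_i, I_ref, I_mean : flattened d-dim image vectors.
    Phi_M    : (d, M) eigenbasis matrix with orthonormal columns.
    Lambda_M : (M,) vector of eigenvalues (diagonal of the Lambda_M matrix).
    """
    E_i, E_ref = I_i - I_mean, I_ref - I_mean          # Eq. (1)
    C_i, C_ref = Phi_M.T @ E_i, Phi_M.T @ E_ref        # Eq. (2)
    WC_i, WC_ref = Lambda_M * C_i, Lambda_M * C_ref    # Eq. (3)
    WD = WC_i - WC_ref                                 # Eq. (4)
    WDIFS = Phi_M @ WD                                 # Eq. (5)
    # Eq. (6): Gaussian-shaped confidence; identical candidates score highest.
    return np.exp(-np.linalg.norm(WDIFS)) / np.sqrt(2 * np.pi * eta)
```

When the candidate equals the reference, WD is zero and the likelihood attains its maximum of 1/√(2πη).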
Compared to the DFFS measure, the WDIFS measure adds only the computation in Equations (3)-(6), which is very efficient. In practice, the frame rate (FPS) of our tracker drops by only 5-10 percent compared to the DFFS-measure-based tracker.
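Putting Eq. (7) together with a DFFS term, per-frame particle scoring could look like the sketch below. The Gaussian form assumed for P_DFFS and the parameters `sigma` and `eta` are illustrative choices on our part, not specified by the paper.

```python
import numpy as np

def select_particle(candidates, I_ref, I_mean, Phi_M, Lambda_M, sigma=1.0, eta=1.0):
    """Score n particles by P_DFFS * P_WDIFS (Eq. 7) and return the best index.

    candidates : (n, d) array of flattened particle images.
    """
    E = candidates - I_mean                  # differences from the mean, (n, d)
    C = E @ Phi_M                            # in-subspace coordinates, (n, M)
    # DFFS: reconstruction residual of each candidate w.r.t. the subspace.
    dffs = np.linalg.norm(E - C @ Phi_M.T, axis=1)
    p_dffs = np.exp(-dffs**2 / (2 * sigma**2))
    # WDIFS: eigenvalue-weighted in-subspace difference to the reference.
    C_ref = Phi_M.T @ (I_ref - I_mean)
    wd = Lambda_M * (C - C_ref)              # Eqs. (3)-(4), broadcast over rows
    # ||Phi_M @ wd|| == ||wd|| since Phi_M has orthonormal columns (Eq. 5).
    p_wdifs = np.exp(-np.linalg.norm(wd, axis=1)) / np.sqrt(2 * np.pi * eta)
    scores = p_dffs * p_wdifs                # Eq. (7)
    return int(np.argmax(scores)), scores
```

The extra cost over a DFFS-only model is a single (n, M) weighting and subtraction, consistent with the small frame-rate drop reported above.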
3. Experimental Results

To evaluate the performance of the proposed measurement model, we tested on a number of videos recorded in outdoor and indoor environments where the targets change pose under different lighting conditions. Experiments use the IVT [5] tracker under the same configuration, changing only the measurement model for comparison. Figure 3 shows the successful tracking result with our proposed measurement model. The conventional DFFS-based measurement models (DFFS, DFFS+PPCA [5], and DFFS+robust norm [5]) fail to track before or at frame 615, where the doll undergoes a large pose change. The bottom row visualizes the contexts of the two highest-confidence particles at frame 615. From the comparison of the corresponding DFFS and WDIFS confidence values, we conclude that WDIFS is an effective and discriminative measure that compensates for the defects of the DFFS measure. Figure 4 shows the tracking results on a video captured under severe camera shaking and blurring from high image compression.
4. Conclusion

In this paper, we introduced an efficient measurement model to enhance the robustness of subspace-based tracking. We proposed the weighted difference in feature space (WDIFS) measure within the
[Figure 3 panels: tracking boxes at frames 100, 350, 615, 1000, 1050, 1100, 1250, and 1344; particle contexts at frame 615 — (1) Context of Particle 1 (Wimg1: DFFS 7.3493e−016, WDIFS 6.6008e−018), (2) Context of Particle 2 (Wimg4: DFFS 5.8266e−022, WDIFS 0.16401).]
Figure 3. Top two rows: Tracking result of an animal doll with pose and illumination change. Bottom row: Visualization of the top two highest-confidence particles and their corresponding DFFS and WDIFS at frame 615.
subspace. We also demonstrated that our measurement model performs more robustly in background-cluttered environments. In future work, one avenue is to find a method for maintaining multiple reference images in a hierarchical way.

References
[1] M.A. Turk, A.P. Pentland: Face recognition using eigenfaces. In: CVPR (1991) 586–591
[2] T.F. Cootes, G.J. Edwards, C.J. Taylor: Active appearance models. In: ECCV (1998) 484–498
[3] M.J. Black, A.D. Jepson: EigenTracking: Robust matching and tracking of articulated objects using a view-based representation. In: ECCV (1996) 329–342
[4] B. Moghaddam, A. Pentland: Probabilistic visual learning for object representation. In: TPAMI (1997) 696–710
[5] J. Lim, D. Ross, R.-S. Lin, M.-H. Yang: Incremental learning for visual tracking. In: Neural Information Processing Systems (NIPS) (2005) 793–800
[6] M. Yang, Y. Wu: Tracking non-stationary appearances and dynamic feature selection. In: CVPR (2005) 1059–1066
[7] K.-C. Lee, D. Kriegman: Online learning of probabilistic appearance manifolds for video-based recognition and tracking. In: CVPR (2005) 852–859
[8] B. Zhang, W. Tian, Z. Jin: Efficient hybrid appearance model for object tracking with occlusion handling. In: Optical Engineering (2007)
[9] D. Ross, J. Lim, R.-S. Lin, M.-H. Yang: Incremental learning for robust visual tracking. In: International Journal of Computer Vision, Vol. 77, No. 1-3 (2008)
(a) DFFS-based measurement model
(b) Our proposed measurement model

Figure 4. Comparison of tracking results. This video was captured with severe camera shaking and is heavily blurred by high image compression.