
2010 International Conference on Pattern Recognition

Enhanced Measurement Model for Subspace-based Tracking ∗

Shimin Yin, Haan Joo Yoo, and Jin Young Choi
EECS Department, ASRI, Seoul National University, Seoul, Korea
{smyoon, hjyoo, jychoi}@neuro.snu.ac.kr

Abstract

We present an efficient and robust measurement model for visual tracking. This approach builds on and extends work on measurement models for subspace representations. Subspace-based tracking algorithms have been used in the visual tracking literature for a decade and show considerable tracking performance due to their robustness in matching. However, the measures used in their measurement models are not robust enough in cluttered backgrounds. We propose a novel measure of object matching, referred to as WDIFS, which aims to improve the discriminability of matching within the subspace. Our measurement model can distinguish the target from similar background clutter, which often causes erroneous drift with the conventional DFFS-based measure. Experiments demonstrate the effectiveness of the proposed tracking algorithm under cluttered backgrounds.

1. Introduction

Visual tracking has become an essential technology of great interest in video-based computer vision applications, ranging from augmented reality (AR) and human-computer interfaces (HCI) to visual surveillance. Numerous approaches have been proposed to improve its performance, and great progress has been achieved in the last two decades. Some of those approaches showed noticeable improvements in performance and made a strong impact on the visual tracking community. For appearance models, the subspace representation[1, 2] was first adopted by the EigenTracking algorithm[3], which gave us the inspiration of what makes a good feature for tracking. Even now, it is still the most meaningful appearance model for 2D images and has the advantage of learning variations in pose and illumination with the aid of incremental subspace learning (ISL) techniques[5, 9].

For the corresponding measurement model, Moghaddam and Pentland[4] proposed an eigenspace density estimation technique with the distance in feature space (DIFS) and the distance from feature space (DFFS). They also hinted that the DFFS is important for the object detection task, which plays a role similar to that of the measurement model in a tracking algorithm. The DFFS therefore has the potential to be a measure for subspace-based tracking. Assuming the target appearance subspace is well learned, which means the training images are well sampled from the target object, the DFFS measure shows considerable robustness. J. Lim et al.[5, 9] also adopted probabilistic principal component analysis (PPCA)[5] and the robust norm[3] to improve the robustness of the DFFS measure. In fact, the DFFS-based measurement model is attractive due to its efficiency and effectiveness, and most recent subspace-based tracking algorithms adopt the DFFS-based measure in their measurement model[5, 6, 7, 8].

Unfortunately, the DFFS is vulnerable to target-similar background clutter in ISL-based tracking for the following two reasons. First, the assumption of accurate sampling is impractical in the tracking case. Because of the trade-off between real-time performance and tracking accuracy, the estimated tracking box is usually more or less misaligned with the real target in the image. Bias from rotation, scaling and translation can thus be included in the sampling process that generates the training data. After training on these contaminated data, the representation of the target in the subspace is less compact, which implies that the DFFS measure has to tolerate some error in its accuracy. The second cause is the introduction of ISL. The purpose of ISL is to enable tracking of a dynamically changing object by learning pose and illumination changes as well as shape deformation of the object. Thus, the training data are usually quite different from each other. Therefore, during the subspace learning process, these training data are prone to cause a considerable rotational effect in the eigenbases and significant changes in the corresponding eigenvalues. Because the DFFS is the distance relative to the subspace, the change of the subspace directly affects the accuracy of the DFFS measure, which aggravates the structural defect of the DFFS measure in ISL-based tracking.

In this paper, we propose an efficient and effective measure within the subspace, which is called the weighted difference in feature space (WDIFS). It acts as a complementary measure to the DFFS and enhances the ability to distinguish the true object from object-similar background clutter. Considering real-time performance, the efficiency of the WDIFS is another strength addressed in this paper. In the remainder of this paper, we first briefly illustrate the problem of the DFFS measure and then present the details of the WDIFS in Section 2. Experimental results and conclusions are discussed in Section 3 and Section 4, respectively.

[Figure 1: top row, tracking results at frames 163, 164 and 165 with the eight best-fit tracking boxes (1st to 8th); middle and bottom rows, Wimg1 (DFFS: 5.8109e−015, WDIFS: 1.262e−008) and Wimg2 (DFFS: 4.5962e−017, WDIFS: 0.0068024).]
Figure 1. Example of tracking failure in background clutters by DFFS measure.

∗ This research was supported by the Samsung Techwin Co., Ltd.

1051-4651/10 $26.00 © 2010 IEEE DOI 10.1109/ICPR.2010.852

2. WDIFS for In-Subspace Measure

2.1 Problem of DFFS

First, we illustrate a failure example of the conventional DFFS measure. The tracking framework used in this paper is the same as IVT[5], which is composed of ISL and a particle filter (PF). In Figure 1, the top-row images show the eight best-fit tracking boxes, each corresponding to a particle in the PF, and the top-down order in the color legend box (located in the bottom-right corner of each image) is the descending order of confidence. The middle- and bottom-row images visualize the two particles with the highest DFFS-based confidence (Wimg1, Wimg2) at frame 164 and their corresponding DFFS and WDIFS images. The values beneath the DFFS and WDIFS images are the corresponding confidence values. It is obvious that Wimg2 is the more appropriate particle to track. However, the confidence of Wimg1 is larger than that of Wimg2 in terms of the DFFS. On the other hand, the confidence calculated by the WDIFS measure gives more meaningful values, which can compensate for the defect of the DFFS. Our solution using the WDIFS is described next.

Figure 2. Illustration of DFFS and inspiration for WDIFS
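For readers unfamiliar with the two distances discussed in Section 2.1, the following is a minimal sketch of how DFFS and DIFS are commonly computed for a PCA subspace, following the usual definitions from Moghaddam and Pentland[4]. Here DFFS is the squared residual left after projection and DIFS is the eigenvalue-normalized distance inside the subspace. The function name, argument names and toy numbers are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def dffs_and_difs(image, mean, eigvecs, eigvals):
    """Return (DFFS, DIFS) of a flattened image with respect to a PCA subspace.

    eigvecs: (d, M) matrix with orthonormal columns (hypothetical name).
    eigvals: (M,) positive eigenvalues matching the columns of eigvecs.
    """
    centered = image - mean                      # remove the subspace mean
    coords = eigvecs.T @ centered                # coordinates inside the subspace
    # DFFS: squared residual that the subspace cannot reconstruct.
    dffs = float(centered @ centered - coords @ coords)
    # DIFS: Mahalanobis-type distance measured inside the subspace.
    difs = float(np.sum(coords ** 2 / eigvals))
    return dffs, difs

# Toy data (assumed): a 2-D subspace spanning the first two axes of a 3-D space.
mean = np.zeros(3)
eigvecs = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [0.0, 0.0]])
eigvals = np.array([4.0, 1.0])
image = np.array([2.0, 1.0, 3.0])

dffs, difs = dffs_and_difs(image, mean, eigvecs, eigvals)
print(dffs, difs)
```

In this toy case only the third component of the image lies outside the subspace, so it alone contributes to the DFFS, while the first two components contribute to the DIFS.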

2.2 Our solution to DFFS

In the visual tracking literature, a commonly used assumption is that the change of the object within one time step is small. It is also known that principal component analysis (PCA) preserves the relative distances between high-dimensional data points in the lower-dimensional space. Since the tracked object images are high-dimensional data, under the assumption that the object images from any two consecutive frames are similar, we deduce that their projections onto the subspace F are also similar, where F is learned by PCA or ISL. Therefore, if we treat the current object image as a reference, the projection of the next object image onto F should be close to that of the reference, and so their coordinate vectors on F are also similar. A change of F alters their coordinate vectors but, however F changes under ISL, their coordinate vectors should always be close to each other. This is the key idea of our approach.

Figure 2 illustrates the cause of the DFFS failure. OC1, OC2, OC3 and OC4 are four object image classes. Each of them represents a class of a different aspect of the object under illumination change, pose change and shape deformation. BC represents one class of background clutter. Here we assume that the object varies from OC1 to OC4 and the subspace F changes from F1 to F4 in order. Suppose two particles are drawn: C1 from BC and C2 from OC4 (both are filled in yellow). After projecting them onto F4, the DFFS of C1 is shorter than the DFFS of C2. Within the subspace, however, things are different. PC1, PC2 and PRef on F4 are the projections of C1, C2 and the previous object (marked as 'Ref'), respectively. On F4, PC2 is much closer to PRef than PC1 is. This suggests that the distances from PRef to PC1 and PC2 can act as a complementary measure to the DFFS within the subspace, and we complete our measurement model by incorporating it with the DFFS measure. This is the main contribution of our measurement model. The detailed mathematical description of the WDIFS follows.
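The geometric argument of Section 2.2 can be reproduced with a small numeric toy. The sketch below assumes a 3-D space whose xy-plane plays the role of the subspace F4, a zero mean, and hand-picked points standing in for Ref, C1 and C2. None of these numbers come from the paper. They are chosen only so that the clutter sample lies closer to the subspace while the object sample lies closer to the reference inside it.

```python
import numpy as np

# Assumed toy subspace: orthonormal columns spanning the xy-plane (stands in for F4).
basis = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.0, 0.0]])

ref = np.array([1.0, 0.0, 0.2])   # previous object image ('Ref')
c1 = np.array([-1.0, 0.0, 0.1])   # clutter sample: near the subspace, far from Ref inside it
c2 = np.array([1.1, 0.1, 0.5])    # object sample: farther from the subspace, near Ref inside it

def project(x):
    """Coordinate vector of x on the subspace (mean assumed to be zero)."""
    return basis.T @ x

def dffs(x):
    """Distance from x to its reconstruction from the subspace."""
    return float(np.linalg.norm(x - basis @ project(x)))

def in_subspace_distance(x, reference):
    """Distance between the projections of x and the reference."""
    return float(np.linalg.norm(project(x) - project(reference)))

for name, sample in (("C1", c1), ("C2", c2)):
    print(name, dffs(sample), in_subspace_distance(sample, ref))
```

With these values the DFFS alone would prefer C1, whereas the in-subspace distance to the reference prefers C2. This is the situation sketched in Figure 2.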

2.3 Mathematical Description of the WDIFS

First, the previously estimated object image is regarded as the reference image $I^{ref}$. The IVT-based tracking framework provides a candidate image set $\{I^i \mid i = 1, 2, \ldots, n\}$, where $I^i$ is the i-th candidate image and $n$ is the number of particles. It also maintains an appearance subspace, which provides a mean, eigenvalues and eigenbases as $\bar{I}$, $\Lambda_M$ and $\Phi_M$, respectively. $\bar{I}$ and $I^i$ are image vectors of the same size as $I^{ref}$. $\Phi_M$ is a matrix whose columns are eigenvectors, and the subscript $M$ denotes the number of principal eigenvectors of the subspace. $\Lambda_M$ is a diagonal matrix whose diagonal terms are the eigenvalues of the corresponding eigenvectors in $\Phi_M$. Given a subspace, we calculate the differences of the i-th candidate and the reference image from the mean as follows,

$$E^i = I^i - \bar{I}, \qquad E^{ref} = I^{ref} - \bar{I}. \tag{1}$$

Then we calculate each coordinate vector on the subspace as follows,

$$C^i = \Phi_M^T \cdot E^i, \qquad C^{ref} = \Phi_M^T \cdot E^{ref}. \tag{2}$$

However, $C^i$ and $C^{ref}$ in (2) do not take into account the importance of each principal eigenvector: some carry large variations and some carry small ones. To account for this, we multiply each coordinate by the corresponding eigenvalue, and the results are defined as $WC^i$ and $WC^{ref}$ in (3).

$$WC^i = \Lambda_M \cdot C^i, \qquad WC^{ref} = \Lambda_M \cdot C^{ref}. \tag{3}$$

Based on our inference, if the i-th candidate is the current estimated target image, then $WC^i \approx WC^{ref}$. Thus, our prototype measure in the subspace is defined as follows,

$$WD^i = WC^i - WC^{ref}. \tag{4}$$

To observe the difference $WD^i$ in image space, we map it back through the subspace basis and obtain the WDIFS as follows,

$$WDIFS^i = \Phi_M \cdot WD^i. \tag{5}$$

Assuming that the measure in the subspace is governed by a Gaussian distribution, the probability for the i-th candidate at time $t$ under the WDIFS measure is

$$P_{WDIFS}(I_t^i \mid I_{t-1}^{ref}) = \frac{1}{(2\pi\eta)^{1/2}} \exp(-WDIFS_t^i). \tag{6}$$

Here $\eta$ is the Gaussian noise. Our proposed measurement model is designed as follows,

$$P(I_t^i \mid I_{t-1}^{ref}) = P_{DFFS}(I_t^i) \cdot P_{WDIFS}(I_t^i \mid I_{t-1}^{ref}), \qquad I_t^{ref} = I_t^{\arg\max_k \{P(I_t^k \mid I_{t-1}^{ref})\}}. \tag{7}$$

Compared to the DFFS measure, the WDIFS measure only adds the computation of Equations (3), (4), (5) and (6), which is very efficient. In practice, the frame rate (FPS, frames per second) of our tracker drops by only 5-10 percent compared to the DFFS-based tracker.
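Equations (1) to (7) can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation. All function and variable names are hypothetical, and the DFFS probabilities are passed in as given numbers rather than computed. Equation (6) as printed exponentiates a vector, so the sketch assumes that its squared norm is what is meant.

```python
import numpy as np

def wdifs_probabilities(candidates, reference, mean, eigvecs, eigvals, eta=1.0):
    """candidates: (n, d); reference, mean: (d,); eigvecs: (d, M); eigvals: (M,)."""
    e_cand = candidates - mean              # Eq. (1), one row per candidate
    e_ref = reference - mean
    c_cand = e_cand @ eigvecs               # Eq. (2), rows are the coordinate vectors C^i
    c_ref = eigvecs.T @ e_ref
    wc_cand = c_cand * eigvals              # Eq. (3), multiplication by diagonal Lambda_M
    wc_ref = c_ref * eigvals
    wd = wc_cand - wc_ref                   # Eq. (4)
    wdifs = wd @ eigvecs.T                  # Eq. (5), mapped back to image space, (n, d)
    # Assumption: the scalar fed to exp in Eq. (6) is the squared norm of WDIFS^i.
    energy = np.sum(wdifs ** 2, axis=1)
    return np.exp(-energy) / np.sqrt(2.0 * np.pi * eta)

def select_reference(candidates, p_dffs, p_wdifs):
    """Eq. (7): combine both probabilities and pick the best candidate."""
    combined = p_dffs * p_wdifs
    best = int(np.argmax(combined))
    return best, candidates[best]

# Toy data (assumed): 3-D images, 2-D subspace, two particles.
mean = np.zeros(3)
eigvecs = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
eigvals = np.array([2.0, 1.0])
reference = np.array([1.0, 0.0, 0.2])
candidates = np.array([[-1.0, 0.0, 0.1],    # clutter-like particle
                       [1.1, 0.1, 0.5]])    # object-like particle
p_dffs = np.array([0.6, 0.4])               # hypothetical DFFS-based confidences

p_wdifs = wdifs_probabilities(candidates, reference, mean, eigvecs, eigvals)
best, new_ref = select_reference(candidates, p_dffs, p_wdifs)
print(p_wdifs, best)
```

In this toy setup the DFFS confidence slightly favors the clutter-like particle, but the WDIFS term is far larger for the object-like particle, so the product in Equation (7) selects the latter as the new reference. The extra cost over a DFFS-only tracker is a few matrix products per frame, in line with the paper's remark on efficiency.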

3. Experimental Results

To evaluate the performance of our proposed measurement model, we tested a number of videos recorded in outdoor and indoor environments where the targets change pose under different lighting conditions. Experiments were done using the IVT[5] tracker under the same configuration, changing only the measurement model for comparison. Figure 3 shows the successful tracking result with our proposed measurement model. The conventional DFFS-based measurement models (DFFS, DFFS+PPCA[5], and DFFS+robust norm[5]) fail to track before or at frame 615, where the doll undergoes a large pose change. The bottom row visualizes the context of the two highest-confidence particles at frame 615. From the comparison of the corresponding DFFS and WDIFS confidence values, we conclude that the WDIFS is an effective and discriminative measure, which can compensate for the defects of the DFFS measure. Figure 4 shows the tracking results on a video captured under severe camera shaking and blurring caused by high image compression.

[Figure 3: top two rows, tracking results at frames 100, 350, 615, 1000, 1050, 1100, 1250 and 1344; bottom row, (1) Context of Particle 1: Wimg1 (DFFS: 7.3493e−016, WDIFS: 6.6008e−018) and (2) Context of Particle 2: Wimg4 (DFFS: 5.8266e−022, WDIFS: 0.16401).]
Figure 3. Top two rows: Tracking result of an animal doll with pose and illumination change. Bottom row: Visualization of the two highest-confidence particles and their corresponding DFFS and WDIFS at frame 615.

[Figure 4: (a) DFFS-based measurement model and (b) Our proposed measurement model, each with panels labeled 10, 11 and 12.]
Figure 4. Comparison of tracking results. The video was captured with severe camera shaking and is heavily blurred by high image compression.

4. Conclusion

In this paper, we introduced an efficient measurement model to enhance the robustness of the subspace-based measurement model. We proposed the weighted difference in feature space (WDIFS) measure within the subspace. We also demonstrated that our measurement model performs more robustly in background-cluttered environments. In future work, one avenue is to find a method for maintaining multiple reference images in a hierarchical way.

References

[1] M. A. Turk, A. P. Pentland: Face recognition using eigenfaces. In: CVPR. (1991) 586–591
[2] T. F. Cootes, G. J. Edwards, C. J. Taylor: Active appearance models. In: ECCV. (1998) 484–498
[3] M. J. Black, A. D. Jepson: EigenTracking: Robust matching and tracking of articulated objects using a view-based representation. In: ECCV. (1996) 329–342
[4] B. Moghaddam, A. Pentland: Probabilistic Visual Learning for Object Representation. In: TPAMI. (1997) 696–710
[5] J. Lim, D. Ross, R.-S. Lin, M.-H. Yang: Incremental Learning for Visual Tracking. In: Neural Information Processing Systems (NIPS). (2005) 793–800
[6] M. Yang, Y. Wu: Tracking non-stationary appearances and dynamic feature selection. In: CVPR. (2005) 1059–1066
[7] K. C. Lee, D. Kriegman: Online Learning of Probabilistic Appearance Manifolds for Video-based Recognition and Tracking. In: CVPR. (2005) 852–859
[8] B. Zhang, W. Tian, Z. Jin: Efficient hybrid appearance model for object tracking with occlusion handling. In: Optical Engineering. (2007)
[9] D. Ross, J. Lim, R.-S. Lin, M.-H. Yang: Incremental Learning for Robust Visual Tracking. In: International Journal of Computer Vision. (2008) Vol. 77, No. 1-3
