2010 International Conference on Pattern Recognition

Matching Groups of People By Covariance Descriptor

Yinghao Cai, Valtteri Takala and Matti Pietikäinen
Machine Vision Group, Department of Electrical and Information Engineering
University of Oulu, Finland
{yinghao.cai, valtteri.takala, mkp}@ee.oulu.fi

Abstract

In this paper, we present a new solution to the problem of matching groups of people across multiple non-overlapping cameras. Similar to the problem of matching individuals across cameras, matching groups of people faces challenges such as variations in illumination conditions, poses and camera parameters. Moreover, people often swap their positions while walking in a group. In this paper, we propose to use the covariance descriptor for appearance matching of group images. The covariance descriptor is shown to be a discriminative descriptor which captures both appearance and statistical properties of image regions. Furthermore, it presents a natural way of combining multiple heterogeneous features with a relatively low dimensionality. Experimental results on two different datasets demonstrate the effectiveness of the proposed method.

1

Introduction

Due to the low cost and easy installation of video cameras, more and more cameras are employed in surveillance and traffic monitoring systems. The major task of a multi-camera surveillance system is to keep track of all objects in cameras with overlapping or non-overlapping fields of view. Many efforts have been devoted to matching individuals across cameras, or person re-identification [2, 4]. In person re-identification, we need to identify the same person under different views. In this paper, we consider the related problem of identifying groups of people across multiple non-overlapping cameras. In crowd scenarios, people often walk in groups. We refer to a "group" as a small number of people walking in close vicinity. The problem of associating groups of people was first proposed by Zheng et al. [9]. It was demonstrated that associating groups of people across cameras assists in inferring long term activity over a wide area. More importantly, associating groups of people helps to discriminate visually very similar individuals by providing a rich context [9].

1051-4651/10 $26.00 © 2010 IEEE. DOI 10.1109/ICPR.2010.672

Similar to the problem of associating individuals across cameras, associating groups of people faces challenges such as variations in illumination conditions, poses and camera parameters across cameras. In addition, people often swap their positions while walking in a group. Due to different viewpoints and the non-rigid characteristics of group movements, the appearances of groups may exhibit significant differences under different views. Some challenges of matching groups of people across cameras are demonstrated in Figure 1.

Figure 1. Challenges of matching groups of people across cameras. (a) groups with similar appearances. (b) relative positions between group members change often. (c) self occlusion within the group. (d) different viewpoints.

The problems of matching individuals across cameras and matching groups across cameras are highly related. In matching individuals across cameras, Cai et al. [2] partitioned the image region of moving objects into regular patches. Each patch of one image is matched against the patch in roughly the same grid position of another image to compute a similarity measure. Since people may change their positions when walking in a group, patches at the same grid positions of one group under two views may not correspond to the same foreground region. To this end, Zheng et al. [9] proposed descriptors aiming to characterize ratio information within and between multiple rectangular ring regions. It is assumed in Zheng et al. [9] that local variations of visual appearance in a rectangular ring of a given group image are stable.

In this paper, we propose a novel method for matching groups of people. In the proposed method, the covariance descriptor [8] is employed to characterize the appearance of a group image. By incorporating illumination independent statistics, certain degrees of illumination invariance are implied in the covariance descriptor. In addition, the covariance descriptor is capable of dealing with image modifications on the same radial circle. Experimental results on two different datasets demonstrate the effectiveness of the proposed method.

2

Covariance Descriptor

The covariance descriptor was first proposed by Tuzel et al. [8] for object detection. It has since achieved superior performance in object tracking [6] and license plate detection [7]. The wide applicability of the covariance descriptor lies in the fact that the covariance of a distribution encodes enough discriminative information. Furthermore, it presents a natural way of combining multiple heterogeneous features with a relatively low dimensionality. Due to the symmetry of the covariance matrix, the covariance descriptor has only (d² + d)/2 distinct values if each pixel is represented as a d-dimensional vector.

In matching groups of people, each pixel in the image region is represented as a d-dimensional vector in which both a spatial term and an appearance term are included:

f_n = [ x, y, O1(x, y)/O3(x, y), O2(x, y)/O3(x, y),
        ( |Ê_{λi x}(x, y)|, |Ê_{λi y}(x, y)|, arctan( Ê_{λi y}(x, y) / Ê_{λi x}(x, y) ) )_{i=1,2,3} ]    (1)

where (x, y) is the Euclidean location of the pixel in the image region. O1(x, y)/O3(x, y) and O2(x, y)/O3(x, y) are W-invariants at location (x, y), defined in the opponent color space [3]:

( O1, O2, O3 )ᵀ = ( (R − G)/√2, (R + G − 2B)/√6, (R + G + B)/√3 )ᵀ.    (2)
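As an illustration, the opponent channels of Eq. (2) and the W-invariants used in Eq. (1) can be sketched in a few lines of NumPy. The function name and the small epsilon guard against division by zero are our own additions, not part of the paper:

```python
import numpy as np

def opponent_w_invariants(img):
    """Opponent colour channels O1, O2, O3 (Eq. 2) and the W-invariants
    O1/O3, O2/O3 of Eq. (1) for an RGB image.

    `img` is assumed to be a float array of shape (H, W, 3) with
    channels in R, G, B order.
    """
    R, G, B = img[..., 0], img[..., 1], img[..., 2]
    O1 = (R - G) / np.sqrt(2.0)
    O2 = (R + G - 2.0 * B) / np.sqrt(6.0)
    O3 = (R + G + B) / np.sqrt(3.0)
    eps = 1e-8  # guard against division by zero on black pixels
    return O1 / (O3 + eps), O2 / (O3 + eps)
```

For a gray pixel (R = G = B) both invariants vanish, which reflects their insensitivity to light intensity changes.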

The covariance descriptor is a d × d symmetric matrix computed as:

C_R = (1 / (N − 1)) Σ_{n=1}^{N} (f_n − μ)(f_n − μ)ᵀ    (4)

where N is the number of pixels in the image region and μ is the mean vector of the d-dimensional feature points {f_n}_{n=1,2,...,N}. The diagonal elements of the covariance descriptor denote the variances of the corresponding features, while the off-diagonal elements represent how two features vary together. Instead of the original R, G, B color values and intensity derivatives defined in [8], we represent each pixel with illumination independent statistics in the opponent color space, which implies that certain degrees of illumination invariance are included in the covariance descriptor. The W-invariant is supposed to be insensitive to light intensity changes and light intensity shifts [3]. Note that the Euclidean coordinates in Eq.(1) make the covariance descriptor rotation variant. Rotation invariance can be added to the computation of the covariance descriptor by replacing the Euclidean coordinates (x, y) in Eq.(1) with the relative distance from the pixel location (x, y) to the image center (x0, y0). Alternatively, if no spatial term and no orientations of points are included in Eq.(1), the computation of the covariance descriptor does not relate to the positions of elements inside the image region. There is a tradeoff: adding invariance to the group representation means losing discriminativeness to some extent. Various feature combinations for the covariance descriptor are tested on two datasets in the following section. The dissimilarity of two covariance matrices C1 and C2 is computed according to [8]:

dist(C1, C2) = sqrt( Σ_{k=1}^{m} ln² λ_k(C1, C2) )    (5)

where {λ_k(C1, C2)}_{k=1,2,...,m} are the generalized eigenvalues of C1 and C2. Finally, for each group image under one camera, we measure the distance (Eq. 5) to all other group images under another camera. The correspondences between group images are determined by a nearest neighbor classifier.
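The descriptor of Eq. (4), the distance of Eq. (5) and the nearest neighbor matching step can be sketched as follows. This is a minimal illustration (the function names are ours), assuming SciPy's solver for the generalized symmetric-definite eigenproblem:

```python
import numpy as np
from scipy.linalg import eigh

def covariance_descriptor(features):
    """Sample covariance (Eq. 4) of an (N, d) array holding one
    d-dimensional feature vector per (foreground) pixel."""
    mu = features.mean(axis=0)
    diff = features - mu
    return diff.T @ diff / (features.shape[0] - 1)

def covariance_distance(C1, C2):
    """Dissimilarity of two covariance matrices (Eq. 5): root sum of
    squared logs of the generalized eigenvalues of the pencil (C1, C2)."""
    lam = eigh(C1, C2, eigvals_only=True)  # generalized eigenvalues
    return np.sqrt(np.sum(np.log(lam) ** 2))

def match_groups(probes, gallery):
    """Nearest neighbor matching: for each probe descriptor, return the
    index of the closest gallery descriptor under Eq. (5)."""
    return [min(range(len(gallery)),
                key=lambda j: covariance_distance(C, gallery[j]))
            for C in probes]
```

Identical matrices have all generalized eigenvalues equal to 1, so their distance is 0, as expected of a dissimilarity measure.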


In addition, Ê_{λi x} and Ê_{λi y} in Eq.(1) are the x and y derivatives of the color gradient invariants. i = 1, 2, 3 correspond to the intensity, yellow-blue and red-green channels of the opponent color model, respectively [1]:

( Ê_{λ1}(x, y) )   (  0.06   0.63   0.27 ) ( R(x, y) )
( Ê_{λ2}(x, y) ) = (  0.30   0.04  −0.35 ) ( G(x, y) )    (3)
( Ê_{λ3}(x, y) )   (  0.34  −0.60   0.17 ) ( B(x, y) )
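A minimal sketch of applying the linear transform of Eq. (3) pixel-wise; the derivative step that produces Ê_{λi x} and Ê_{λi y} (e.g. with np.gradient) is only indicated in the comments, and the function name is our own:

```python
import numpy as np

# Transform matrix of Eq. (3): rows give the intensity, yellow-blue and
# red-green measurement channels of the opponent colour model.
M = np.array([[0.06,  0.63,  0.27],
              [0.30,  0.04, -0.35],
              [0.34, -0.60,  0.17]])

def gaussian_color_channels(img):
    """Apply Eq. (3) pixel-wise to an (H, W, 3) RGB image, giving the
    three channels E_lambda1..E_lambda3. Their x and y derivatives
    (e.g. np.gradient per channel) yield the gradient terms of Eq. (1).
    """
    return img @ M.T
```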

3

Experimental Results and Analysis

We evaluate the effectiveness of the proposed method on two different datasets. The CASIA dataset consists of 44 pairs of group images from two outdoor views. The OULU dataset consists of 20 pairs of group images selected from five indoor views. Background subtraction is first performed to obtain foreground regions. Group images are determined by head-shoulder detection [5]. A foreground region is assumed to be a group image if it contains a minimum of two persons. The obtained group images are of different sizes. Only foreground pixels in the image regions are used to compute the covariance descriptor for group image representation.

In this paper, the cumulative matching characteristic (CMC) curve and the synthetic disambiguation rate (SDR) curve [4] are used to evaluate the performance of matching groups of people. In the CMC, rank i performance is the rate at which the correct group is in the top i of the retrieved list. Since the performance of identifying individuals or groups is closely related to the number of people in the database, the SDR [4] converts a performance metric of size N to size M. We also demonstrate that associating groups of people helps discriminate visually very similar individuals in person re-identification.
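A CMC curve as described above can be computed from a probe-gallery distance matrix along these lines. This sketch (with a function name of our own) assumes the paired setup of both datasets, i.e. the correct match of probe i is gallery entry i:

```python
import numpy as np

def cmc_curve(dist):
    """Cumulative matching characteristic from an (n, n) distance matrix
    where dist[i, j] compares probe group i with gallery group j and the
    correct match of probe i is gallery i. Entry k-1 of the result is
    the rate at which the correct match lies in the top k of the ranked
    retrieval list."""
    n = dist.shape[0]
    ranks = np.argsort(dist, axis=1)  # gallery indices, best match first
    # position of the correct match in each probe's ranked list
    hit = np.array([int(np.where(ranks[i] == i)[0][0]) for i in range(n)])
    return np.array([(hit < k).mean() for k in range(1, n + 1)])
```

The curve is non-decreasing in the rank and reaches 1 at rank n, which makes it a convenient summary of retrieval quality.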

Table 1. Different feature combinations for the covariance descriptor.

Index  Features
C1     {x, y, O1/O3, O2/O3, (|Ê_{λi x}|, |Ê_{λi y}|, arctan(Ê_{λi y}/Ê_{λi x}))_{i=1,2,3}}
C2     {d0, O1/O3, O2/O3, (|Ê_{λi x}|, |Ê_{λi y}|, arctan(Ê_{λi y}/Ê_{λi x}))_{i=1,2,3}}
C3     {O1/O3, O2/O3, (|Ê_{λi x}|, |Ê_{λi y}|, arctan(Ê_{λi y}/Ê_{λi x}))_{i=1,2,3}}
C4     {x, y, R, G, B, (|Ê_{λi x}|, |Ê_{λi y}|, arctan(Ê_{λi y}/Ê_{λi x}))_{i=1,2,3}}
C5     {x, y, R, G, B, |Ix|, |Iy|, |Ixx|, |Iyy|}
C6     {R, G, B, |Ix|, |Iy|, |Ixx|, |Iyy|}

3.1

Evaluation of Group Association on CASIA dataset

To evaluate the effectiveness of the covariance descriptor, we test the different feature combinations for the covariance descriptor given in Table 1. In Table 1, d0 = sqrt((x − x0)² + (y − y0)²) is the quantized relative distance of pixel (x, y) with respect to the image center (x0, y0). The CMC and SDR curves are shown in Figures 2(a) and (b), respectively. Figure 2 shows that incorporating a spatial term, either the pixel location (x, y) in the image region or the relative distance d0 to the image center, improves the overall performance. A possible reason is that there are no significantly different view angles or switches of relative positions in the group images of the CASIA database. If we replace (x, y) by d0 as in C2, the representation of group images becomes robust against modifications on the same radial circle. This characteristic is more suitable for scenarios where people change their positions often, as in the OULU database. Moreover, illumination independent statistics are demonstrated to be important when matching two images under different cameras.

We compare our proposed covariance based method with two baseline methods, histogram and principal axis histogram [4]. In computing histograms, we set the number of quantization levels to 16. Histograms are compared by L1 distance in the YCbCr colorspace. In the principal axis histogram, all group images are first normalized to 80 × 80 pixels. Then, 8 regions are chosen in 10 pixel increments to cover the horizontal stripe of the image [4]. We can see from Figure 2 that color information alone does not perform well on the CASIA dataset, since there are many groups with similar color appearances. Figure 3 shows some examples of associating groups of people using the C1 method. Correct matches are highlighted by red boxes in Figure 3. This demonstrates that the covariance descriptor is capable of differentiating visually very similar group images.

Figure 2. CMC (a) and SDR (b) curves for associating groups of people on CASIA dataset.

Figure 3. Examples of group association on CASIA dataset: the reference image and its top 5 matches, ordered from left to right.

3.2

Evaluation of Group Association on OULU dataset

We follow the same procedures discussed above to evaluate the performance of group association on the OULU dataset. As mentioned before, the group images in the OULU dataset are selected from five indoor cameras with non-overlapping fields of view. Group images taken from front views and back views present more challenges to the problem. In addition, people swap their positions often in this dataset. The CMC and SDR curves are shown in Figures 4(a) and (b), respectively. The C2 method in Table 1, which is capable of dealing with modifications on the same radial circle, performs best in comparison with the other methods. Figure 5 shows some examples of associating groups of people using the C2 method on the OULU dataset.

Figure 4. CMC (a) and SDR (b) curves for associating groups of people on OULU dataset.

Figure 5. Examples of group association on OULU dataset: the reference image and its top 5 matches, ordered from left to right.

3.3

Integrate Group Information for Person Re-identification

In this section, group information is exploited as a contextual cue for person re-identification in the CASIA dataset. More specifically, for a person image Ia under one camera, we match it against all person images Ib under another camera. Similar to [9], the distance between two person images is computed as:

dist(Ia, Ib) = ω1 dist(Pa, Pb) + ω2 dist(Ga, Gb)    (6)

where dist(Pa, Pb) is the distance between the covariance descriptors of the individuals and dist(Ga, Gb) is the distance between the two corresponding group images. ω1 and ω2 are weights, which are both set to 0.5. The top rank 40 performance of the C1 method without and with group context is shown in Figure 6. Figure 6 demonstrates that including group information evidently improves the performance of person re-identification.

Figure 6. Integrating group information for person re-identification.

4

Conclusions

In this paper, we have proposed a solution to the problem of matching groups of people across multiple cameras with non-overlapping fields of view. Experimental results demonstrate the effectiveness of the proposed method. Future work will focus on exploiting group information in understanding long term activities over a wide area.

Acknowledgement

This research was supported by the Finnish Funding Agency for Technology and Innovation (TEKES) and the European Regional Development Fund.

References

[1] G. J. Burghouts and J.-M. Geusebroek. Performance evaluation of local colour invariants. CVIU, 113(1):48–62, 2009.
[2] Y. Cai, K. Huang, and T. Tan. Matching tracking sequences across widely separated cameras. In ICIP, pages 765–768, 2008.
[3] K. E. A. van de Sande, T. Gevers, and C. G. M. Snoek. Evaluation of color descriptors for object and scene recognition. In CVPR, pages 1–8, 2008.
[4] D. Gray and H. Tao. Viewpoint invariant pedestrian recognition with an ensemble of localized features. In ECCV, pages 262–275, 2008.
[5] M. Li, Z. Zhang, K. Huang, and T. Tan. Rapid and robust human detection and tracking based on omega-shape features. In ICIP, 2009.
[6] X. Li, W. Hu, Z. Zhang, X. Zhang, M. Zhu, and J. Cheng. Visual tracking via incremental log-Euclidean Riemannian subspace learning. In CVPR, pages 1–8, 2008.
[7] F. Porikli and T. Kocak. Robust license plate detection using covariance descriptor in a neural network framework. In ICVSS, page 107, 2006.
[8] O. Tuzel, F. Porikli, and P. Meer. Region covariance: A fast descriptor for detection and classification. In ECCV, pages 589–600, 2006.
[9] W.-S. Zheng, S. Gong, and T. Xiang. Associating groups of people. In BMVC, 2009.
