
Recovering the Topology of Multiple Cameras by Finding Continuous Paths in a Trellis

Yinghao Cai (1,2), Kaiqi Huang (1), Tieniu Tan (1) and Matti Pietikäinen (2)

(1) National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
(2) Machine Vision Group, Department of Electrical and Information Engineering, University of Oulu, Finland

{kqhuang, tnt}@nlpr.ia.ac.cn, {yinghao.cai, mkp}@ee.oulu.fi

Abstract

In this paper, we propose an unsupervised method for recovering the topology of multiple cameras with non-overlapping fields of view. The nodes in the topology graph are defined as entry/exit zones in each camera, and the connectivity between nodes is inferred by finding continuous paths in a trellis that encodes the appearance and temporal information of moving objects. Unlike previous methods, which assume a single-mode transition distribution between nodes, our method can handle multi-modal transitions when both cars and pedestrians are in the scene. Results on simulated and real-life datasets demonstrate the effectiveness of the proposed method.

1 Introduction

With the ever-increasing number of cameras deployed in surveillance applications, it is becoming almost impossible for human operators to monitor and analyze dozens of video sequences effectively and efficiently. Automatic methods are therefore required to analyze the video collected by a network of cameras. The prerequisite for global activity monitoring in a camera network is to establish correspondences between observations across cameras. Significant effort in multi-camera surveillance has been devoted to consistently labeling objects across cameras with overlapping or non-overlapping fields of view, and achieving such automatic labeling requires learning the topology of the cameras. The topology graph of multiple cameras is an abstract network of nodes and connections [2]. Its nodes are defined either as entry/exit zones in each camera or as single cameras, and its connections indicate the connectivity between nodes in the camera network. The topology graph of multiple cameras helps predict the

reappearance of moving objects: if an object disappears from a node in one camera's field of view, we only need to search for its reappearance at the nodes linked to the node of disappearance. In addition, the transition time characteristics of the connection between nodes can be inferred by observing objects moving across cameras. Defining the topology graph therefore significantly reduces the computational complexity of tracking across multiple cameras.

Figure 1: (a)-(c) Entry zones and exit zones for Camera 1, 2 and 3, respectively. Entry zones and exit zones are nodes in the topology graph.

Many methods have been put forward to recover the topology between cameras. In Makris et al. [2], all pairs of arrival and departure events contribute to the distribution of transition time, and a peak in this distribution indicates both the most probable transition time and the connectivity between cameras. Niu et al. [3] further weighted the temporal correlation by appearance information, so that only observations that look similar in appearance are used to derive the transition distribution. Both methods [2, 3] implicitly assume a single-mode transition distribution, which cannot accommodate situations in which both cars and pedestrians are among the observations.

Figure 2: Trellis between exit node 2 of Camera 1 and entry node 4 of Camera 2 (columns sorted by time; candidate matches within a column sorted by color similarity). The numbers under each object give the times, in seconds, at which it appears in and disappears from the camera's field of view.

In this paper, we propose an unsupervised method for recovering the topology of multiple cameras with non-overlapping fields of view. The nodes in the topology graph are defined as entry/exit zones in each camera, as in Figure 1; some nodes serve as both entry and exit zones. The entry and exit zones are obtained by grouping the start points and end points of trajectories through k-means clustering [2]. The connectivity between nodes is inferred by finding continuous paths in a trellis that encodes the appearance matching information and temporal information of moving objects. Our method rests on the assumption that, if two cameras are connected, some moving objects will appear in the fields of view of both cameras under certain temporal constraints. Based on this assumption, we build a trellis between each pair of exit zone and entry zone. The idea of building a trellis comes from [6], where a trellis is used to mine repetitive clips from video databases. Unlike previous methods, which assume a single-mode transition distribution between nodes, our method can handle multi-modal transitions when both cars and pedestrians are in the scene. Experimental results on a simulated dataset and a real-life dataset demonstrate the effectiveness of the proposed method.
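As a rough illustration of the zone-extraction step, the sketch below clusters trajectory start points into entry zones and end points into exit zones with plain k-means (Lloyd's algorithm). It assumes trajectories are lists of (x, y) image coordinates and that the number of zones per camera is chosen by hand; the function names are hypothetical and this is not necessarily the exact procedure of [2].

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def _nearest(p: Point, centroids: List[Point]) -> int:
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2)


def kmeans(points: List[Point], k: int, iters: int = 100, seed: int = 0):
    """Plain Lloyd's k-means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    for _ in range(iters):
        labels = [_nearest(p, centroids) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # move the centroid to the mean of its members
                updated.append((sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members)))
            else:  # an empty cluster keeps its previous centroid
                updated.append(centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, [_nearest(p, centroids) for p in points]


def entry_exit_zones(trajectories: List[List[Point]], k_entry: int, k_exit: int):
    """Entry zones from trajectory start points, exit zones from end points."""
    entry = kmeans([t[0] for t in trajectories], k_entry)
    exit_ = kmeans([t[-1] for t in trajectories], k_exit)
    return entry, exit_


# Toy example: two groups of objects crossing the image in different directions.
trajs = [[(0, 0), (100, 100)], [(1, 2), (99, 98)],
         [(200, 0), (0, 200)], [(202, 1), (2, 199)]]
(entry_centres, entry_labels), (exit_centres, exit_labels) = entry_exit_zones(trajs, 2, 2)
```

On the toy data, the two objects starting near (0, 0) share one entry zone and the two starting near (200, 0) share another; the end points split the same way into two exit zones.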

2 Object Representation and Matching

As mentioned before, we build a trellis between each pair of exit zone and entry zone. A typical trellis between exit node 2 and entry node 4 is shown in Figure 2. To build such a trellis, for each object that disappears from node 2, we search for its best matches among the objects appearing at node 4 within a discrete time buffer. The similarity between two objects is measured by their visual appearance. In this section, we present our object representation and matching method based on dominant colors. Note that this is a simple yet effective way of matching objects across cameras to build the trellis; other object representation and matching methods could readily be substituted to improve the accuracy of object matching.

Spatial color information is important for discriminating one object from another, since objects may share similar color components in different layouts. To this end, we partition the blob of a moving object into regular patches to localize its color components, as shown in Figure 3. The i-th patch of model image P is then represented by its first k dominant colors together with their frequencies of occurrence on the target:

$$R_p^i = \{(C_1, W_1), \ldots, (C_k, W_k)\}$$
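The patch partition and dominant-color extraction can be sketched as follows, under loudly stated assumptions: the blob is a 2-D grid of RGB pixels with None for background, patches form a regular rows x cols grid, and dominant colors are approximated by uniform quantization (`levels` bins per channel) rather than by the method of [1, 4]. Each weight W is kept as a pixel count, which matches normalizing by the number of foreground pixels in Eq. (2). All names are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Tuple

RGB = Tuple[int, int, int]


def split_into_patches(blob: List[List[Optional[RGB]]], rows: int, cols: int) -> List[List[RGB]]:
    """Cut a blob (2-D pixel grid, None = background) into rows x cols regular
    patches and return the foreground pixels of each patch in row-major order."""
    h, w = len(blob), len(blob[0])
    patches = []
    for r in range(rows):
        for c in range(cols):
            y0, y1 = r * h // rows, (r + 1) * h // rows
            x0, x1 = c * w // cols, (c + 1) * w // cols
            patches.append([blob[y][x] for y in range(y0, y1) for x in range(x0, x1)
                            if blob[y][x] is not None])
    return patches


def dominant_colors(patch: List[RGB], k: int, levels: int = 8) -> List[Tuple[RGB, int]]:
    """Approximate the k dominant colors (C_n, W_n) of a patch: quantize each channel
    into `levels` bins, count pixels per bin, keep the k most frequent bins.
    Colors are returned as bin centres, weights as pixel counts."""
    step = 256 // levels
    bins = Counter((r // step, g // step, b // step) for r, g, b in patch)
    return [((qr * step + step // 2, qg * step + step // 2, qb * step + step // 2), n)
            for (qr, qg, qb), n in bins.most_common(k)]


red, blue = (250, 10, 10), (10, 10, 240)
blob = [[red, red],
        [blue, None]]  # None marks a background pixel
patches = split_into_patches(blob, rows=2, cols=1)
reps = [dominant_colors(p, k=3) for p in patches]
```

Uniform quantization is only a cheap stand-in; any clustering of pixel colors that yields (color, count) pairs would fit the same interface.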

Figure 3: Partitioning the blob of a moving object into regular patches.

Each patch of one object is matched against the corresponding patch of another object, as illustrated in Figure 3. The similarity measure between two patches $R_p^i$ and $R_q^i$ is defined as:

$$\mathrm{Sim}(R_p^i, R_q^i) = \min\big(P(R_p^i \mid R_q^i),\ P(R_q^i \mid R_p^i)\big) \tag{1}$$

where $P(R_q^i \mid R_p^i)$ is the probability of observing the dominant color representation of $R_q^i$ in $R_p^i$, defined as:

$$P(R_q^i \mid R_p^i) = \frac{\sum_{n=1}^{M_p^i} \min\Big\{ W_{p,n}^i,\ \sum_{m=1}^{M_q^i} \delta\big(C_{p,n}^i, C_{q,m}^i\big)\, W_{q,m}^i \Big\}}{|N_p^i|} \tag{2}$$

where $|N_p^i|$ is the number of foreground pixels in the i-th patch of model image P, $M_p^i$ and $M_q^i$ are the numbers of dominant colors in the i-th patches of P and Q, and $W_{p,n}^i$ is the frequency of the n-th color appearing in the i-th patch of model image P. $\delta(C_{p,n}^i, C_{q,m}^i)$ equals 1 if the two dominant colors are close enough according to the color distance defined in [4], and 0 otherwise. $P(R_p^i \mid R_q^i)$ is defined similarly. The similarity between model image P and query image Q is then:

$$\mathrm{Sim}(P, Q) = \frac{\sum_{i=1}^{\min(N_p, N_q)} \mathrm{Sim}(R_p^i, R_q^i)}{\min(N_p, N_q)} \tag{3}$$

where $N_p$ and $N_q$ are the numbers of patches in images P and Q, respectively. More details of the dominant color representation can be found in our previous work [1].
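Eqs. (1)-(3) translate fairly directly into code. In this minimal sketch a patch is a list of (color, pixel count) pairs plus its foreground pixel count; the color test δ is a hypothetical stand-in (Euclidean RGB distance below 30) for the distance defined in [4], and all function names are ours.

```python
import math
from typing import List, Tuple

RGB = Tuple[int, int, int]
Rep = List[Tuple[RGB, int]]   # dominant colors (C_n, W_n) of one patch; W_n = pixel count
Patch = Tuple[Rep, int]       # (representation, number of foreground pixels |N|)


def delta(c1: RGB, c2: RGB, thresh: float = 30.0) -> int:
    """Hypothetical stand-in for the color test of [4]: 1 if the Euclidean RGB
    distance is below `thresh`, else 0."""
    return 1 if math.dist(c1, c2) < thresh else 0


def p_observe(rq: Rep, rp: Rep, n_p: int) -> float:
    """Eq. (2): P(R_q | R_p), normalized by n_p = |N_p|, the foreground pixels of patch p."""
    if n_p == 0:
        return 0.0
    total = 0
    for cp, wp in rp:
        matched = sum(delta(cp, cq) * wq for cq, wq in rq)
        total += min(wp, matched)
    return total / n_p


def patch_sim(p: Patch, q: Patch) -> float:
    """Eq. (1): Sim(R_p, R_q) = min(P(R_p | R_q), P(R_q | R_p))."""
    (rp, n_p), (rq, n_q) = p, q
    return min(p_observe(rp, rq, n_q), p_observe(rq, rp, n_p))


def image_sim(P: List[Patch], Q: List[Patch]) -> float:
    """Eq. (3): mean patch similarity over the first min(N_p, N_q) patches."""
    n = min(len(P), len(Q))
    if n == 0:
        return 0.0
    return sum(patch_sim(P[i], Q[i]) for i in range(n)) / n


patch_a = ([((240, 16, 16), 2)], 2)   # two red pixels
patch_b = ([((235, 20, 10), 3)], 3)   # three reddish pixels
patch_c = ([((16, 16, 240), 3)], 3)   # three blue pixels
sim_ab = image_sim([patch_a], [patch_b])
sim_ac = image_sim([patch_a], [patch_c])
```

Here every red pixel of `patch_a` is explained by `patch_b`, but only two of the three pixels of `patch_b` are explained by `patch_a`, so the symmetric min in Eq. (1) gives 2/3; the blue patch scores 0.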

3 Finding Continuous Paths in a Trellis

To build the trellis in Figure 2, for each object $O_{1,2}^i$ that disappears from node 2 of camera 1, we search for objects $O_{2,4}^j$ appearing at node 4 of camera 2 within a discrete time buffer $(0, T_1]$. In Figure 2, the numbers under each object indicate the period of its appearance, measured in seconds. The matches of object $O_{1,2}^i$, sorted by similarity in descending order, form the i-th column of the trellis. We denote the match-set of object $O_{1,2}^i$ by $N_{1,2}^i$, defined as:

$$N_{1,2}^i = \{\, O_{2,4}^j : \mathrm{Sim}(O_{1,2}^i, O_{2,4}^j) \ge \tau \,\} \tag{4}$$

$$\text{s.t.}\quad 0 < \mathrm{entry}(O_{2,4}^j) - \mathrm{exit}(O_{1,2}^i) \le T_1 \tag{5}$$

where entry(.) and exit(.) denote the times at which objects appear in and disappear from the field of view of each camera. For objects $v_k \in N_{1,2}^i$ and $v_l \in N_{1,2}^{i+m}$, we say there is a path between $v_k$ and $v_l$ if they satisfy:

$$\Big| \big(\mathrm{exit}(v_l) - \mathrm{exit}(v_k)\big) - \big(\mathrm{exit}(O_{1,2}^{i+m}) - \mathrm{exit}(O_{1,2}^i)\big) \Big| \le T_2 \tag{6}$$

where $m \ge 1$. The process of defining a path in the trellis is illustrated in Figure 4, where m = 1. At each step, we check whether each existing path can grow into the next column (m = 1). Since objects disappearing from the first camera might not be observed by the second camera later, if a path cannot grow in the current step, we check whether it can grow in a later step (m > 1). After all paths in the trellis have been discovered, we select the longest path, denoted by the red line in Figure 2. The transition probability between exit node 2 and entry node 4 is then defined as:

$$P = \frac{\text{number of people in the longest path}}{\text{number of people exiting from node 2}} \tag{7}$$
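The trellis construction and path search can be sketched as below, assuming each object carries an id and its entry/exit times, departures are sorted by exit time, and `sim` is any appearance-similarity function supplied by the caller. We read "path" as any chain of trellis nodes linked by Eq. (6), with skipped columns allowed (m ≥ 1), and find the longest chain by dynamic programming; the incremental growth procedure described above may differ in details. Requiring consecutive nodes on a path to be different arriving objects is our own assumption, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Obs:
    """A tracked object at one node: entry/exit times (seconds) in that camera's view."""
    oid: str
    entry: float
    exit: float


def build_trellis(departures: List[Obs], arrivals: List[Obs],
                  sim: Callable[[Obs, Obs], float], tau: float, t1: float) -> List[List[Obs]]:
    """Eqs. (4)-(5): column i lists the arrivals matching departure i, best match first.
    `departures` must be sorted by exit time."""
    columns = []
    for o in departures:
        cands = [v for v in arrivals if sim(o, v) >= tau and 0 < v.entry - o.exit <= t1]
        cands.sort(key=lambda v: sim(o, v), reverse=True)
        columns.append(cands)
    return columns


def longest_path(departures: List[Obs], columns: List[List[Obs]], t2: float) -> int:
    """Number of columns on the longest chain of trellis nodes linked by Eq. (6).
    Edges may skip columns (m >= 1); solved by dynamic programming over columns."""
    best: Dict[Tuple[int, str], int] = {}  # (column, arrival id) -> longest chain ending there
    overall = 0
    for i, col in enumerate(columns):
        for vl in col:
            length = 1
            for j in range(i):  # earlier column j = i - m
                dep_gap = departures[i].exit - departures[j].exit
                for vk in columns[j]:
                    if vk.oid != vl.oid and abs((vl.exit - vk.exit) - dep_gap) <= t2:
                        length = max(length, best[(j, vk.oid)] + 1)
            best[(i, vl.oid)] = length
            overall = max(overall, length)
    return overall


def transition_probability(departures: List[Obs], arrivals: List[Obs],
                           sim: Callable[[Obs, Obs], float],
                           tau: float, t1: float, t2: float) -> float:
    """Eq. (7): people on the longest path / people exiting the exit node."""
    if not departures:
        return 0.0
    columns = build_trellis(departures, arrivals, sim, tau, t1)
    return longest_path(departures, columns, t2) / len(departures)


# Toy data: five departures 10 s apart; the first four reappear 50 s later, the last does not.
deps = [Obs(f"d{k}", entry=10 * k - 5, exit=10 * k) for k in range(5)]
arrs = [Obs(f"a{k}", entry=10 * k + 50, exit=10 * k + 70) for k in range(4)]
p = transition_probability(deps, arrs, sim=lambda o, v: 1.0, tau=0.5, t1=150, t2=5)
```

In the toy data the fifth departure never reappears, so the longest path covers four of the five departures and Eq. (7) gives 0.8. The constant `sim` ignores appearance, mirroring the simulated experiments in Section 4.1.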

Similarly, we compute the transition probability between each pair of exit zone and entry zone. If the transition probability is larger than a threshold, there is a link from the exit node to the entry node. Note that transition probabilities are directional: for example, the probability that objects disappearing from node 2 appear at node 4 may differ from the probability that objects disappearing from node 4 appear at node 2. In addition, the transition probability between two nodes varies over the course of a day as motion patterns change.
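Because transition probabilities are directional, link inference has to consider ordered (exit, entry) pairs rather than unordered node pairs. A minimal sketch, assuming a caller-supplied `prob(a, b)` that returns the trellis-based probability for exit node a and entry node b (0 for pairs that are not valid exit/entry combinations); the function name is hypothetical and the probabilities in the example are made up for illustration, not results from this paper.

```python
from itertools import permutations
from typing import Callable, Dict, Iterable, List, Tuple

Link = Tuple[int, int]  # (exit node, entry node): direction matters


def infer_links(nodes: Iterable[int], prob: Callable[[int, int], float],
                threshold: float) -> Tuple[Dict[Link, float], List[Link]]:
    """Evaluate the transition probability for every ordered pair of distinct nodes
    and keep a directed link wherever it exceeds the threshold."""
    probs = {(a, b): prob(a, b) for a, b in permutations(nodes, 2)}
    links = sorted(pair for pair, p in probs.items() if p > threshold)
    return probs, links


# Hypothetical probabilities, for illustration only: 2 -> 4 is strong, 4 -> 2 is weak.
example = {(2, 4): 0.6, (4, 2): 0.2}
probs, links = infer_links([2, 4, 7], lambda a, b: example.get((a, b), 0.0), threshold=0.5)
```

Only the 2 -> 4 direction survives here; the reverse direction is evaluated separately and falls below the threshold.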

4 Experimental Results and Analysis

We evaluate the effectiveness of the proposed method on a simulated dataset and a real-life dataset.

Figure 4: Defining a path in the trellis.

4.1 Simulated Experiments

The simulation is based on the network structure shown in Figure 5, where the number in each node indicates the node's ID in the network. The departure times of one hundred moving objects in the simulated experiments are generated by a Poisson(0.1) process [5]. The transition time between nodes follows a mixture of Gamma(16.67, 0.33) and Gamma(266.67, 1.33), corresponding to the motion patterns of cars and pedestrians in the scene [5]. We also assume that the transition time within each camera's field of view follows Gamma(15, 0.35) for cars and Gamma(270, 1.35) for pedestrians. For simplicity, we do not include appearance information when building the trellis in the simulated experiments; that is, all objects within the time buffer $(0, T_1]$ are used to build the trellis. $T_1$ and $T_2$ are determined by the characteristics of the multi-modal transition; here $T_1$ is set to 280 seconds and $T_2$ to 700 seconds. With these settings, the topology of the simulated network is fully recovered.
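The simulated event stream can be reproduced roughly as follows. The parameterization is an assumption on our part: Poisson(0.1) is read as 0.1 departures per second, and Gamma(a, b) as shape a and rate b, which gives mean transition times of about 50 s for cars and 200 s for pedestrians, both inside the 280-second buffer $T_1$; if [5] uses shape and scale instead, b should be passed as the scale directly. The car/pedestrian mixing weight `p_car` and the function name are hypothetical, and within-camera transit times are left out for brevity.

```python
import random
from typing import List, Tuple


def simulate_events(n_objects: int = 100, rate: float = 0.1, p_car: float = 0.5,
                    seed: int = 0) -> List[Tuple[float, float, str]]:
    """Return (departure time, arrival time, mode) for each simulated object.

    Departures follow a Poisson process with `rate` events per second, so the gaps
    between departures are exponential. Transition times come from a two-component
    gamma mixture; Gamma(a, b) is read here as shape a and rate b (an assumption),
    and random.gammavariate takes a scale, hence the 1 / b.
    """
    rng = random.Random(seed)
    t = 0.0
    events = []
    for _ in range(n_objects):
        t += rng.expovariate(rate)
        if rng.random() < p_car:
            events.append((t, t + rng.gammavariate(16.67, 1 / 0.33), "car"))
        else:
            events.append((t, t + rng.gammavariate(266.67, 1 / 1.33), "pedestrian"))
    return events


events = simulate_events()
car_delays = [a - d for d, a, m in events if m == "car"]
ped_delays = [a - d for d, a, m in events if m == "pedestrian"]
```

The two delay lists form the bimodal transition-time distribution that a single-peak cross-correlation would struggle to summarize.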

Figure 5: The inferred topology graph for the simulated dataset.

4.2 Real-life Experiments

In the real-life experiments, the setup consists of two outdoor cameras and one indoor camera with non-overlapping fields of view, as shown in Figure 1. We use one-hour-long videos in which both pedestrians and bicyclists are observed to recover the topology between cameras. For single-camera motion detection and tracking, a Gaussian Mixture Model and a Kalman filter are applied, respectively. The appearance similarity threshold $\tau$ is set according to the inter-class and intra-class similarity. More details of the dominant color representation and its evaluation can be found in our previous work [1]. The time window $T_1$ in Eq. (5) is set to 150 seconds, and the time interval $T_2$ of successive departure events in Eq. (6) is set to 40 seconds.

The proposed method recovers the topology of the camera network without any false links, as shown in Figure 6. The transition probabilities between the nodes of Figure 6 are given in Figure 7, where the entry at (node i, node j) denotes the probability that an object leaving node i later appears at node j. Intra-camera connections and valid inter-camera connections are highlighted in Figure 7. The intra-camera connection probabilities are defined as the percentage of people who disappear from one node and appear at another node within the field of view of the same camera. The threshold for a valid inter-camera transition probability is set to 0.5.

Note that Figure 6 contains links 2-4, 4-5, 5-7 and 2-7, so it is not immediately clear whether there is a direct link from node 2 to node 7 or whether this route passes through nodes 4 and 5 in camera 2. To resolve this, we compute the transition probabilities 2-4, 5-7 and 2-7. The transition probability 2-4 is larger than the transition probability 2-7, and the links with higher transition probability are used to derive the topology of the camera network. Hence the route from node 2 to node 7 actually passes through nodes 4 and 5 in camera 2, which is consistent with the ground truth.

Figure 6: The inferred topology graph for the real-life dataset (nodes node1 to node9; legend: intra-camera connection, inter-camera connection).

Figure 7: Transition probabilities between nodes (rows and columns node1 to node9).

Figure 8: The estimated cross-correlations (count versus time) for node2-node4 and node4-node2. (a, b) Previous approach in [2]. (c, d) Previous approach in [3].

We compare the proposed method with Makris et al. [2] and the appearance-integrated work [3]. If there is a link between an exit node and an entry node in the topology graph, the cross-correlation function of departure and arrival events should have a clear peak indicating the most probable transition time between the two nodes [2, 3]. Take the cross-correlation function between node 2 and node 4 as an example, since there is an actual link between these two nodes in the topology graph. Figure 8 shows the cross-correlations on our real-life dataset: Figure 8(a) and 8(b) are obtained with [2], and Figure 8(c) and 8(d) with the appearance-integrated work [3]. The time window $T_1$, which is also a parameter in [2, 3], is again set to 150 seconds, and in the appearance-integrated work our dominant color based method is used for object matching. In Makris et al. [2], all pairs of arrival and departure events contribute to the distribution of transition time; furthermore, there are many visually very similar objects in the scene. As a result, neither method yields a salient peak. In our proposed method, the temporal constraints encoded in the trellis help disambiguate the appearance matching process, which improves the accuracy of topology estimation.
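For comparison, the baseline of [2] can be sketched as a histogram of arrival-minus-departure delays over all event pairs within $(0, T_1]$; the method of [3] would additionally weight each pair by its appearance similarity. This is a simplified sketch, not the exact estimator of [2]; the bin width and function name are hypothetical.

```python
import math
from typing import List


def transition_histogram(departures: List[float], arrivals: List[float],
                         t1: float, bin_width: float) -> List[int]:
    """Histogram of arrival-minus-departure delays over *all* event pairs with
    0 < delay <= t1, in the spirit of the baseline of [2]."""
    n_bins = math.ceil(t1 / bin_width)
    counts = [0] * n_bins
    for d in departures:
        for a in arrivals:
            delay = a - d
            if 0 < delay <= t1:
                # clamp so that delay == t1 falls into the last bin
                counts[min(int(delay // bin_width), n_bins - 1)] += 1
    return counts


counts = transition_histogram([0, 100, 200], [30, 130, 230], t1=150, bin_width=10)
peak_bin = max(range(len(counts)), key=counts.__getitem__)
```

Here the true 30-second delay still wins (3 pairs against 2 spurious pairings at 130 s), but with many similar-looking objects and more than one mode of transport, spurious pairs can spread enough mass across the bins to hide the peak.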

5 Conclusions

In this paper, we have presented an approach for automatically recovering the topology of multiple cameras with non-overlapping fields of view. Our method explicitly considers the correspondence between observations across cameras by finding continuous paths in a trellis. From the trellis, we can estimate both the spatial and the temporal topology of a camera network. Future work will concentrate on extending the proposed method to the more general setting of camera networks with both overlapping and non-overlapping fields of view.

Acknowledgement

This work was supported by the National Natural Science Foundation of China (Grant No. 60736018, 60723005), the National Hi-Tech Research and Development Program of China (2009AA01Z318), and the National Science Founding (60605014, 60875021).

References

[1] Y. Cai, K. Huang, and T. Tan. Matching tracking sequences across widely separated cameras. In ICIP, pages 765–768, 2008.
[2] D. Makris, T. Ellis, and J. Black. Bridging the gaps between cameras. In CVPR, volume 2, pages 205–210, 2004.
[3] C. Niu and E. Grimson. Recovering non-overlapping network topology using far-field vehicle tracking. In ICPR, pages 944–949, 2006.
[4] M. Piccardi and E. D. Cheng. Multi-frame moving object track matching based on an incremental major color spectrum histogram matching algorithm. In CVPR, pages 19–27, 2005.
[5] K. Tieu, G. Dalley, and W. L. Grimson. Inference of non-overlapping camera network topology by measuring statistical dependence. In ICCV, pages 1842–1849, 2005.
[6] J. Yuan, W. Wang, J. Meng, Y. Wu, and D. Li. Mining repetitive clips through finding continuous paths. In ACM MM, pages 289–292, 2007.
