Recovering the Topology of Multiple Cameras by Finding Continuous Paths in a Trellis
Yinghao Cai1,2, Kaiqi Huang1, Tieniu Tan1 and Matti Pietikäinen2
1 National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
2 Machine Vision Group, Department of Electrical and Information Engineering, University of Oulu, Finland
{kqhuang, tnt}@nlpr.ia.ac.cn, {yinghao.cai, mkp}@ee.oulu.fi
Abstract
In this paper, we propose an unsupervised method for recovering the topology of multiple cameras with non-overlapping fields of view. The nodes in the topology graph are defined as entry/exit zones in each camera while the connectivity between nodes is inferred through finding continuous paths in a trellis where appearance information and temporal information of moving objects are encoded. Unlike previous methods which assume a single mode transition distribution between nodes, our method is capable of dealing with multi-modal transition situations when both cars and pedestrians are in the scene. Results on simulated and real-life datasets demonstrate the effectiveness of the proposed method.
1 Introduction
With the ever-increasing number of cameras involved in surveillance applications, it is becoming almost impossible for human operators to monitor and analyze dozens of video sequences effectively and efficiently. Therefore, automatic methods are required to analyze video inputs collected by a network of cameras. The prerequisite for global activity monitoring in a network of cameras is to establish correspondences between observations across cameras. Significant efforts in multi-camera surveillance have been devoted to consistently labeling objects across cameras with overlapping or non-overlapping fields of view. To achieve automatic labeling, learning the topology of multiple cameras is necessary. The topology graph of multiple cameras is an abstract network of nodes and connections [2]. The nodes in the topology graph are defined either as entry/exit zones in each camera or as single cameras. The connections in the topology graph indicate the connectivity between nodes in a network of cameras. The topology graph of multiple cameras helps predict the reappearance of moving objects. If an object disappears from a node in one camera’s field of view, we only need to search for its reappearance in nodes that are linked to the node of disappearance. In addition, transition time characteristics of the connection between nodes can be inferred through observing objects moving across cameras. Therefore, by defining a topology graph, the computational complexity of tracking across multiple cameras is significantly reduced.
Figure 1: (a-c) Entry zones and exit zones for Camera 1, 2 and 3, respectively. Entry zones and exit zones are nodes in the topology graph.
Many methods have been put forward to recover the topology between cameras. In Makris et al. [2], all pairs of arrival and departure events contribute to the distribution of transition time. A peak in the distribution of transition time indicates the most probable transition time and the connectivity between cameras. Niu et al. [3] further weighted the temporally correlating information by appearance information. Only those observations which look similar in appearance are used to derive the transition distribution. The methods mentioned above [2, 3] implicitly assume a single-mode transition distribution, which cannot be adapted to situations in which both cars and pedestrians are part of the observations.
In this paper, we propose an unsupervised method for recovering the topology of multiple cameras with non-overlapping fields of view. The nodes in the topology graph are defined as entry/exit zones in each camera as in Figure 1. Some nodes work as both entry zones and exit zones. The entry zones and exit zones are obtained by grouping the start points and end points of trajectories through k-means clustering [2]. The connectivity between nodes in the topology graph is inferred through finding continuous paths in a trellis where appearance matching information and temporal information of moving objects are encoded. Our method is based on the assumption that, if two cameras are connected, some of the moving objects will appear in the fields of view of both cameras under certain temporal constraints. Based on this assumption, we build a trellis between each pair of exit zone and entry zone. The inspiration for building a trellis comes from [6], where the trellis is used to mine repetitive clips from video databases. Unlike previous methods which assume a single-mode transition distribution between nodes, our method is capable of dealing with multi-modal transition situations when both cars and pedestrians are in the scene. Experimental results on a simulated dataset and a real-life dataset demonstrate the effectiveness of the proposed method.
Figure 2: Trellis between exit node 2 and entry node 4. The number under each object indicates the time of appearance and disappearance of objects under the field of view of each camera measured in seconds. (Figure labels: Camera 1, exit node 2, sorted by time; Camera 2, entry node 4, sorted by color similarity.)
2 Object Representation and Matching
As mentioned above, we build a trellis between each pair of exit zone and entry zone. A typical trellis between exit node 2 and entry node 4 is shown in Figure 2. To build such a trellis, for each object that disappears from node 2, we search for its best matches among the objects appearing at node 4 within a discrete time buffer. The similarity between two objects is measured based on their visual appearance. In this section, we present our object representation and matching method based on dominant colors. Note that we use a simple yet effective method to match objects across cameras when building the trellis; any other object representation and matching method can be used alongside it to improve the accuracy of object matching.
Color spatial information is important in discriminating one object from another since objects may have similar color components with different layouts. To this end, we partition the blob of a moving object into regular patches for localization of color components, as shown in Figure 3. Then, the i-th patch of model image P is represented by its first k dominant colors along with the frequencies with which these colors appear on the target: $R_p^i = \{(C_1, W_1), \ldots, (C_k, W_k)\}$.
Figure 3: Partition the blob of moving object into regular patches.
Each patch of one object is matched against its corresponding patch of another object as illustrated in Figure 3. The similarity measure between two patches $R_p^i$ and $R_q^i$ is defined as:
$$\mathrm{Sim}(R_p^i, R_q^i) = \min\left(P(R_p^i \mid R_q^i),\; P(R_q^i \mid R_p^i)\right) \tag{1}$$
where $P(R_q^i \mid R_p^i)$ is the probability of observing the dominant color representation of $R_q^i$ in $R_p^i$, which is defined as:
$$P(R_q^i \mid R_p^i) = \frac{\sum_{n=1}^{M_p^i} \min\left\{W_{p,n}^i,\; \sum_{m=1}^{M_q^i} \delta(C_{p,n}^i, C_{q,m}^i)\, W_{q,m}^i\right\}}{|N_p^i|} \tag{2}$$
where $|N_p^i|$ is the number of foreground pixels in the i-th patch of model image P. $M_p^i$ and $M_q^i$ are the numbers of dominant colors in each patch. $W_{p,n}^i$ is the frequency of the n-th color appearing in the i-th patch of model image P. $\delta(C_{p,n}^i, C_{q,m}^i)$ equals 1 if the two dominant colors are close enough according to the color distance defined in [4], and 0 otherwise. $P(R_p^i \mid R_q^i)$ can be defined similarly. The similarity between model image P and query image Q is then:
$$\mathrm{Sim}(P, Q) = \frac{\sum_{i=1}^{\min(N_p, N_q)} \mathrm{Sim}(R_p^i, R_q^i)}{\min(N_p, N_q)} \tag{3}$$
where $N_p$ and $N_q$ are the numbers of patches in images P and Q, respectively. More details of the dominant color representation can be found in our previous work [1].
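As an illustration of Eqs. (1)-(3), the following minimal Python sketch computes the patch and object similarities. It assumes that each patch is given as a list of (color, weight) pairs with weights expressed as foreground pixel counts, and it substitutes a thresholded Euclidean RGB distance for the color distance of [4]. The function names, the `max_dist` threshold and this stand-in distance are illustrative assumptions, not part of the original method.

```python
from math import dist

# A patch is a list of (color, weight) pairs: color is an (R, G, B) tuple and
# weight is the number of foreground pixels carrying that dominant color.

def colors_close(c1, c2, max_dist=30.0):
    """Assumed stand-in for the color distance of [4]: thresholded Euclidean distance."""
    return dist(c1, c2) <= max_dist

def directed_prob(outer, inner, n_outer_pixels):
    """Eq. (2): the outer sum runs over the colors of `outer`, and the result is
    normalized by the foreground pixel count of that same patch."""
    total = 0.0
    for color_o, weight_o in outer:
        # Inner sum of Eq. (2): total weight of the colors in `inner` that match.
        matched = sum(w for c, w in inner if colors_close(color_o, c))
        total += min(weight_o, matched)
    return total / n_outer_pixels

def patch_similarity(patch_p, patch_q, n_p, n_q):
    """Eq. (1): the smaller of the two directed probabilities."""
    return min(directed_prob(patch_p, patch_q, n_p),
               directed_prob(patch_q, patch_p, n_q))

def object_similarity(patches_p, patches_q, pixels_p, pixels_q):
    """Eq. (3): average patch similarity over the first min(N_p, N_q) patches."""
    n = min(len(patches_p), len(patches_q))
    return sum(
        patch_similarity(patches_p[i], patches_q[i], pixels_p[i], pixels_q[i])
        for i in range(n)
    ) / n
```

Taking the minimum in `patch_similarity` keeps the measure symmetric, so a patch that merely contains the colors of another patch as a subset does not score as a full match.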
3 Finding Continuous Paths in a Trellis
To build the trellis in Figure 2, for each object $O_{1,2}^i$ which disappears from node 2 of camera 1, we search for objects $O_{2,4}^j$ appearing at node 4 of camera 2 in a discrete time buffer $(0, T_1]$. In Figure 2, the number under each object indicates the period of its appearance measured in seconds. The matches of object $O_{1,2}^i$ are sorted by similarity in descending order, which corresponds to the i-th column in the trellis. We denote the match-set of object $O_{1,2}^i$ by $N_{1,2}^i$, which is defined as:
$$N_{1,2}^i = \{O_{2,4}^j : \mathrm{Sim}(O_{1,2}^i, O_{2,4}^j) \geq \tau\} \tag{4}$$
$$\text{s.t.} \quad 0 < \mathrm{entry}(O_{2,4}^j) - \mathrm{exit}(O_{1,2}^i) \leq T_1 \tag{5}$$
where entry(.) and exit(.) denote the time of appearance and disappearance of objects under the field of view of each camera. For object $v_k \in N_{1,2}^i$ and object $v_l \in N_{1,2}^{i+m}$, we say that there is a path between $v_k$ and $v_l$ if they satisfy:
$$\left|\left(\mathrm{exit}(v_l) - \mathrm{exit}(v_k)\right) - \left(\mathrm{exit}(O_{1,2}^{i+m}) - \mathrm{exit}(O_{1,2}^i)\right)\right| \leq T_2 \tag{6}$$
where $m \geq 1$. The process of defining a path in the trellis is illustrated in Figure 4, where the value of m equals one. At each step, we check the existing paths to see if they can grow in the next step (m = 1). Since objects disappearing from the first camera might not be observed in the second camera later, if a path cannot grow in the current step, we move on to see if it can grow afterwards (m > 1). After discovering all the paths in the trellis, we can find the longest path, denoted by the red line in Figure 2. Then the transition probability between exit node 2 and entry node 4 is defined as:
$$P = \frac{\text{number of people in the longest path}}{\text{number of people exiting from node 2}} \tag{7}$$
Similarly, we compute the transition probability between each pair of exit zone and entry zone. If the transition probability is larger than a threshold, then there is a link between the exit node and the entry node. Note that the transition probability between two nodes is directional. For example, the transition probability of objects disappearing from node 2 and appearing at node 4 may differ from the transition probability of objects disappearing from node 4 and appearing at node 2. In addition, the transition probability from one node to another varies over the course of a day as motion patterns change.
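The procedure of Eqs. (4)-(7) can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation used in our experiments: the `Obs` record, the function names and the `sim` callback are hypothetical, and the longest path is found by dynamic programming over all column pairs (any m ≥ 1) instead of the step-by-step growing procedure described above. No one-to-one constraint between departing and arriving objects is enforced.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Obs:
    """One tracked object at a node: times of appearance and disappearance (seconds)."""
    entry: float
    exit: float

def build_trellis(departures, arrivals, sim, tau, t1):
    """Eqs. (4)-(5): one column of candidate matches per departing object.
    `departures` must be sorted by exit time; `sim(dep, arr)` returns a similarity."""
    trellis = []
    for dep in departures:
        column = [arr for arr in arrivals
                  if 0 < arr.entry - dep.exit <= t1 and sim(dep, arr) >= tau]
        # Within a column, candidates are sorted by similarity in descending order.
        column.sort(key=lambda arr: sim(dep, arr), reverse=True)
        trellis.append(column)
    return trellis

def longest_path(departures, trellis, t2):
    """Number of nodes on the longest path whose consecutive nodes satisfy Eq. (6)."""
    # best[i][k]: length of the longest path ending at cell k of column i.
    best = [[1] * len(col) for col in trellis]
    for j, col_j in enumerate(trellis):
        for l, v_l in enumerate(col_j):
            for i in range(j):  # earlier columns, i.e. m = j - i >= 1
                gap = departures[j].exit - departures[i].exit
                for k, v_k in enumerate(trellis[i]):
                    if abs((v_l.exit - v_k.exit) - gap) <= t2:
                        best[j][l] = max(best[j][l], best[i][k] + 1)
    return max((b for col in best for b in col), default=0)

def transition_probability(departures, trellis, t2):
    """Eq. (7): longest path length over the number of objects leaving the exit node."""
    if not departures:
        return 0.0
    return longest_path(departures, trellis, t2) / len(departures)
```

A link between the exit node and the entry node would then be declared when the returned probability exceeds the chosen threshold. Because the trellis is built from a specific exit node to a specific entry node, swapping the two roles gives the probability in the opposite direction.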
4 Experimental Results and Analysis
We evaluate the effectiveness of the proposed method on a simulated dataset and a real-life dataset.
Figure 4: Define the path in trellis.
4.1 Simulated Experiments
The simulation is based on the network structure shown in Figure 5. The number in each node indicates the ID of the node in the network. The departure times of one hundred moving objects in the simulated experiments are generated by a Poisson (0.1) process [5]. The transition distribution between nodes follows a mixture of Gamma (16.67, 0.33) and Gamma (266.67, 1.33), corresponding to the motion patterns of cars and pedestrians in the scene [5]. We also assume that the transition time within each camera’s field of view follows Gamma (15, 0.35) and Gamma (270, 1.35) for cars and pedestrians, respectively. For simplicity, we do not include appearance information when building the trellis in the simulated experiments. That is to say, all objects within a time buffer $(0, T_1]$ are used to build the trellis. $T_1$ and $T_2$ are determined by the characteristics of the multi-modal transition; $T_1$ is set to 280 seconds and $T_2$ is set to 700 seconds in this case. With these settings, the topology of the simulated network is fully recovered.
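The inter-node part of this simulation could be generated along the following lines. The sketch rests on several assumptions that the text above does not fix: Poisson (0.1) is read as a rate of 0.1 departures per second, the second Gamma parameter is read as a rate (giving mean transition times of roughly 50 s and 200 s, which is compatible with $T_1$ = 280 s), and cars and pedestrians are mixed in equal proportion. The function name and the `car_fraction` parameter are hypothetical.

```python
import random

def simulate_transitions(n_objects=100, rate=0.1, car_fraction=0.5, seed=0):
    """Return a list of (departure_time, arrival_time) pairs for one link."""
    rng = random.Random(seed)
    t = 0.0
    events = []
    for _ in range(n_objects):
        # A Poisson process has exponentially distributed gaps between departures.
        t += rng.expovariate(rate)
        if rng.random() < car_fraction:
            shape, gamma_rate = 16.67, 0.33    # cars (assumed shape/rate reading)
        else:
            shape, gamma_rate = 266.67, 1.33   # pedestrians (assumed shape/rate reading)
        # random.gammavariate expects (shape, scale), so the rate is inverted.
        transit = rng.gammavariate(shape, 1.0 / gamma_rate)
        events.append((t, t + transit))
    return events
```

Departures and arrivals produced this way can be fed to the trellis construction with the appearance test disabled, mirroring the simplification used in the simulated experiments.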
Figure 5: The inferred topology graph in simulated dataset.
4.2 Real-life Experiments
In real-life experiments, the experimental setup consists of two outdoor cameras and one indoor camera with non-overlapping fields of view as shown in Figure 1. We use one-hour-long videos in which both pedestrians and bicyclists are observed to recover the topology between cameras. For single-camera motion detection and tracking, a Gaussian Mixture Model and a Kalman filter are applied, respectively. The appearance similarity threshold τ is set according to the inter-class and intra-class similarity. More details of the dominant color representation and its evaluation can be found in our previous work [1]. The time window $T_1$ in Eq. (5) is set to 150 seconds. The time interval $T_2$ of successive departure events in Eq. (6) is set to 40 seconds. The proposed method successfully recovers the topology of the camera network without any false links, as shown in Figure 6. Transition probabilities between the nodes in Figure 6 are shown in Figure 7. Each entry at (node i, node j) denotes the transition probability of objects leaving node i and later appearing at node j. We also highlight intra-camera connections and valid inter-camera connections in Figure 7. The intra-camera connection probabilities are defined as the percentage of people disappearing from one node and appearing at another node in the field of view of a single camera. The threshold for a valid inter-camera transition probability is set to 0.5. Note that in Figure 6, there are links 2-4, 4-5, 5-7 and 2-7. We do not know in advance whether there is a direct link from node 2 to node 7 or whether the transition passes through node 4 and node 5 in camera 2. In order to recover the topology, we have computed the transition probabilities for node 2-4, node 5-7 and node 2-7. The transition probability for node 2-4 is larger than the transition probability for node 2-7. The links with higher transition probability are used to derive the topology of the camera network. We can see that the link 2-7 actually passes through node 4 and node 5 in camera 2, which is consistent with the ground truth.
Figure 6: The inferred topology graph in real-life dataset. (Nodes 1 to 9; the legend distinguishes intra-camera connections from inter-camera connections.)
Figure 7: Transition probabilities between nodes. (Matrix with rows and columns node1 to node9.)
We compare the proposed method with Makris et al. [2] and the appearance integrated work [3]. If there is a link between the exit node and the entry node in the topology graph, the cross-correlation function of departure and arrival events will have a clear peak which indicates the most probable transition time between the two nodes [2, 3]. Take the cross-correlation function between node 2 and node 4 as an example; there is an actual link between these two nodes in the topology graph. Figure 8 shows cross correlations on our real-life dataset. Figure 8(a) and Figure 8(b) show cross correlations by [2], while Figure 8(c) and Figure 8(d) show cross correlations by the appearance integrated work [3]. $T_1$ is also a parameter in [2, 3], and the size of the time window $T_1$ is again set to 150 seconds. In the appearance integrated work, our proposed dominant color based method is used for object matching. In Makris et al. [2], all pairs of arrival and departure events contribute to the distribution of transition time. Furthermore, there are many visually very similar objects in the scene. As a result, no salient peak can be detected by either method. In our proposed method, temporal constraints encoded in the trellis help disambiguate the appearance matching process, which improves the accuracy of topology estimation.
Figure 8: The estimated cross-correlations (count versus time) for node2-node4 and node4-node2. (a,b) Previous approach in [2]. (c,d) Previous approach in [3].
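To make the comparison concrete, the idea behind the baseline of [2], in which every departure/arrival pair within the time window votes for a transition time, can be sketched as a simple histogram. This is a simplified illustration and not the implementation of [2]; the function name, the bin width and the plain pair counting are assumptions.

```python
from collections import Counter
from math import ceil

def transition_histogram(departures, arrivals, t1=150.0, bin_width=10.0):
    """Count all (departure, arrival) pairs whose delay lies in (0, t1].
    Bin b covers delays in (b * bin_width, (b + 1) * bin_width]."""
    counts = Counter()
    for d in departures:
        for a in arrivals:
            delay = a - d
            if 0 < delay <= t1:
                counts[ceil(delay / bin_width) - 1] += 1
    return counts
```

Because every pair inside the window contributes, unrelated objects add counts to bins away from the true transition time; with dense traffic or more than one transition mode these spurious votes can flatten the peak that the baseline relies on.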
5 Conclusions
In this paper, we have presented an approach for automatically recovering the topology of multiple cameras with non-overlapping fields of view. Our method explicitly considers the correspondence between observations across cameras through finding continuous paths in a trellis. From the trellis, we can estimate both the spatial and temporal topology of a network of cameras. Future work will concentrate on extending the proposed method to a more general setting of camera network with overlapping and non-overlapping fields of view.
Acknowledgement
This work was supported by the National Natural Science Foundation of China (Grant No. 60736018, 60723005), National Hi-Tech Research and Development Program of China (2009AA01Z318), National Science Founding (60605014, 60875021).
References
[1] Y. Cai, K. Huang, and T. Tan. Matching tracking sequences across widely separated cameras. In ICIP, pages 765–768, 2008.
[2] D. Makris, T. Ellis, and J. Black. Bridging the gaps between cameras. In CVPR, volume 2, pages 205–210, 2004.
[3] C. Niu and E. Grimson. Recovering non-overlapping network topology using far-field vehicle tracking. In ICPR, pages 944–949, 2006.
[4] M. Piccardi and E. D. Cheng. Multi-frame moving object track matching based on an incremental major color spectrum histogram matching algorithm. In CVPR, pages 19–27, 2005.
[5] K. Tieu, G. Dalley, and W. L. Grimson. Inference of nonoverlapping camera network topology by measuring statistical dependence. In ICCV, pages 1842–1849, 2005.
[6] J. Yuan, W. Wang, J. Meng, Y. Wu, and D. Li. Mining repetitive clips through finding continuous paths. In ACMMM, pages 289–292, 2007.