Dept. of Electrical and Computer Engineering Dept. of Computer Science and Engineering Michigan State University East Lansing, MI 48824, USA


Abstract— Network surveillance systems provide real-time monitoring of a target area and target objects. Advances in wireless communication and sensing technology make it possible to deploy networked surveillance systems in a variety of environments. Networked surveillance systems provide extended perception and distributed sensing capability in monitored environments through the use of multiple networked sensors. We consider a surveillance network in which the sensors are static. Tracking targets in such a network is challenging for the following reasons: (1) the locations of the sensors need to be optimally chosen; (2) the views of the sensors need to be optimized so that, at any given time, the targets are imaged with a discernible resolution for feature identification; and (3) stable control algorithms must be devised to accomplish the surveillance task. When the target moves, the sensing task must be switched between sensors to maintain visibility of the target with adequate resolution. This paper presents a novel method to deploy static sensors given a target region and a dynamic programming method to optimally switch sensors as the target moves. Finally, simulation results demonstrate the efficacy of the proposed approach for tracking targets over an area.

I. INTRODUCTION

Networked surveillance systems have received much attention from the research community due to their many pervasive applications [1]. Technological advances in wireless networking and distributed robotics have led to increasing research on distributed sensing applications using wireless sensor networks. Infrastructure-less surveillance and monitoring are important applications of such rapidly deployable sensor networks. Sensors with varying sensing modalities, such as cameras, infrared detector arrays, laser rangefinders, and omnidirectional acoustic sensors, can be instantly deployed in hostile environments, inaccessible terrain, and disaster relief operations to obtain vital reconnaissance information about the area being surveyed. In a surveillance network, the surveillance nodes are equipped with sensors and communication capabilities. Owing to its infrastructure-less architecture, a surveillance network can be deployed in hostile or ad hoc environments where conventional networks are impossible to set up. Thus, wireless surveillance networks have many advantages in carrying out area surveillance or object tracking tasks. Video feedback is an essential component of the surveillance system. A single human operator cannot effectively monitor a large area by looking at dozens of monitors

showing raw video output. That amount of sensory overload virtually guarantees that information will be ignored, and it requires a prohibitive amount of transmission bandwidth. In [2] an approach is presented that provides an interactive graphical user interface (GUI) showing a synthetic view of the environment, upon which the system displays dynamic agents representing people and vehicles. This approach has the benefit that visualization of scene events is no longer tied to the original resolution and viewpoint of a single video sensor, and the operator can therefore infer proper spatial relationships between multiple objects and scene features. The Modular Semi-Automated Forces (ModSAF) program provides a 2-D graphical interface similar to the VSAM GUI [3], with the ability to insert computer-generated human and vehicle avatars that provide simulated opponents for training [4]. The ModStealth program generates an immersive, realistic 3-D visualization of texture-mapped scene geometry and computer-generated avatars [5]. Although automatic image analysis and video understanding tools [2] can be used to facilitate identification of targets and activation of alarms or logs for certain surveillance tasks, the operator needs the video feedback to make decisions about tracking tasks that may not have been pre-programmed, or to independently task the network based on the current feedback received from the sensors. The video feedback provided to the operator can be transmitted over either analog or digital channels [6]. The use of analog modulation and transmission techniques for surveillance applications has been reported in [7]. Receiving digital video feedback over an IP-based network from networked camera sensors has also received much attention with the development of various video compression and communication protocols [8], [9], [10], [11], [12].
Since multiple cameras are deployed to track the identified targets, multiple concurrent feedback video streams may be required for monitoring the target. The sensors initiating these streams will change from time to time as the target moves out of range of the current sensors tracking it. However, providing multiple unnecessary (unrelated to the task) video feedback streams often causes loss of the operator's attention and makes it hard to keep track of the various activities over the cameras. Hence, only video streams from relevant sensors should be presented to the operator

on a per-activity basis. This is done through automatic or manual switching of the camera streams that are presented to the operator. In this research we focus on the problem of optimally deploying multiple sensors to maximize the ability to monitor the target location. Further, given a moving target, the proposed approach can predict the motion of the target and thus dynamically compute an optimal switching strategy among the sensors. The remainder of the paper is organized as follows: Section II provides a brief introduction to networked surveillance systems. Section III provides an introduction to dynamic systems and discusses optimal control of a discrete-time finite-state system as a dynamic programming problem. Section IV discusses switched video feedback, provides a dynamic systems model of a switched video feedback system, and gives examples of switched video feedback algorithms. Section V develops an assessment metric based on the camera locations. Section VI proposes a dynamic programming based method to optimally minimize switching cost and maintain the target resolution. Simulation results of the proposed approach are provided in Section VII. Finally, conclusions and discussion are provided in Section VIII.

II. NETWORKED SURVEILLANCE SYSTEMS

Networked surveillance systems have received much attention from the research community due to their many pervasive applications [1]. Based on their mobility and rapid-deployment capability, the operation of a mobile sensor network (MSN) can be divided into two separate phases, namely deployment and surveillance. Infrastructure-less rapid deployment is an important characteristic of MSNs, and many research efforts have been dedicated to optimally deploying sensor nodes to increase their total sensing area [13] [14] [15]. A global kinematic representation of a network of connected sensors R = {R1, R2, ..., Rn} using only localization information between one-hop neighbors is suggested in [13].
This relationship between the various nodes is key to sharing meaningful information between them. Figure 1 shows the general architecture of a sensor node. The target perception module is responsible for detecting and classifying the various targets in the active field of view (FOV) of the sensor and performing temporal consolidation of the detected targets over multiple frames of detection. Moving target detection and classification is known to be a difficult research problem [16]. Many approaches, such as active background subtraction [17] [18] and temporal differencing, have been suggested for detecting and classifying various types of moving targets, from single humans and human groups to vehicles and wildlife [2] [18]. The next problem is to classify and associate the various detected image blobs with discernible targets and maintain their temporal tracks in order to pervasively track them. Various approaches, such as extended Kalman filtering, pheromone routing, and Bayesian belief nets, have been suggested for maintaining the tracks of the various targets [19].
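As an illustration of the temporal differencing idea mentioned above, the following is a minimal sketch of ours (not the cited systems' implementation): pixels whose intensity changes between consecutive frames beyond a threshold are marked as foreground.

```python
# Minimal temporal-differencing sketch (our illustration): mark pixels
# whose grayscale value changes by more than a threshold between frames.

def temporal_difference(prev_frame, frame, threshold):
    """Frames are 2-D lists of grayscale values; returns a binary mask."""
    return [[1 if abs(a - b) > threshold else 0
             for a, b in zip(row_p, row_c)]
            for row_p, row_c in zip(prev_frame, frame)]

prev = [[10, 10], [10, 10]]
curr = [[10, 90], [10, 10]]
print(temporal_difference(prev, curr, 20))  # [[0, 1], [0, 0]]
```

In practice the mask would then be grouped into blobs and handed to the tracking stage described above.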

Fig. 1. Architecture of Sensor Node.

The individual sensor nodes maintain information regarding the observations of their neighboring nodes and broadcast (within their locality) their own observations. Based on the combined observations, each node develops a list of targets being actively tracked and the status of its peer nodes, and stores this information in the targets table and the sensor nodes table, respectively. The targets table stores the native as well as observed characteristics of the target objects for the multiple targets being tracked in the network, together with an indication of the node that sensed each characteristic. The sensor nodes table stores information about the nodes in the current neighborhood, such as each peer's location, active FOV, and total capable FOV. Based on the information available in the two tables, the target location module decides which target location to track.

III. DYNAMIC SYSTEMS

A controlled dynamical system is a system Σ = [X, Γ, U, φ] where:
• X is an arbitrary topological space called the state space of Σ;
• Γ is the time set, which is a transition semigroup with identity;
• U is a nonempty set called the control-value space of Σ; and
• the map φ : X × Γ × U → X is a continuous function satisfying the identity and semigroup properties [20].
The dynamical system can also be denoted by Σ = [X, Γ, U, f], where the transition function f is the generator of the extended transition function φ [20].
A discrete-time dynamic system is a dynamic system Σ for which Γ = Z, where Z is the set of integers. The discrete-time dynamic system Σ is finite dimensional if both X and U are finite dimensional, and the dimension of the system Σ is the dimension of X. A system Σ is complete if every input is admissible for every state.
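The definitions above can be made concrete with a minimal sketch of ours (the scalar integrator example is illustrative, not from the paper): the one-step transition f generates the extended transition φ by repeated application along an input sequence.

```python
# Illustrative sketch: a discrete-time dynamic system Sigma = [X, Gamma, U, f],
# where the one-step transition f generates the extended transition phi.

class DiscreteTimeSystem:
    def __init__(self, f):
        self.f = f  # one-step transition: f(x, t, u) -> next state

    def phi(self, x, t0, inputs):
        """Extended transition: apply f along an input sequence from time t0."""
        t = t0
        for u in inputs:
            x = self.f(x, t, u)
            t += 1
        return x

# Example: a scalar integrator x(t+1) = x(t) + u(t).
sys = DiscreteTimeSystem(lambda x, t, u: x + u)
print(sys.phi(0, 0, [1, 2, 3]))  # 6
```

The semigroup property is visible here: applying φ over a concatenated input sequence equals applying it over the two halves in turn.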

A. Optimal Control and Dynamic Programming

Consider a finite dimensional discrete-time dynamic system Σ and a function b : Γ × X × U → R+ that takes on non-negative real values. We can now define a trajectory cost function as:

B(τ, σ, x, ω) = ∑_{i=σ}^{τ−1} b(i, ξ(i), ω(i))    (1)
where ω is a sequence of inputs and ξ = ϕ(x, ω) is the sequence of states of the dynamical system given the initial state x and the sequence of control inputs ω. The optimization problem can be stated as follows: given a discrete-time finite dimensional dynamic system Σ, a trajectory cost function B, a pair of times σ < τ, and an initial state x(σ), find a sequence of control inputs ω admissible for state x(σ) which minimizes B. More general problems can be introduced by making the total cost a function of the final state, by adding constraints on the final state, etc. In order to find a control input sequence ω that minimizes the trajectory cost function B, we could list all possible control input sequences ω = u_σ, u_{σ+1}, ..., u_{τ−1} with elements in U and compute the complete cost of the trajectories generated by all ω ∈ U^[σ,τ). This could entail a prohibitively large computational cost. Alternatively, we can use the dynamic programming method and inductively construct the Bellman function V and the optimal control input law K backwards in time, i.e., from τ towards σ, as:

V : [σ, τ] × X → R+    (2)

K : [σ, τ) × X → U    (3)
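The contrast drawn above between exhaustive enumeration of input sequences and the dynamic programming construction of V and K can be sketched concretely. The system, stage cost, and zero terminal cost below are our illustrative assumptions, not the paper's:

```python
# Sketch contrasting exhaustive enumeration with backward dynamic
# programming. The finite system and stage cost are illustrative stand-ins.

from itertools import product

def trajectory_cost(f, b, sigma, tau, x, omega):
    """Trajectory cost: sum of stage costs b along the trajectory from x."""
    total = 0.0
    for i, u in zip(range(sigma, tau), omega):
        total += b(i, x, u)
        x = f(x, i, u)
    return total

def brute_force(f, b, X, U, sigma, tau, x):
    """Minimize the trajectory cost over all |U|^(tau - sigma) sequences."""
    return min(trajectory_cost(f, b, sigma, tau, x, omega)
               for omega in product(U, repeat=tau - sigma))

def dynamic_program(f, b, X, U, sigma, tau):
    """Backward induction: V(tau, .) = 0, then recurse down to sigma,
    recording the minimizing input in the control law K."""
    V = {(tau, x): 0.0 for x in X}
    K = {}
    for s in range(tau - 1, sigma - 1, -1):
        for x in X:
            costs = {u: b(s, x, u) + V[(s + 1, f(x, s, u))] for u in U}
            K[(s, x)] = min(costs, key=costs.get)
            V[(s, x)] = costs[K[(s, x)]]
    return V, K

# Example: drive a state on {0,...,3} toward 0, penalizing distance and effort.
X, U = range(4), (-1, 0, 1)
f = lambda x, t, u: min(max(x + u, 0), 3)
b = lambda t, x, u: x + abs(u)
V, K = dynamic_program(f, b, X, U, 0, 3)
assert V[(0, 2)] == brute_force(f, b, X, U, 0, 3, 2)  # DP matches enumeration
```

The enumeration loop evaluates |U|^(τ−σ) sequences, whereas the backward recursion touches each (time, state, input) triple once, at the cost of storing V over the whole state space.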

The Bellman function V(s, x) should satisfy, for any s ∈ [σ, τ] and each x ∈ X,

V(s, x) = min_ω B(τ, s, x, ω)    (4)

and the optimal control input law satisfies the condition

ξ(j + 1) = φ(ξ(j), j, K(j, ξ(j))),  j = s, s + 1, ..., τ − 1,  ξ(s) = x    (5)

for each s ∈ [σ, τ) and each x ∈ X. The computational effort required to solve this problem is significantly lower than that of tabulating all the control input sequences; however, the storage requirements of this procedure are significantly larger.

IV. SWITCHED VIDEO FEEDBACK

Video feedback is an essential component of the surveillance system. The task of capturing, transmitting, and displaying the live video streams from the various sensors to the requesting clients is handled by the video subsystem. Since multiple cameras are deployed to track the identified targets, multiple concurrent feedback video streams may be required for monitoring the target. The sensors initiating these streams will change from time to time as the

target moves out of range of the current sensors tracking it. However, providing multiple unnecessary (unrelated to the task) video feedback streams often causes loss of the operator's attention and makes it hard to keep track of the various activities over the cameras. Hence, only video streams from relevant sensors should be presented to the operator on a per-activity basis, through automatic or manual switching of the camera streams.

A. Modeling a Switched Video Feedback System as a Dynamic System

The configuration of the cameras C can be defined as the collection of all the parameters of the cameras involved. That is:

C = {C1, C2, ..., CN}    (6)

where Ci ∈ R^p is a vector containing all the intrinsic and extrinsic parameters of camera i. The terms "layout" and "placement" will also be used interchangeably with camera configuration, although they also reflect the internal parameters of the cameras. Consider a monitored region as a domain E ⊂ R^n under surveillance. A finite number N of cameras are distributed in this region and are involved in the surveillance task of maintaining visibility of a target as it moves within the monitored region. The monitored region E can be discretized on a finite dimensional grid G = {Vi | Vi ∈ R^n, i = 1, ..., g}, which consists of a finite number g of vertices Vi. Note that the grid G can be generated from the domain E using various approximate cell decomposition methods, where the cells have a pre-defined shape and size in order to achieve a certain resolution. Consider any continuous time target trajectory in two dimensions k_c(τ) : Γ_c → R^2. For every τ ∈ Γ_c, k_c(τ) outputs the location of the target. This continuous time trajectory can be approximated as a discrete-time finite dimensional map on the grid as

k(t) : Γ → U_G    (7)

where Γ = Z and U_G represents the finite dimensional space of the locations of the vertices Vi of the grid G. The dynamic sensor switching problem can be modeled as a discrete-time, finite-dimensional dynamic system. The state space of this system is the finite set of camera states defined as Q = {qi | qi = (i, Ci), i ∈ (1, ..., N), Ci ∈ C}, where N is the total number of cameras and C is the space of configurations of all cameras. Using the above definitions, we can describe the model of a dynamic sensor switched system as a finite-dimensional discrete-time dynamic system V:

V = [Q, Γ, U, φ],  φ : Q × Γ × U_G × U_1 → Q    (8)

where U := U_G × U_1. Here, u ∈ U_1 is the control input, while k(t) ∈ U_G, t ∈ Γ, the sequence of grid locations in G that the target visits, is the reference input to the system. The reference input evolves according to a predefined function of time t and depends only on the target motion, which may not be known a priori. The control inputs are used to control the trajectory of the system to ensure the target is covered, and can also be used for optimization of the various metrics being used. The system V may not be complete, as transitions between various cameras may not be enabled at all times, owing to the visibility constraints of the various states for the target locations provided.

B. Example Switched Video Feedback Algorithm - Best Resolution

Consider a set of N cameras Q = {(i, Ci) | i = 1, ..., N, Ci ∈ C}. The sensor selection strategy depends on the resolution sustained by the target at each of the cameras, and the camera with the best resolution is selected to provide feedback to the human operator. This strategy takes into account only the best resolution sustained over all the cameras, which implies that the control input u is based solely on the camera configurations {Ci} and the target locations k(t). The strategy does not depend on the current camera tracking the target, i.e., sensor switching costs are not taken into account. In order to implement this strategy in the dynamic systems model, we define a function R_res : U_G × Γ → U_1 which maps the locations of the target at various time instances to the space of the control input variables u, based on the resolution sustained by the target at each camera. Notice that R_res is not a function of the current state of the dynamic system, i.e., the current camera; this implies that the system does not have memory. The above algorithm can be implemented as follows.
Let q_best represent the camera that has the least distance to the target, i.e., under the assumption of homogeneous camera sensors, q_best can view the target with the best resolution. For a given target location k(t) and time t, q_best can be written as:

q_best(t) = q_i,  where i = arg min_{j=1,...,N} dist(C_j(t), k(t)) subject to visible(q_j, k(t)) = 1    (9)

Using the definition of q_best, the next sensor to switch to, i.e., q(t + 1), can be written as q(t + 1) = q_best(k(t + 1)).

C. Example Switched Video Feedback Algorithm - Persistent Camera

In the best resolution based video feedback algorithm, the current camera tracking the target was not taken into account in the input to the dynamic system. Switching cameras in a surveillance network can lead to disorientation of the human operator and is also generally associated with a switching time delay. Hence we should minimize the number of switches while tracking a target. However, the target should sustain a certain resolution at the tracking camera for recognition and classification purposes. In order to implement this strategy, define a function R_per : Q × U_G × Γ → U_1 which maps the current tracking camera and the locations of the target at various time instances to the space of the control input variables u. This implies that the state of the system, i.e., q(t), is used along with the resolution in order to calculate the control input to the dynamic system. The algorithm can be implemented using the definition of q_best from the previous section. The sensor selected at the next time step, q(t + 1), can be written as:

q(t + 1) = arg min { dist(C_best(t + 1), k(t + 1)) + ε, dist(C_q(t), k(t + 1)) },  if visible(q(t), k(t + 1)) = 1
q(t + 1) = q_best(t + 1),  if visible(q(t), k(t + 1)) = 0    (10)

that is, the current camera q(t) is retained unless q_best(t + 1) remains closer to the target even after an ε handicap, or the current camera loses visibility of the target.
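Both selection strategies can be sketched as follows. This is our illustration, assuming homogeneous cameras so that resolution decreases with distance; the camera table and `visible` predicate are stand-ins for quantities the paper leaves abstract.

```python
# Sketch of the two feedback policies: best-resolution selection and
# persistent selection with an epsilon handicap. `cameras` maps camera
# id -> position; `visible(i, p)` is an assumed visibility predicate.

import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def q_best(cameras, visible, target):
    """Best-resolution policy: nearest camera that can see the target."""
    candidates = [i for i in cameras if visible(i, target)]
    return min(candidates, key=lambda i: dist(cameras[i], target))

def q_persistent(cameras, visible, current, target, eps):
    """Persistent policy: keep the current camera unless the best camera
    is closer even after an epsilon handicap, or visibility is lost."""
    if not visible(current, target):
        return q_best(cameras, visible, target)
    best = q_best(cameras, visible, target)
    if dist(cameras[best], target) + eps < dist(cameras[current], target):
        return best
    return current

# Example: two cameras, everything visible.
cams = {1: (0.0, 0.0), 2: (100.0, 0.0)}
always = lambda i, p: True
print(q_best(cams, always, (30.0, 0.0)))                 # 1
print(q_persistent(cams, always, 2, (30.0, 0.0), 50.0))  # 2 (persists)
```

Note how the persistent policy trades some resolution for fewer switches: with a large ε it keeps camera 2 even though camera 1 is closer.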

V. ASSESSMENT METRIC FOR CAMERA DEPLOYMENT USING SWITCHED VIDEO FEEDBACK

In Section IV, a switched video feedback system providing real-time video feedback to the human operator, to assist in the monitoring and decision making task, was described. However, constant switching of the video stream can be quite disruptive to human cognizance of the remote environment and can adversely affect decision making. Hence the system should try to reduce the number of video feed switches required for the same tracking task. The video feedback switch is based on the capability of the tracking cameras to visually resolve the target and discern its features, i.e., the resolution of the target sustained at the cameras. This switching is directly influenced by the locations of the cameras in the scene. In the case of manually deployed camera configurations, the system designer must take into consideration the adverse effects of video switching and must choose the camera locations so as to minimize switches while continuously tracking the target. In this section we propose a metric as a performance measure of the video feedback switching algorithm based on the configuration of the cameras.

The switching metric M ∈ R+ maps the configuration space of the video algorithm [21] (which includes the previous feedback camera) and a scalar potential field over that configuration space to a positive scalar. Given the configuration of the cameras C, the video algorithm can be thought of as a mapping from the combined space of current feedback camera and target location to the feedback camera space. The scalar potential ϕ(p) ∈ R+, p ∈ E, assigns a relative importance to the current target location and can be chosen to bias the importance of the target locations.

M = ∫_F ∇V(f, C) ϕ(p) df    (11)

where f ∈ F ⊂ Q × E and p ∈ E. ∇V ∈ [0, 1] is the spatial derivative of the output of the video feedback algorithm and indicates the points of the video feedback configuration space F at which the output of the video feedback algorithm changes; it is akin to edge segmentation in image processing [22]. Integrating over these points across the configuration space measures the number of switching surfaces present in the complete space E. A lower number of switching surfaces implies that the feedback camera does not switch often under free motion of the target. The term ϕ(p) is a scalar potential which reflects the importance of the particular point p ∈ E and can be used to bias the surveillance space E. The switching metric M, in conjunction with other metrics such as the target resolution at various locations in E, can be computed for a large number of randomly generated configurations, and a sub-optimal solution can be derived for the placement of the cameras. Owing to the large configuration space of the deployment algorithm, various evolutionary computation and optimization schemes can be utilized in order to minimize the total placement metric.

VI. OPTIMIZED TARGET TRACKING USING SWITCHED VIDEO FEEDBACK

Given a scenario with a large number of static cameras distributed in an environment with significant overlap of their viewing regions, the problem is to identify the minimum number of cameras required to track a given target trajectory. We propose to use dynamic programming as an optimal control strategy in order to minimize the total number of cameras required to view the target with adequate resolution. In order to minimize the switching cost and maintain the resolution of the target, we construct a graph based on the camera locations and the visibility of the target to the cameras, given the predicted motion of the target on the grid G. Figure 2 shows a part of the graph constructed for each grid point in G that the target traverses.
All the cameras that can observe a particular grid point are listed as the possible nodes to switch to. An extra node column is added to accommodate the resolution metric, as shown in Figure 2. The switching cost between the different nodes is tabulated as '1' if the tracking sensor is switched at this grid point and as '0' if the same camera is retained.

The entire graph is constructed for all the points sequentially visited by the target on the grid G. Note that this procedure does not mandate any contiguity requirements on the path traversed by the target in order to construct the graph. Once the entire graph is constructed, dynamic programming can be used to find the optimal switching sequence. Constructing the entire graph assumes that the complete motion of the target on the grid is known; however, knowing the entire path of the target is a very strict requirement. Given the current and past observations of the target, its future trajectory can be predicted for a finite look-ahead time. Based on this finite-time prediction, the graph can be constructed for the predicted trajectory and the camera switching sequence computed for it. As the target location evolves, the prediction and the corresponding switching sequence are recomputed. This procedure extends the dynamic programming algorithm for computing the switched camera sequence to the case where the target trajectory is not known in advance but can be predicted using the current and past observations of the target location. Using this trajectory-prediction-based algorithm, a sub-optimal solution to the camera selection problem can be computed.

A. Sensor Switching Optimization Implemented in Dynamic Programming

The procedure for constructing the graph is shown in Algorithm 1. After the graph is constructed, we use the dynamic programming method, a modified version of Dijkstra's algorithm, to find the optimal switching strategy (Algorithm 2). In the graph generating stage, we enumerate all the cameras that can see the current grid point with an acceptable resolution. The resolution metric is marked over each camera as a weight on the vertex.
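A runnable sketch of this graph-and-shortest-path idea follows. It is our simplification: the layered graph over the predicted path is kept implicit, resolution costs are charged on entering each layer, a unit cost is charged whenever the camera changes, and `visible`/`res_cost` are illustrative stand-ins for the paper's visibility and resolution measures.

```python
# Sketch: cheapest camera sequence along a predicted grid path, found by
# Dijkstra's algorithm over an implicit layered graph whose states are
# (path step, camera). Visibility/resolution functions are stand-ins.

import heapq
import itertools

def best_switching_sequence(path, cams, visible, res_cost, switch_cost=1.0):
    start = (-1, None)                      # virtual source before the path
    dist, prev = {start: 0.0}, {}
    counter = itertools.count()             # tie-breaker for the heap
    pq = [(0.0, next(counter), start)]
    while pq:
        d, _, node = heapq.heappop(pq)
        if d > dist.get(node, float('inf')):
            continue                        # stale heap entry
        step = node[0] + 1
        if step == len(path):               # final layer reached: backtrack
            seq = []
            while node in prev:
                seq.append(node[1])
                node = prev[node]
            return d, seq[::-1]
        for c in cams:
            if not visible(c, path[step]):
                continue
            w = res_cost(c, path[step])
            if node != start and c != node[1]:
                w += switch_cost            # tracking camera changed
            nxt = (step, c)
            if d + w < dist.get(nxt, float('inf')):
                dist[nxt], prev[nxt] = d + w, node
                heapq.heappush(pq, (d + w, next(counter), nxt))
    return float('inf'), []                 # some point is covered by no camera

# Example: camera 1 covers x <= 2, camera 2 covers x >= 2.
path = [(0, 0), (1, 0), (2, 0), (3, 0)]
vis = lambda c, p: p[0] <= 2 if c == 1 else p[0] >= 2
cost, seq = best_switching_sequence(path, [1, 2], vis, lambda c, p: 0.1)
print(seq)  # [1, 1, 1, 2] -- a single switch
```

The switch cost dominating the resolution cost is what makes the solver defer the handover to the last feasible grid point.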
However, in order to run the dynamic programming method, it is desirable to have all the weights on the edges. Thus each camera is represented by two vertices in the graph, with an edge connecting them whose weight equals the resolution metric. We also enumerate all the cameras that can see the grid points specified by the prediction vector. Between two consecutive predicted grid points we connect the cameras based on their switching metric: if the two cameras are the same, the switching metric is set to 0; otherwise a nonzero switching metric is set on the edge. In Figure 2, all nonzero metrics are set to 1 for the sake of simplicity.

VII. SIMULATION STUDIES

A. Sensor Deployment

Consider a switched surveillance scenario with two cameras as shown in Figure 3. Camera 1 is kept static at location (0, 0) while the location of camera 2 is varied from 0 to 100 on both the X and Y axes. Both cameras are directed at 45 degrees and have a viewing cone of 90 degrees. Figures 4, 5 and 6 depict the change in the three metrics, namely the simple

Fig. 2. Generating Graph for the Camera Switching Strategy

Fig. 3. Two camera setup


Algorithm 1 [G, W] = GraphGen(p_v)
1: for all cam in CameraSet do
2:   if p_v(1) is visible by cam then
3:     Add two nodes representing cam to G
4:     Connect the two nodes with an edge in W weighted by the resolution metric of cam
5:   end if
6: end for
7: p_prev ← p_v(1)
8: Delete p_v(1) from p_v
9: for all p in p_v do
10:   for all cam in CameraSet do
11:     if p is visible by cam then
12:       Add two nodes representing cam to G
13:       Connect the two nodes with an edge weighted by the resolution metric of cam
14:       Connect the nodes generated for p and p_prev and set the switching cost in W if they belong to different cameras
15:     end if
16:   end for
17:   p_prev ← p
18: end for

Fig. 4. The Metric Plot When Simple Distance Is Used


distance based camera switching metric, the persistent distance metric, and the resolution metric, as the location of camera 2 is moved over the grid. Figure 4 shows the result of the simple distance based metric simulation. Simple distance switching works as follows: if the target is being monitored by the current camera, and the distance between the target and that camera is greater than the distance between the target and some other camera, then the system switches the camera tracking the target, incurring a cost. The total of the costs at each point within the grid is the total cost for a given camera configuration. Persistent distance switching (Figure 5) is nearly identical to simple distance switching, except that a threshold governs when to switch: the system switches and assigns a cost only if the distance advantage of the next best camera over the current camera exceeds the threshold. The main purpose of this persistence method is to avoid constant switching, which could lead to user disorientation; a switch then provides the user with not merely a better view, but a significantly better view as determined by the persistence threshold. The resolution metric (Figure 6) is based on the concept that the closer the sensor is to the target, the higher the

Fig. 5. The Metric Plot When Persistence Is Used


Algorithm 2 p = Dijkstra(G, W, s, t)
1: for all v in V(G) do
2:   d(v) ← +∞ {Initialize the distance vector}
3:   p(v) ← undefined {Initialize the previous node vector}
4: end for
5: d(s) ← 0 {The source has 0 distance to itself}
6: C ← ∅ {Initialize the checked node set}
7: Q ← V(G) {Copy the vertex set into a working set}
8: while Q ≠ ∅ do
9:   u ← ExtractMin(Q) {Extract the vertex with minimum value in d}
10:   C ← C ∪ {u}
11:   if u = t then
12:     return
13:   end if
14:   for all edges (u, v) do
15:     if d(u) + W(u, v) < d(v) then
16:       d(v) ← d(u) + W(u, v)
17:       p(v) ← u
18:     end if
19:   end for
20: end while
21: if t ∉ C then
22:   p ← ∅
23: end if

Fig. 6. The Resolution Metric Plot

resolution is for that sensor. For the resolution metric, the cost is the distance between a grid point and the best sensor to view that grid point; the total cost of a given sensor configuration is the sum of the costs at each grid point within the viewing area. We have developed a series of metrics and used a Monte Carlo simulation method to find the optimal location of the cameras based on these metrics. The simulation results of a 10-camera placement problem on a 100x100 grid are presented in this section. The metric used to calculate the cost of the camera placement is a combination of the resolution based switching metric and a best resolution metric. The configuration of the cameras was varied, and over 100,000 sets of random camera locations were generated for the Monte Carlo simulation. The Monte Carlo simulations were conducted in 10 iterations, with each iteration simulating 10,000 sets of random locations. The convergence of the Monte Carlo scheme was verified by noting that the minimum costs of all 10 batches were nearly identical. The camera parameters that were varied were the X and Y positions and the pointing angle θ; the viewing cone angle of the cameras was kept constant. Figure 7 shows the optimal placement of the cameras.
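The Monte Carlo search described above can be sketched as follows. The placement cost used here is a deliberately simplified stand-in for the paper's combined switching/resolution metric, and the grid, iteration count, and parameter ranges are our illustrative choices.

```python
# Sketch of Monte Carlo camera placement: draw random configurations
# (x, y, theta) and keep the cheapest. The toy cost below stands in for
# the combined switching/resolution metric used in the paper.

import math
import random

def placement_cost(cams, grid):
    # Toy metric: distance from each grid point to its nearest camera.
    return sum(min(math.hypot(px - x, py - y) for (x, y, theta) in cams)
               for (px, py) in grid)

def monte_carlo_placement(n_cams, grid, iters, rng):
    best_cost, best_cams = float('inf'), None
    for _ in range(iters):
        cams = [(rng.uniform(0, 100), rng.uniform(0, 100),
                 rng.uniform(0, 2 * math.pi)) for _ in range(n_cams)]
        cost = placement_cost(cams, grid)
        if cost < best_cost:
            best_cost, best_cams = cost, cams
    return best_cost, best_cams

grid = [(x, y) for x in range(5, 100, 10) for y in range(5, 100, 10)]
cost, cams = monte_carlo_placement(10, grid, 1000, random.Random(0))
```

Running several independent batches and comparing their minimum costs, as the paper does, gives a practical convergence check for such a search.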


Fig. 7. Optimal camera placement for 10 cameras using Monte Carlo based optimization

B. Optimized Target Tracking using Switched Video Feedback

In the simulation scenario, we deployed a set of cameras over a 100x100 grid area. A sine-wave trajectory is simulated, and the camera switches are recorded along with the switching cost and resolution cost. In the simulation, 11 cameras are placed as indicated by the circles in Figure 8; the line under each camera circle indicates the camera's facing direction. In the first scenario, the whole trajectory is assumed to be known at the starting point, so a better switching strategy is obtained with only six switches. The numbers along the sine trajectory indicate which camera becomes the active camera at that point (Figure 8). In a second, more realistic scenario, we assume the trajectory is known only for the next two grid points, and the third grid point is predicted through linear extrapolation. More switches are needed in this case (Figure 9). However, we expect fewer switches when a more sophisticated
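For the case where the whole trajectory is known, the switching strategy can be computed by dynamic programming over (trajectory point, active camera) states. The sketch below is illustrative and uses assumed costs: the per-point resolution cost is taken as the camera-to-target distance, and `switch_penalty` is a hypothetical constant, whereas the paper combines its own switching and resolution metrics.

```python
import math

def optimal_switching(trajectory, cameras, switch_penalty=50.0):
    """DP over (trajectory point, active camera) states.

    trajectory: list of (x, y) points; cameras: list of (x, y) positions.
    Stage cost = distance from active camera to the point (resolution cost),
    plus switch_penalty whenever the active camera changes between points.
    Returns (total cost, list of camera indices, one per trajectory point).
    """
    n, m = len(trajectory), len(cameras)
    # res[t][c]: resolution cost of camera c viewing trajectory point t
    res = [[math.hypot(px - cx, py - cy) for (cx, cy) in cameras]
           for (px, py) in trajectory]
    cost = [res[0][:]]          # DP table, stage 0: no switch cost yet
    back = []                   # backpointers for path reconstruction
    for t in range(1, n):
        row, brow = [], []
        for c in range(m):
            # cheapest predecessor: stay on c (free) or switch (penalty)
            best_prev = min(range(m), key=lambda p: cost[-1][p]
                            + (0 if p == c else switch_penalty))
            row.append(cost[-1][best_prev]
                       + (0 if best_prev == c else switch_penalty) + res[t][c])
            brow.append(best_prev)
        cost.append(row)
        back.append(brow)
    # trace back from the cheapest final camera
    c = min(range(m), key=lambda i: cost[-1][i])
    total, plan = cost[-1][c], [c]
    for brow in reversed(back):
        c = brow[c]
        plan.append(c)
    return total, plan[::-1]
```

With only a short prediction horizon, as in the second scenario, the same recurrence would be re-run over the two known points plus one extrapolated point at every step, which is why more switches result.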


Fig. 8. Camera Switching for the Whole Path

Fig. 9. Camera Switching with Trajectory Prediction

prediction method is used. The costs associated with Figure 8 and Figure 9 are 1772.58 and 2094.47, respectively; seeing the whole path thus yields an 18% cost improvement.

VIII. CONCLUSION

In this paper we examined the problem of optimally placing a number of cameras in a given field to facilitate a surveillance task. We developed a series of metrics and used a Monte Carlo method to obtain optimal camera placement strategies. We also studied the camera switching strategy given a full or partial trajectory of a moving target, and developed a dynamic programming based approach that generates the switching strategy while optimizing both the switching and resolution metrics. The simulation results show that our method works well for surveillance tasks and scales well to large deployments.

REFERENCES

[1] C. Regazzoni, V. Ramesh, and G. E. Foresti, "Special issue on third generation surveillance systems," Proceedings of the IEEE, vol. 89, Oct. 2001.
[2] R. T. Collins, A. J. Lipton, H. Fujiyoshi, and T. Kanade, "Algorithms for cooperative multisensor surveillance," Proceedings of the IEEE, vol. 89, pp. 1456-1477, 2001.
[3] R. T. Collins, A. J. Lipton, and T. Kanade, "Introduction to the special section on video surveillance," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, pp. 745-746, August 2000.
[4] A. J. Courtemanche and A. Ceranowicz, "ModSAF development status," in Proceedings of the 5th Conference on Computer Generated Forces and Behavioral Representation, Orlando, FL, May 1995, p. 313.
[5] K. Cauble, "The OpenScene ModStealth," in Proceedings of the Simulation Interoperability Workshop, Paper 97S-SIW-008, 1997.
[6] C. S. Regazzoni, C. Sacchi, and C. Dambra, "Remote cable-based video surveillance applications: the AVS-RIO project," in Proceedings of ICIAP99, Venice, Italy, September 27-29, 1999, pp. 1214-1215.
[7] P. Rybski, S. Stoeter, M. Gini, D. Hougen, and N. Papanikolopoulos, "Performance of a distributed robotic system using shared communications channels," IEEE Transactions on Robotics and Automation, vol. 18, no. 5, pp. 713-727, October 2002.
[8] ITU-H.261, "Video codec for audio-visual services at 64-1920 kbit/s," ITU-T Recommendation H.261, 1993.
[9] ITU-H.263, "Video coding for low bit rate communication," ITU-T Recommendation H.263, March 1996.
[10] ITU-T Rec. H.264 and ISO/IEC 14496-10, "Advanced video coding," Final Committee Draft, Document JVT-E022, September 2002.
[11] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: A transport protocol for real-time applications," RFC 1889, July 1997.
[12] L. Berc, W. Fenner, R. Frederick, S. McCanne, and P. Stewart, "RTP payload format for JPEG-compressed video," RFC 2435, October 1998.
[13] J. Tan, N. Xi, W. Sheng, and J. Xiao, "Modeling multiple robot systems for area coverage and cooperation," in International Conference on Robotics and Automation, 2004.
[14] S. Stoeter, P. Rybski, M. Erickson, M. Gini, D. Hougen, D. Krantz, N. Papanikolopoulos, and M. Wyman, "A robot team for exploration and surveillance: Design and architecture," in Proceedings of the International Conference on Intelligent Autonomous Systems, 2000, pp. 767-774.
[15] J. Cortes, S. Martinez, T. Karatas, and F. Bullo, "Coverage control for mobile sensing networks," in International Conference on Robotics and Automation, 2003.
[16] K. Toyama, J. Krumm, B. Brumitt, and B. Meyers, "Wallflower: Principles and practice of background maintenance," in International Conference on Computer Vision, 1999, pp. 255-261.
[17] T. Matsuyama and N. Ukita, "Real-time multitarget tracking by a cooperative distributed vision system," Proceedings of the IEEE, vol. 90, no. 7, pp. 1136-1150, 2002.
[18] T. Boult, R. J. Micheals, X. Gao, and M. Eckmann, "Into the woods: Visual surveillance of noncooperative and camouflaged targets in complex outdoor settings," Proceedings of the IEEE, vol. 89, Oct. 2001.
[19] R. Brooks, C. Griffin, and D. Friedlander, "Self-organized distributed sensor network entity tracking," International Journal of High Performance Computing, vol. 16, no. 2, 2002.
[20] E. Sontag, Mathematical Control Theory. Springer-Verlag, 1990.
[21] S. M. LaValle and S. A. Hutchinson, "Optimal motion planning for multiple robots having independent goals," IEEE Transactions on Robotics and Automation, vol. 14, no. 6, pp. 912-925, December 1998.
[22] N. R. Pal and S. K. Pal, "A review on image segmentation techniques," Pattern Recognition, vol. 26, no. 9, pp. 1277-1294, 1993.