Multiple People Activity Recognition Using MHT over DBN

Andrei Tolstikov, Clifton Phua, Jit Biswas, and Weimin Huang

Institute for Infocomm Research, Singapore
{atolstikov,cwcphua,biswas,wmhuang}@i2r.a-star.edu.sg

Abstract. A multiple-people activity recognition system is an essential step in the development of Ambient Assisted Living systems. A possible approach is to take an existing system for single-person activity recognition and extend it to the case of multiple people. One such approach is Multiple Hypothesis Tracking (MHT), which provides multiple-people tracking and activity recognition based on a Dynamic Bayesian Network model. The advantage of such systems is that the number of people can vary; the disadvantage is that the activity recognition model cannot be configured if only multiple-people data is available for training.

Keywords: Multiple Hypothesis Tracking, Dynamic Bayesian Network, Activity Recognition.

1 Introduction

To become practical, Ambient Assisted Living systems have to be able to handle the presence of multiple people in the environment. However, most research on activity recognition in smart environments so far has focused on recognizing the actions of a single person. In most cases such systems, being configured for a single person, are simply not able to work correctly in the presence of more people. One example is the class of activity recognition systems based on Dynamic Bayesian Networks (DBN) [5], [3], [7].

A possible approach is to take a well-developed method of activity recognition for a single person and try to extend or adapt it to the case of multiple people. In fact, this was the approach taken by the target tracking community, where single-object tracking was extended to multiple targets by using data association methods, which assign only a part of the sensor data to a particular target. One of the most powerful data association methods is Multiple Hypothesis Tracking (MHT) [1].

This paper describes our early attempt to use a combination of DBN and MHT for multiple-people activity recognition. The contribution of the paper is a track generation method which uses only a few sensors in deciding whether a new track must be generated, and which addresses the problem of sensor readings being shared between different people, which may be a common problem in a home environment.

B. Abdulrazak et al. (Eds.): ICOST 2011, LNCS 6719, pp. 313–318, 2011. © Springer-Verlag Berlin Heidelberg 2011

2 Related Work

Dynamic Bayesian Networks and their special cases, such as Hidden Markov Models (HMMs), are popular in computer vision, mostly for exploiting temporal information. For learning figure dynamics, Pavlovic et al. [5] propose a DBN-based switching linear dynamic system (SLDS) model with an approximate Viterbi inference technique. To interpret group activities, Gong and Xiang [2] use a Dynamically Multi-Linked Hidden Markov Model (DML-HMM), which is built using a factorization based on Schwarz's Bayesian Information Criterion. Human activity recognition using DBN has been done for long-duration activities [3], with a time frame of hours, and for short-duration activities [7], with a time frame of seconds.

Multiple Hypothesis Tracking was originally developed for radar target tracking [1]; however, it has recently been used for people tracking as well, mainly with analog sensors such as laser range finders or video cameras. Luber et al. [4] represent a spatial affordance map as a non-homogeneous spatial Poisson process in order to derive expressions for life-long Bayesian learning. This map is used to compute refined probability distributions over hypotheses in a multi-hypothesis tracker for motion prediction. The observe-and-explain approach of Ryoo and Aggarwal [6] is able to compute multiple tracking possibilities given a sufficient amount of observations, even under severe occlusions. A combination of DBN as a low-level filter with MHT for deciding on data association is used by Zajdel and Kröse [9] for video-based object identity tracking. However, unlike human activity, the actual state of the object identity does not change, so building a track of changes in the state is not required.

The closest method to ours is proposed by Wilson and Atkeson [8]. The authors describe a method for multiple-person location and activity tracking that uses a Discrete Bayes Filter, which again is a simple case of DBN, for single-person tracking, and solves the data association problem with a Rao-Blackwellised particle filter. They used simple binary sensors and achieved an activity recognition accuracy of about 98% for two people and 85% for three people.

3 Method Description

MHT creates parallel instances of possible variations of events, which allows postponing the decision on how the system state changes and which sensor readings should be used for which target until sufficient evidence is accumulated. A hypothesis is a collection of tracked objects $O$, each $O_i$ having a track, i.e. a sequence of states $(X_i^t, X_i^{t-1}, X_i^{t-2}, \ldots)$. In the case of target tracking [1], each track may be updated when a radar has an observation, which is a point in space where a possible target is detected. A new hypothesis is generated whenever there is ambiguity about how a track should be updated. An observation may only be used for a target state update if it lies within a gate, an error tolerance area around the estimated target location. Implicitly, the gate assumes a certain probability for these error bounds. In MHT, each observation is assigned to only one target, existing or new. The main steps for tracking with MHT are:

– Acquire sensor data, form observations
– Generate hypotheses, if necessary
– For each hypothesis, assign observations to targets
– For each target of each hypothesis, update target state using assigned observations
– Evaluate and prune hypotheses

For multiple target tracking in airspace, the Kalman filter is used for updating the state of a single object, and MHT decides which sensor data should be used. For multiple people tracking at home, the DBN is the filter that updates the state of a single person, and MHT provides the set of sensors used by each DBN. However, there are three important differences. First, the activity state is an abstract discrete state, and a gate in this space is different from a gate in continuous space. Second, the sensor data is collected from many discrete sensors, and it is not obvious how observations should be formed. Third, in a home environment many sensors may provide shared readings: a single sensor reading may be generated by, and assigned to, more than one person.

3.1 Model

Human activity can be represented by a discrete variable whose states describe mutually disjoint combinations of actions. Sensing information in a home environment is often pre-processed and presented as discrete sensor feature values. The Dynamic Bayesian Network model for activity recognition is therefore a discrete-state model and is essentially a joint probability distribution (JPD) over the variables of two time steps, $\Pr(X^{t-1}, V_{t-1}, S_{t-1}, X^t, V_t, S_t)$, where $X$ is the state (activity) we are trying to estimate, $S$ is the set of sensors and $V$ is the set of intermediate variables. Dependencies between time slices for $X$ and $V$ make the JPD $\Pr(X^t, S_t)$ depend on the previous time step. The JPD is used to estimate the current state of $X$ using measurements $Z$, that is, $\Pr(X \mid S_t = Z_t)$. Parts of the DBN are the state prediction model

$$\Pr(X^{t+1} \mid X^t, S_t = Z_t) \tag{1}$$

and the sensor gating model

$$\Pr(S_t = Z_t \mid X^{t-1}, S_{t-1} = Z_{t-1}) \tag{2}$$
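As a rough illustration of how the two models might be evaluated, the following Python sketch uses a deliberately simplified DBN. It is not the model used in this paper: it assumes there are no intermediate variables $V$, that sensors are binary and conditionally independent given the state, and that the sensor readings depend on the past only through the predicted state. The state names, sensor names and all probability values are hypothetical.

```python
from typing import Dict

State = str
Dist = Dict[State, float]

# Hypothetical activity states (assumption, not from the paper).
STATES = ["idle", "sitting", "eating"]

# Hypothetical transition model Pr(X^{t+1} | X^t); each row sums to 1.
TRANSITION: Dict[State, Dist] = {
    "idle":    {"idle": 0.8, "sitting": 0.2, "eating": 0.0},
    "sitting": {"idle": 0.1, "sitting": 0.6, "eating": 0.3},
    "eating":  {"idle": 0.0, "sitting": 0.3, "eating": 0.7},
}

# Hypothetical sensor model Pr(sensor is on | X).
SENSOR_ON: Dict[str, Dist] = {
    "chair": {"idle": 0.05, "sitting": 0.9, "eating": 0.9},
    "spoon": {"idle": 0.01, "sitting": 0.05, "eating": 0.8},
}


def normalize(dist: Dist) -> Dist:
    """Rescale a non-negative table so that it sums to 1."""
    total = sum(dist.values())
    if total <= 0.0:
        raise ValueError("evidence is infeasible under the model")
    return {x: p / total for x, p in dist.items()}


def reading_likelihood(sensor: str, on: bool, state: State) -> float:
    """Pr(sensor reading | state) for a binary sensor."""
    p_on = SENSOR_ON[sensor][state]
    return p_on if on else 1.0 - p_on


def update(prior: Dist, readings: Dict[str, bool]) -> Dist:
    """Posterior Pr(X^t | S_t = Z_t), using only the sensors in `readings`."""
    post = {}
    for x in STATES:
        p = prior[x]
        for sensor, on in readings.items():
            p *= reading_likelihood(sensor, on, x)
        post[x] = p
    return normalize(post)


def predict(posterior: Dist) -> Dist:
    """State prediction in the spirit of Eq. (1): distribution of X^{t+1}."""
    return {
        nxt: sum(posterior[x] * TRANSITION[x][nxt] for x in STATES)
        for nxt in STATES
    }


def gate(predicted: Dist, sensor: str, on: bool) -> float:
    """Gating probability in the spirit of Eq. (2): how likely a reading is
    under the predicted state distribution."""
    return sum(predicted[x] * reading_likelihood(sensor, on, x) for x in STATES)


prior = {x: 1.0 / len(STATES) for x in STATES}
posterior = update(prior, {"chair": True})   # only the chair sensor is used
predicted = predict(posterior)
print(predicted)
print(gate(predicted, "spoon", True))
```

Because `update` accepts any subset of sensors, the same single-person model can be driven by whatever partial sensor set a data association step hands to it.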

The DBN model allows any subset of the set $S$ to be used as evidence at any time step. Since some of the sensor data in an environment with multiple people is due to the actions of other people, we have to use only the partial sensor set provided by a data association solution. However, since the model describes only one person, the dataset used for DBN configuration must contain sensor readings from a single person only.

3.2 Data Association Method

In general, for human activity recognition, many sensor readings are needed to reach a confident estimate. Therefore, many sensors need to be assigned to each target at each time step. However, each sensor combination may potentially give a different distribution $\Pr(X^t)$ and therefore, strictly speaking, different states and gates of the target at the next time step. Since there may be an exponential number of sensor combinations, it is not practical to test all of them. Instead, since the target has a discrete state space given by the set of possible values of $X$, we focus on the most likely state according to the current estimate. As an observation, we select only one sensor combination for each possible neighboring state of the current most likely state.

In our approach we define three classes of sensor values:

1. Sensor values which do not need data association, since they provide a person ID or are bound to a specific person. Examples of such sensors are RFID readers or wearable sensors such as wearable accelerometers. The presence of this kind of sensor is required since we want to keep tracking the ID of the person performing the activities. Information from this kind of sensor is also useful for hypothesis pruning.
2. Sensor values which can provide information related to only one person. Examples are a chair occupancy sensor or a small door switch. Note that in many cases the single-person assumption may be valid only for certain values and certain features of a sensor. For example, the "person sitting" state of a chair applies to only one person, but "person not sitting" applies to everyone. Similarly, the state "door open" can be related to many people, but "door was just opened" to only one.
3. Sensor values which are often shared, for example those of passive infrared sensors, or the shared values mentioned above.

We only need to solve the data association problem for classes 2 and 3.

In our method we focus on tracks, that is, the sequence of states of each person, rather than on hypotheses as collections of tracks. We define a track as a sequence of states $(X_i^t, X_i^{t-1}, X_i^{t-2}, \ldots)$ produced by a given DBN model and sensor input $(S_t, S_{t-1}, S_{t-2}, \ldots)$ under any valid sensor assignment $(S_i^t, S_i^{t-1}, S_i^{t-2}, \ldots)$. The assignment is valid if, for a given hypothesis, the sensor assignment to the targets of the hypothesis satisfies the following: all values of class 1 are always assigned, values of class 2 are assigned to at most one target, and the sensor combination for each target is feasible, i.e. the probability in Equation 2 is positive, $\Pr(S_i^t = Z_i^t \mid X_i^{t-1}, S_i^{t-1} = Z_i^{t-1}) > 0$.

For each time step of the DBN tracking a single person, we perform the following actions:

1. For each track $i$
2.   Using the prediction from time step $t-1$ (Eq. 1), find the most likely current state $X_{\max}^t$
3.   Assign all sensors of class 1
4.   If the assignment is not feasible, terminate the track
5.   Assign all sensors of class 3 and the valid sensors of class 2 for which the probabilities $\Pr(S_j^t \mid X_{\max}^t)$ are above some small (e.g. 0.05) gating threshold $\alpha$ (for both classes). The total sensor set from steps 3 and 5 is now $S_{\max}$
6.   Find the neighbor states of $X_{\max}^t$, i.e. the set $X_\beta^{t+1}$ of states $X_k^{t+1}$ with $\Pr(X_k^{t+1}) > \beta$
7.   For each $X_k^{t+1} \in X_\beta^{t+1}$
8.     Assign sensors as in steps 3 and 5
9.     If for $X_k$ the sensor set $S_k \neq S_{\max}$, generate a new track
10.  If there are other tracks for the same target having the same $X_{\max}^t$, merge the tracks by keeping the most probable one

3.3 Evaluation

We evaluated the proposed method using a simple scenario of two people using a shared space and performing basic activities such as answering the phone, using a cupboard, sitting on a chair and eating. The ID of each person was established by a wearable accelerometer, used to evaluate the tracks, and by RFID at the entrance, used to detect presence. The DBN shown in Fig. 1 was obtained using data collected from a single person present in the environment. Although the scenario is simple, and the DBN model therefore has good recognition performance, the addition of MHT allows resolution of sensor conflicts.
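To make the notion of a sensor conflict concrete, the gating step and the validity conditions of Section 3.2 can be sketched in Python as follows. This is only an illustration under stated assumptions: the sensor names, the `LIKELIHOOD` table and the helpers `gate_sensors`, `is_feasible` and `is_valid_assignment` are hypothetical and are not part of the evaluated system, and feasibility is approximated by a product of per-sensor likelihoods rather than by the full DBN probability of Equation 2.

```python
from typing import Dict, Set

# Hypothetical class 1 sensor values: each is bound to one person.
CLASS1_OWNER: Dict[str, str] = {"accel_A": "A", "accel_B": "B"}
# Hypothetical class 2 sensor values: relate to at most one person.
CLASS2: Set[str] = {"chair_occupied"}
# Everything else (here "pir_room") is treated as class 3, i.e. shareable.

# Hypothetical Pr(sensor value | most likely state); illustration only.
LIKELIHOOD: Dict[str, Dict[str, float]] = {
    "sitting": {"accel_A": 0.9, "accel_B": 0.9,
                "chair_occupied": 0.9, "pir_room": 0.6},
    "phone":   {"accel_A": 0.9, "accel_B": 0.9,
                "chair_occupied": 0.0, "pir_room": 0.7},
}


def gate_sensors(candidates: Set[str], state: str, alpha: float = 0.05) -> Set[str]:
    """Keep class 2/3 sensor values whose likelihood given the most likely
    state exceeds the gating threshold alpha (step 5 of the procedure)."""
    return {s for s in candidates if LIKELIHOOD[state][s] > alpha}


def is_feasible(state: str, sensors: Set[str]) -> bool:
    """Simplified feasibility: the sensor combination has non-zero probability."""
    prob = 1.0
    for s in sensors:
        prob *= LIKELIHOOD[state][s]
    return prob > 0.0


def is_valid_assignment(assignment: Dict[str, Set[str]],
                        states: Dict[str, str],
                        active: Set[str]) -> bool:
    """Check the three validity conditions for one hypothesis."""
    # Class 1 values are always assigned to the person they are bound to.
    for sensor, owner in CLASS1_OWNER.items():
        if sensor in active and sensor not in assignment.get(owner, set()):
            return False
    # Class 2 values are assigned to at most one target.
    for sensor in CLASS2:
        holders = [t for t, assigned in assignment.items() if sensor in assigned]
        if len(holders) > 1:
            return False
    # The sensor combination of every target must be feasible.
    return all(is_feasible(states[t], assigned)
               for t, assigned in assignment.items())


active = {"accel_A", "accel_B", "chair_occupied", "pir_room"}
states = {"A": "sitting", "B": "phone"}

resolved = {"A": {"accel_A", "chair_occupied", "pir_room"},
            "B": {"accel_B", "pir_room"}}
conflict = {"A": {"accel_A", "chair_occupied"},
            "B": {"accel_B", "chair_occupied"}}

print(is_valid_assignment(resolved, states, active))
print(is_valid_assignment(conflict, states, active))
```

In this sketch the shared passive infrared reading may go to both people, while the chair reading is accepted for only one of them, which is the kind of conflict the data association step has to resolve.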

Fig. 1. Structure of the DBN used for both subjects and a simple map of the installed system (top). Results for simple two-person scenario - shaded areas mark the moments when additional hypotheses have to be generated (bottom)

4 Conclusion

We provided an outline of multiple people activity recognition in a home environment. The method uses a Dynamic Bayesian Network model of a single person's activities, which is extended to the multiple-people case by using Multiple Hypothesis Tracking for data association. The early experimental results show that this method may be successful in tracking the activities of multiple people in a system with sensors that provide shared readings.

References

1. Blackman, S.S.: Multiple Hypothesis Tracking For Multiple Target Tracking. IEEE Aerospace and Electronic Systems Magazine 19, 5–18 (2004)
2. Gong, S., Xiang, T.: Recognition of group activities using dynamic probabilistic networks. In: Proceedings of International Conference on Computer Vision (ICCV 2003), p. 742. IEEE Computer Society, Washington, DC, USA (2003)
3. van Kasteren, T., Noulas, A., Englebienne, G., Kröse, B.: Accurate activity recognition in a home setting. In: Proceedings of the 10th International Conference on Ubiquitous Computing, UbiComp 2008, pp. 1–9. ACM, New York (2008), http://doi.acm.org/10.1145/1409635.1409637
4. Luber, M., Tipaldi, G.D., Arras, K.O.: Spatially Grounded Multi-hypothesis Tracking of People. In: Proceedings of ICRA 2009 Workshop on People Detection and Tracking (2009)
5. Pavlovic, V., Rehg, J.M., Cham, T.J., Murphy, K.P.: A dynamic Bayesian network approach to figure tracking using learned dynamic models. In: Proceedings of International Conference on Computer Vision (ICCV 1999), pp. 94–101 (1999)
6. Ryoo, M.S., Aggarwal, J.K.: Observe-and-explain: A new approach for multiple hypotheses tracking of humans and objects. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2008)
7. Tolstikov, A., Biswas, J., Tham, C.-K., Yap, P.: Eating activity primitives detection: a step towards ADL recognition. In: Proceedings of the 10th International Conference on e-Health Networking, Applications and Services, Healthcom 2008 (2008)
8. Wilson, D., Atkeson, C.: Simultaneous tracking and activity recognition (STAR) using many anonymous, binary sensors. Pervasive Computing, 62–79 (2005)
9. Zajdel, W., Kröse, B.: Bayesian network for multiple hypothesis tracking. In: Proceedings of the 14th Dutch-Belgian Artificial Intelligence Conference, BNAIC 2002, pp. 379–386 (2002)
