Hidden Permutation Model and Location-Based Activity Recognition

Hung Bui SRI International

Dinh Phung, Svetha Venkatesh, Hai Phan Curtin University of Technology

Talk Outline

- Why model permutations?
- Distribution of random permutations
- Hidden Permutation Model (HPM)
- How to estimate HPM parameters?
- How to perform approximate inference?
- Experiments with location-based activity recognition

Why Model Permutations?

Permutations arise in many real-world problems:

- Data association, information extraction from text, machine translation, activity recognition
- Usually, there is an unknown matching that needs to be recovered:
  - Correspondence in data association
  - Field-to-value matching in IR
  - Word/phrase matching in machine translation
- A permutation is the simplest form of matching
- Brute-force computation is at least O(n!)

Permutations in Activity Recognition

Many activities require carrying out a collection of substeps, each performed just once (or repeated a small number of times):

- AAAI travel = (get_approval, book_hotel, book_air_ticket, register, prepare_slides, do_travel)
- The ordering of steps is an unknown permutation that needs to be recovered

Factors affecting ordering between steps:

- Strongly ordered: A enables B; A and B follow a timetable
- Weakly ordered: A performed before B out of habit
- Unordered: A performed before B by chance

Learning these ordering constraints from data can lead to better recognition performance.

Permutations and Markov Models

A standard HMM does not enforce permutation constraints: nothing prevents the same step from being visited at two different positions (x_n = x_1? x_n = x_2? ...).

Adding permutation constraints leads to awkward graphical models, since conditional independence is lost. We need a more direct way of defining a distribution on permutations.

Distributions on Permutations

Let Per(n) = the set of permutations of {1, 2, ..., n}.

- A multinomial over Per(n) requires n! parameters (Kirshner et al., ICML 2003)
- An exponential-family distribution is very general yet needs only a few parameters:

      f : Per(n) → R^d   (feature function)
      λ ∈ R^d            (natural parameters)

      Pr(x | λ) = exp{ ⟨f(x), λ⟩ − A(λ) }

  with log-partition function

      A(λ) = ln Σ_{x ∈ Per(n)} exp ⟨f(x), λ⟩

  which is expensive to compute exactly.

Exponential Family on Permutations (cont.)

What features to use? Recall the factors affecting ordering between activity steps:

- Strongly ordered: A enables B; A and B follow a timetable
- Weakly ordered: A performed before B out of habit
- Unordered: A performed before B by chance

These are all pairwise, so use indicator features: does step i appear before step j in x?

    f_ij(x) = I{ x⁻¹(i) < x⁻¹(j) }

where x⁻¹(i) is the position of step i in x. With no loss of information, keep only f_ij(x) for i < j, giving

    d = n(n−1)/2

features (and the same number of parameters).

Exponential Family on Permutations (cont.)

Simplified density forms:

    Pr(x | λ) = exp{ Σ_{i<j : x⁻¹(i) < x⁻¹(j)} λ_ij − A(λ) }
              = exp{ Σ_{k<l : x_k < x_l} λ_{x_k, x_l} − A(λ) }

i.e. a sum over all in-order pairs.

Example: x = (2 4 1 5 3) gives

    λ_{2,4} + λ_{2,5} + λ_{2,3} + λ_{4,5} + λ_{1,5} + λ_{1,3}
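As a sanity check, the in-order-pair sum and the density can be computed directly for small n by enumerating Per(n). A minimal sketch (the function names are ours, not from the talk):

```python
import itertools
import math

def inorder_pairs(x):
    """Pairs (i, j), i < j, such that label i appears before label j in x."""
    pos = {label: t for t, label in enumerate(x)}  # inverse permutation x^-1
    n = len(x)
    return {(i, j) for i in range(1, n + 1) for j in range(i + 1, n + 1)
            if pos[i] < pos[j]}

def log_prob(x, lam):
    """Exact log Pr(x | lambda) by brute-force enumeration of Per(n);
    only feasible for small n since |Per(n)| = n!.
    `lam` maps a pair (i, j), i < j, to the parameter lambda_ij."""
    n = len(x)
    score = lambda p: sum(lam.get(pair, 0.0) for pair in inorder_pairs(p))
    log_A = math.log(sum(math.exp(score(p))       # log-partition A(lambda)
                         for p in itertools.permutations(range(1, n + 1))))
    return score(tuple(x)) - log_A

# The slide's example: x = (2 4 1 5 3)
print(sorted(inorder_pairs((2, 4, 1, 5, 3))))
# -> [(1, 3), (1, 5), (2, 3), (2, 4), (2, 5), (4, 5)]
```

The printed pairs are exactly the subscripts of the six λ terms in the example above.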

Some Properties

Swapping adjacent entries x_i and x_{i+1}: with x′ = (x_1, ..., x_{i+1}, x_i, ..., x_n),

    Pr(x′ | λ) / Pr(x | λ) = e^{−λ_{x_i, x_{i+1}}}   if x_i < x_{i+1}
                           = e^{ λ_{x_{i+1}, x_i}}   if x_i > x_{i+1}

so the cost of switching an adjacent pair (i, j), i < j, into order is e^{λ_ij}.

Reverse permutation: with x′ = (x_n, x_{n−1}, ..., x_1), every pair in order in x is out of order in x′ and vice versa, so

    Pr(x | λ) · Pr(x′ | λ) = exp{ Σ_{i<j} λ_ij − 2A(λ) } = const(λ)

Hidden Permutation Model

"Graphical model": the permutation x is hidden; at step t we observe o_t.

    Pr(x | λ)                          (E.F. prior on the permutation)
    Pr(o_t | x_t = i, η) = Mult(η_i)   (per-step observation model)

Joint distribution:

    Pr(x, o | λ, η) = Pr(x | λ) Π_{t=1}^{n} Pr(o_t | x_t, η_{x_t})
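The generative process can be sketched for small n by drawing x exactly from the E.F. prior and then one observation per step. A toy sketch; the parameter layout (`lam` as a dict over pairs, `eta` as per-step symbol distributions) is our own choice, not from the talk:

```python
import itertools
import math
import random

def sample_hpm(lam, eta, rng):
    """Draw (x, o) from the HPM joint: x ~ Pr(x | lambda) (exact, by
    enumerating Per(n), so small n only), then o_t ~ Mult(eta[x_t]).
    `eta[i]` maps each observation symbol to its probability under step i."""
    n = len(eta)
    perms = list(itertools.permutations(range(1, n + 1)))

    def score(p):  # sum of lambda_ij over in-order pairs of p
        pos = {v: t for t, v in enumerate(p)}
        return sum(lam.get((i, j), 0.0)
                   for i in range(1, n + 1) for j in range(i + 1, n + 1)
                   if pos[i] < pos[j])

    weights = [math.exp(score(p)) for p in perms]
    x = rng.choices(perms, weights=weights)[0]
    o = [rng.choices(list(eta[i]), weights=list(eta[i].values()))[0] for i in x]
    return x, o

rng = random.Random(0)
lam = {(1, 2): 3.0}                      # step 1 strongly tends to precede step 2
eta = {1: {"A": 0.9, "B": 0.1}, 2: {"A": 0.1, "B": 0.9}}
x, o = sample_hpm(lam, eta, rng)
```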

Max. Likelihood Estimation, Permutation Known

Log-likelihood function:

    L(λ, η) = ln P(x | λ) + ln P(o | x, η)

- Optimizing η is trivial (count frequencies).
- Optimizing λ is a convex problem, with derivative

      ∇_{λ_ij} L = f_ij(x) − Σ_x′ f_ij(x′) P(x′ | λ)

  i.e. (did i appear before j in the data?) − Pr(i appears before j under the model).
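For small n the derivative above can be evaluated exactly by enumeration. A brute-force sketch with our own naming; real instances need the MCMC approximation described later:

```python
import itertools
import math

def f_ij(p, i, j):
    """Indicator: does label i appear before label j in permutation p?"""
    pos = {v: t for t, v in enumerate(p)}
    return 1.0 if pos[i] < pos[j] else 0.0

def grad_lambda_known(x, lam):
    """Exact gradient of ln P(x | lambda): f_ij(x) - Pr(i before j),
    with the model expectation computed by enumerating Per(n)."""
    n = len(x)
    perms = list(itertools.permutations(range(1, n + 1)))
    score = lambda p: sum(lam.get((i, j), 0.0)
                          for i in range(1, n + 1)
                          for j in range(i + 1, n + 1) if f_ij(p, i, j))
    w = [math.exp(score(p)) for p in perms]
    Z = sum(w)
    return {(i, j): f_ij(x, i, j) -
                    sum(wk for wk, p in zip(w, perms) if f_ij(p, i, j)) / Z
            for i in range(1, n + 1) for j in range(i + 1, n + 1)}

# At lambda = 0 every ordering is equally likely, so Pr(i before j) = 1/2
g = grad_lambda_known((1, 2, 3), {})
```

At λ = 0 every pair already in order in the data gets gradient 1 − 1/2 = 1/2, pushing the corresponding λ_ij up, as expected.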

Max. Likelihood Estimation, Permutation Unknown

Log-likelihood function over K observation sequences:

    l(λ, η) = Σ_{k=1}^{K} log Σ_x P(o^k, x | λ, η)

- Need to jointly optimize both λ and η; non-convex problem.
- Can we use EM? The M-step for λ does not have a closed form.
- Can try coordinate ascent:
  - Fix η and improve λ by one gradient step
  - Fix λ and improve η by EM (now has a closed form)
  - Didn't work as well as simple gradient ascent

Max. Likelihood Estimation, Permutation Unknown

Derivative for λ:

    ∇_{λ_ij}(l) = Σ_x f_ij(x) P(x | o, λ, η) − Σ_x f_ij(x) P(x | λ)

i.e. Pr(i appears before j given o) − Pr(i appears before j).
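This "posterior expectation minus prior expectation" form is checkable by brute force for tiny n. A sketch with our own naming and parameter layout:

```python
import itertools
import math

def grad_lambda_hidden(o, lam, eta):
    """Exact d l / d lambda_ij for one observation sequence o:
    Pr(i before j | o) - Pr(i before j), both by enumerating Per(n).
    `eta[i][v]` = Pr(o_t = v | x_t = i)."""
    n = len(o)
    perms = list(itertools.permutations(range(1, n + 1)))

    def before(p, i, j):
        pos = {v: t for t, v in enumerate(p)}
        return pos[i] < pos[j]

    prior = [math.exp(sum(lam.get((i, j), 0.0)
                          for i in range(1, n + 1)
                          for j in range(i + 1, n + 1) if before(p, i, j)))
             for p in perms]                                # proportional to P(x | lambda)
    post = [w * math.prod(eta[step][sym] for step, sym in zip(p, o))
            for w, p in zip(prior, perms)]                  # proportional to P(x | o, lambda, eta)
    Zr, Zo = sum(prior), sum(post)
    return {(i, j):
            sum(w for w, p in zip(post, perms) if before(p, i, j)) / Zo -
            sum(w for w, p in zip(prior, perms) if before(p, i, j)) / Zr
            for i in range(1, n + 1) for j in range(i + 1, n + 1)}
```

With n = 2 and observations that identify the steps exactly, the posterior pins down the order, so the gradient for λ_{1,2} is 1 − 1/2 = 1/2.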



Derivative for η:

Avoid dealing with simplex constraints by transforming to the natural parameters of the multinomial:

    ∇_{η_iv}(l) = Σ_x I{ x⁻¹(i) ∈ o[v] } P(x | o, λ, η) − Pr(v | η_i)

where o[v] is the set of positions at which symbol v is observed, so the first term is Pr(i appears at one of v's position(s) given o).

Approximate Inference via MCMC

A typical "inference" problem requires calculating an expectation. Expectations can be approximated if we can generate samples x ∼ Pr(x | λ).

How to draw random permutations? Try a well-known MCMC idea:

- Start with a random initial permutation
- Randomly switch two positions to get a proposal x′
- Accept the new permutation with probability

      min{ P(x′ | λ) / P(x | λ), 1 }
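A minimal Metropolis sampler along these lines (our own code, not the authors'). Since the proposal is symmetric, the acceptance ratio needs only unnormalized probabilities, so A(λ) cancels:

```python
import math
import random

def score(p, lam):
    """Unnormalized log-probability: sum of lambda_ij over in-order pairs."""
    pos = {v: t for t, v in enumerate(p)}
    n = len(p)
    return sum(lam.get((i, j), 0.0)
               for i in range(1, n + 1) for j in range(i + 1, n + 1)
               if pos[i] < pos[j])

def mh_permutations(lam, n, steps, rng):
    """Metropolis chain on Per(n): propose swapping two random positions,
    accept with probability min(P(x') / P(x), 1)."""
    x = list(range(1, n + 1))
    rng.shuffle(x)
    s = score(x, lam)
    samples = []
    for _ in range(steps):
        a, b = rng.sample(range(n), 2)       # two distinct positions
        y = x[:]
        y[a], y[b] = y[b], y[a]
        s_new = score(y, lam)
        if s_new >= s or rng.random() < math.exp(s_new - s):
            x, s = y, s_new                  # accept; otherwise keep x
        samples.append(tuple(x))
    return samples

# Estimate Pr(1 appears before 2) under lambda_{1,2} = 3; with only this one
# parameter nonzero the exact value is e^3 / (1 + e^3) ~ 0.953
rng = random.Random(0)
chain = mh_permutations({(1, 2): 3.0}, n=4, steps=20000, rng=rng)
est = sum(p.index(1) < p.index(2) for p in chain[2000:]) / len(chain[2000:])
```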

Location-Based Activity Recognition on Campus: Detection Problem

Student activity routines (permutations with partial-order constraints) over atomic activities, each tied to a set of physical locations:

    Atomic activity     Physical locations
    Banking             Bank
    Lecture 1           Watson theater
    Lecture 2           Hayman theater
    Lecture 3           Davis theater
    Lecture 4           Jones theater
    Group meeting 1     Bookmark cafe, Library, CBS
    Group meeting 2     Library, CBS, Psychology Bld
    Group meeting 3     Angazi cafe, Psychology Bld
    Coffee              TAV, Angazi cafe, Bookmark cafe
    Breakfast           TAV, Angazi cafe, Bookmark cafe
    Lunch               TAV, Bookmark cafe

Observations arrive as GPS "places".

"Places" from GPS

Preprocessing:

- Removal of points above a speed threshold (often missing precisely the samples we want, e.g. inside buildings!)
- Interpolation within a day and across days
- Clustering into groups to find significant places using DBSCAN
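The clustering step can be sketched with a minimal pure-Python DBSCAN. This is an illustrative toy: the talk does not specify eps or min_pts, and in practice a library implementation (e.g. scikit-learn's DBSCAN) would be used:

```python
import math
from collections import deque

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: cluster stationary GPS fixes into "places".
    Returns one cluster id per point; -1 marks noise. O(n^2) neighbour
    search, fine for a demo, too slow for real GPS logs."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cid = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1                   # noise (a cluster may claim it later)
            continue
        labels[i] = cid
        queue = deque(nbrs)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cid              # border point: join, don't expand
            if labels[j] is not None:
                continue
            labels[j] = cid
            nbrs_j = neighbors(j)
            if len(nbrs_j) >= min_pts:       # j is a core point: keep expanding
                queue.extend(nbrs_j)
        cid += 1
    return labels

# Two tight groups of fixes plus one stray reading
fixes = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (50, 50)]
print(dbscan(fixes, eps=0.5, min_pts=3))   # -> [0, 0, 0, 1, 1, 1, -1]
```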

Detection Performance

Task: in a long sequence of GPS "places", detect occurrences of an activity routine.

Simulated Data, Supervised (Atomic Activities Given)

    Activity 1    TP     FP     Precision   Recall
    NBC           16.6   11.1   59.9%       80.3%
    HMM           18.3   19.8   48.0%       91.5%
    KIR           18.3    8.5   68.3%       91.5%
    HPM           19.1    5.1   78.9%       95.5%

    Activity 2    TP     FP     Precision   Recall
    NBC           17.1   11.0   60.9%       85.5%
    HMM           17.7    3.8   82.3%       88.5%
    KIR           18.1    4.7   79.4%       90.5%
    HPM           18.5    0.5   97.4%       92.5%

Simulated Data, Unsupervised

    Activity 1    TP     FP     Precision   Recall
    HMM           18.2   19.5   48.3%       91.0%
    KIR           18.5    2.0   90.2%       92.5%
    HPM           19.1    4.1   82.3%       95.5%

    Activity 2    TP     FP     Precision   Recall
    HMM           17.9    4.4   80.3%       89.5%
    KIR           18.0    0.7   96.3%       90.5%
    HPM           18.8    0.4   97.9%       94.0%

Real Data, Unsupervised

                  TP     FP     Precision   Recall
    NBC           6      4      60%         60%
    HMM           8.5    5.3    61.6%       85%
    HPM           9.8    1.9    83.8%       98%

Conclusion

- Modelling permutations is hard, but not impossible.
- We gave a general way to parameterize distributions over permutations using the exponential family.
- If the permutation is not observed, use the Hidden Permutation Model (HPM).
- Demonstrated better performance than models that do not exploit permutation constraints, as well as the naïve multinomial permutation model (Kirshner et al.).

Future work:

- Generalize to permutations with repetitions
- In supervised mode, a discriminative formulation similar to CRF might work better
