State-Space Inference and Learning with Gaussian Processes
Ryan Turner

Seattle, WA, March 5, 2010
Joint work with Marc Deisenroth and Carl Edward Rasmussen

Turner (Engineering, Cambridge)


Outline

Motivation for dynamical systems
Expectation Maximization (EM)
Gaussian Processes (GP)
Inference
Learning
Results


Motivation

[Figure: block diagram of a control loop. The system has state (position, velocity); a measurement device (sensor) returns g(position, noise); a filter computes p(position, velocity); a controller sets the throttle, which feeds back into the system.]

Estimating (latent) states from noisy measurements.

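Setup

[Figure: graphical model. Latent states x_{t−1}, x_t, x_{t+1} are linked by the transition function f; each state emits a measurement (labeled z_{t−1}, z_t, z_{t+1} in the figure) through g.]

$x_t = f(x_{t-1}) + w, \quad w \sim \mathcal{N}(0, Q)$
$y_t = g(x_t) + v, \quad v \sim \mathcal{N}(0, R)$

x: latent state, y: measurement
Learning: find f and g using $y_{1:T}$.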
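As a minimal sketch of the generative model above, the following simulates a scalar system. The particular choices of `f` and `g`, the noise variances, and the helper name `simulate` are illustrative assumptions and are not taken from the talk.

```python
import math
import random

def f(x):
    # Hypothetical nonlinear transition function (an assumption for illustration).
    return 0.5 * x + 2.0 * math.sin(x)

def g(x):
    # Hypothetical measurement function (an assumption for illustration).
    return 0.1 * x ** 2

def simulate(T, Q=0.1, R=0.05, x0=0.0, seed=0):
    """Draw latent states x_1..x_T and measurements y_1..y_T."""
    rng = random.Random(seed)
    xs, ys = [], []
    x = x0
    for _ in range(T):
        x = f(x) + rng.gauss(0.0, math.sqrt(Q))  # x_t = f(x_{t-1}) + w, w ~ N(0, Q)
        y = g(x) + rng.gauss(0.0, math.sqrt(R))  # y_t = g(x_t) + v,     v ~ N(0, R)
        xs.append(x)
        ys.append(y)
    return xs, ys

xs, ys = simulate(T=100)
```

In the learning problem only `ys` would be available; `xs` is returned here purely so that an estimate of the latent states can be compared against the simulated truth.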


The Goal

Learn the nonlinear dynamical system (NLDS) in a nonparametric and probabilistic fashion using the EM algorithm.
This requires inference (filtering and smoothing) and prediction in the NLDS using moment matching.
filtering: find distribution $p(x_t | y_{1:t})$
smoothing: find distribution $p(x_t | y_{1:T})$
prediction: find distribution $p(y_{t+1} | y_{1:t})$

Gaussian process inference and learning (GPIL) algorithm


Expectation Maximization

EM iterates between two steps, the E-step and the M-step.
E-step (or inference step): find a posterior distribution $p(X | Y, \Theta)$.
M-step: maximize the expected log-likelihood $Q = E_X[\log p(X, Y | \Theta)]$ with respect to $\Theta$.
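To make the two steps concrete, here is a small sketch on a much simpler latent-variable model than the one in this talk: a two-component Gaussian mixture with equal weights and unit variances, where only the two means are learned. The model, the data, and the function name `em_two_means` are assumptions chosen for brevity; the latent variable is the component label rather than a state sequence.

```python
import math
import random

def em_two_means(data, mu=(-1.0, 1.0), n_iter=50):
    """EM for the two means of an equal-weight, unit-variance Gaussian mixture."""
    mu0, mu1 = mu
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to component 1.
        resp = []
        for y in data:
            a = math.exp(-0.5 * (y - mu0) ** 2)
            b = math.exp(-0.5 * (y - mu1) ** 2)
            resp.append(b / (a + b))
        # M-step: maximizing the expected complete-data log-likelihood
        # gives responsibility-weighted means.
        n1 = sum(resp)
        n0 = len(data) - n1
        mu0 = sum((1.0 - r) * y for r, y in zip(resp, data)) / n0
        mu1 = sum(r * y for r, y in zip(resp, data)) / n1
    return mu0, mu1

rng = random.Random(0)
data = [rng.gauss(-2.0, 1.0) for _ in range(300)]
data += [rng.gauss(2.0, 1.0) for _ in range(300)]
mu0, mu1 = em_two_means(data)
```

The structure carries over to the state-space setting: the E-step is replaced by filtering and smoothing, and the M-step by optimization of the model parameters.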


Pictorial introduction to Gaussian process regression

[Figure, repeated over four slides: f(x) plotted against x, with x from −5 to 5 and f(x) from −4 to 4.]
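The kind of picture shown on these slides can be reproduced with standard GP regression. The sketch below assumes a squared-exponential kernel, a zero prior mean, and made-up training points; the names `se_kernel` and `gp_posterior` and all hyperparameter values are illustrative assumptions.

```python
import numpy as np

def se_kernel(a, b, signal_var=1.0, length_scale=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * diff ** 2 / length_scale ** 2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-2):
    """Posterior mean and variance of f at x_test given noisy observations."""
    K = se_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = se_kernel(x_test, x_train)
    L = np.linalg.cholesky(K)                       # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s @ alpha                              # k(x*, X) K^{-1} y
    v = np.linalg.solve(L, K_s.T)
    var = np.diag(se_kernel(x_test, x_test)) - np.sum(v ** 2, axis=0)
    return mean, var

x_train = np.array([-4.0, -2.0, 0.0, 1.5, 3.0])    # hypothetical inputs
y_train = np.sin(x_train)                           # hypothetical targets
x_test = np.linspace(-5.0, 5.0, 101)
mean, var = gp_posterior(x_train, y_train, x_test)
```

Plotting `mean` with a band of two posterior standard deviations (`2 * np.sqrt(var)`) shows the uncertainty shrinking near the training inputs and growing away from them.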

Existing Methods for nonlinear systems

Extended Kalman Filter (EKF) [Maybeck, 1979].
Unscented Kalman Filter (UKF) [Julier and Uhlmann, 1997].
Assumed Density Filter (ADF) [Boyen and Koller, 1998, Opper, 1998].
Radial Basis Functions (RBF) [Ghahramani and Roweis, 1999].
Neural networks [Honkela and Valpola, 2005].
Other GP approaches, GPDM and GPBF [Wang et al., 2008, Ko and Fox, 2009b].
GPs for filtering in the context of the UKF, the EKF [Ko and Fox, 2009a], and the ADF [Deisenroth et al., 2009].


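[Figure: graphical model with a training sequence (states x_{τ−1}, x_{τ+1} and measurements y_{τ−1}, y_{τ+1}) and a test sequence (states x_{t−1}, x_t, x_{t+1} and measurements y_{t−1}, y_t, y_{t+1}), both involving the transition function f(·) and the measurement function g(·).]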

Advantages of GPIL

Model f and g with GPs: $f \sim \mathcal{GP}_f$, $g \sim \mathcal{GP}_g$.
GPs account for three uncertainties:
  system noise
  measurement noise
  model uncertainty
Integrates out the latent states (not MAP), unlike [Wang et al., 2008, Ko and Fox, 2009b].
Tractable algorithm for approximate inference (smoothing) in GP state-space models.
Learning without ground-truth observations $x_i$ of the latent states.

[Figure: f(x) plotted against x, with x from −5 to 5 and f(x) from −4 to 4.]

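E-Step: Forward sweep

[Figure: one filter step on the graphical model x_{t−1} → x_t → z_t, shown in three panels.]

1) predict next hidden state (time update): $p(x_{t-1} | z_{1:t-1}) \to p(x_t | z_{1:t-1})$ through f
2) predict measurement: $p(z_t | z_{1:t-1})$ through g
3) hidden state posterior (measurement update): measure $z_t$, then $p(x_t | z_{1:t-1}) \to p(x_t | z_{1:t})$

Backward sweep also analytic.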
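The three stages of the forward sweep can be sketched for a scalar model. To keep the example short, linear functions stand in for the GP models of f and g (an assumption), so the Gaussian moments below are exact and the step reduces to a Kalman filter update. The function names and numbers are illustrative.

```python
def time_update(m, P, a, Q):
    """p(x_{t-1} | z_{1:t-1}) -> p(x_t | z_{1:t-1}) for x_t = a * x_{t-1} + w."""
    return a * m, a * a * P + Q

def measurement_update(m_pred, P_pred, z, c, R):
    """p(x_t | z_{1:t-1}) -> p(x_t | z_{1:t}) for z_t = c * x_t + v."""
    z_mean = c * m_pred          # predicted measurement mean
    S = c * c * P_pred + R       # predicted measurement variance
    C = c * P_pred               # cross-covariance cov(x_t, z_t)
    gain = C / S
    # Condition the joint Gaussian over (x_t, z_t) on the observed z_t.
    return m_pred + gain * (z - z_mean), P_pred - gain * C

# 1) predict next hidden state
m_pred, P_pred = time_update(m=0.0, P=1.0, a=0.5, Q=0.1)
# 2) predict measurement and 3) compute the hidden state posterior
m_filt, P_filt = measurement_update(m_pred, P_pred, z=1.0, c=2.0, R=0.5)
```

With nonlinear or GP models, the quantities `z_mean`, `S`, and `C` would come from moment matching instead of these closed-form linear expressions; the conditioning step itself stays the same.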


Predictions Using Moment Matching

[Figure: x_{t+1} plotted against the input (x_t, u_t).]
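For a GP with a squared-exponential kernel, the mean and variance of the prediction at a Gaussian-distributed input have closed forms. The sketch below works in one dimension and covers only the latent function value (no added process noise). The training set, hyperparameters, and function names are assumptions, and whether the talk uses exactly these expressions is not stated on the slide.

```python
import numpy as np

def se_kernel(a, b, sf2, ell):
    """Squared-exponential kernel sf2 * exp(-(a - b)^2 / (2 ell^2))."""
    d = a[:, None] - b[None, :]
    return sf2 * np.exp(-0.5 * d ** 2 / ell ** 2)

def moment_match(mu, s2, X, K_inv, beta, sf2, ell):
    """Mean and variance of f(x) when x ~ N(mu, s2) and f is the GP posterior."""
    # q_i = E[k(x, X_i)] under the Gaussian input distribution.
    q = sf2 * np.sqrt(ell ** 2 / (ell ** 2 + s2)) \
        * np.exp(-0.5 * (X - mu) ** 2 / (ell ** 2 + s2))
    mean = beta @ q
    # Q_ij = E[k(x, X_i) k(x, X_j)].
    mid = 0.5 * (X[:, None] + X[None, :])
    d = X[:, None] - X[None, :]
    Q = sf2 ** 2 * np.sqrt(ell ** 2 / (ell ** 2 + 2.0 * s2)) \
        * np.exp(-d ** 2 / (4.0 * ell ** 2)) \
        * np.exp(-(mid - mu) ** 2 / (ell ** 2 + 2.0 * s2))
    # Law of total variance: E[var_f(x)] + Var[mean_f(x)].
    var = sf2 - np.trace(K_inv @ Q) + beta @ Q @ beta - mean ** 2
    return mean, var

# Hypothetical training set and hyperparameters.
sf2, ell, sn2 = 1.0, 1.0, 0.01
X = np.linspace(-3.0, 3.0, 8)
y = np.sin(X)
K_inv = np.linalg.inv(se_kernel(X, X, sf2, ell) + sn2 * np.eye(len(X)))
beta = K_inv @ y

mu, s2 = 0.5, 0.25           # uncertain test input x ~ N(mu, s2)
mean, var = moment_match(mu, s2, X, K_inv, beta, sf2, ell)
```

The true predictive distribution is generally not Gaussian; moment matching replaces it by the Gaussian with this mean and variance so that the next filter step can again start from a Gaussian.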

M-Step

[Figure: graphical model of the state-space model, with latent states x_{t−1}, x_t, x_{t+1} linked by f and measurements z_{t−1}, z_t, z_{t+1} generated through g.]

Pseudo-training data

[Figure: pseudo-training data points labeled α1 to α7 and β1 to β7; both axes run from −2 to 2.]

Why We Need Pseudo-training Data

[Figure: graphical model in which α, β sit above the latent states x_{t−1}, x_t, x_{t+1} and ξ, υ sit below the measurements y_{t−1}, y_t, y_{t+1}.]

$\mathcal{GP}_f$ and $\mathcal{GP}_g$ are not full GPs, but rather sparse GPs.

$x_t \to x_{t+1}$ given α and β is a GP prediction.
$x_{t-1}$ is the (uncertain) test input; α and β form a standard GP training set.
Markovian property: $x_{t+1} \perp x_{t-1} \mid x_t, \alpha, \beta$.

Without a pseudo-training set, the Markov property $x_{t+1} \perp x_{t-1} \mid x_t, f$ conditions on the infinite-dimensional object f, which is intractable.

The Auxiliary Function

We decompose Q into

$$Q = E_X[\log p(X, Y | \Theta)] = E_X[\log p(x_1 | \Theta)] + E_X\Big[\sum_{t=2}^{T} \underbrace{\log p(x_t | x_{t-1}, \Theta)}_{\text{Transition}} + \sum_{t=1}^{T} \underbrace{\log p(y_t | x_t, \Theta)}_{\text{Measurement}}\Big]$$

using the factorization properties of the model.

The Transition Contribution

$$E_X[\log p(x_t | x_{t-1}, \Theta)] = -\frac{1}{2} \sum_{i=1}^{M} \Big( \underbrace{E_X\Big[\frac{(x_{ti} - \mu_i(x_{t-1}))^2}{\sigma_i^2(x_{t-1})}\Big]}_{\text{Data Fit Term}} + \underbrace{E_X\big[\log \sigma_i^2(x_{t-1})\big]}_{\text{Complexity Term}} \Big)$$

We approximate the data fit

$$E_X\Big[\frac{(x_{ti} - \mu_i(x_{t-1}))^2}{\sigma_i^2(x_{t-1})}\Big] \approx \frac{E_X\big[(x_{ti} - \mu_i(x_{t-1}))^2\big]}{E_X\big[\sigma_i^2(x_{t-1})\big]}$$

and lower bound the EM lower bound with

$$E_X\big[\log \sigma_i^2(x_{t-1})\big] \le \log E_X\big[\sigma_i^2(x_{t-1})\big].$$
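The inequality used for the complexity term is Jensen's inequality for the concave logarithm. A quick numerical check, using log-normal samples as a hypothetical stand-in for the distribution of the predictive variance, might look like this:

```python
import math
import random

rng = random.Random(0)
# Hypothetical samples of a positive predictive variance sigma_i^2(x_{t-1}).
sigma2 = [math.exp(rng.gauss(0.0, 0.5)) for _ in range(10000)]

mean_of_log = sum(math.log(s) for s in sigma2) / len(sigma2)  # E[log sigma^2]
log_of_mean = math.log(sum(sigma2) / len(sigma2))             # log E[sigma^2]
gap = log_of_mean - mean_of_log                               # nonnegative by Jensen
```

Because the complexity term enters Q with a negative sign, replacing $E_X[\log \sigma_i^2]$ by the larger $\log E_X[\sigma_i^2]$ can only decrease that term, which is why the result is a bound from below.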


Synthetic Data

[Figure: f(x) plotted against x for x from −3 to 3; legend: ground truth, posterior mean, pseudo targets.]

Snow Data

[Figure: snowfall in log-cm plotted against x; legend: posterior mean, pseudo targets.]

Quantitative Results

Method    NLL synth.      RMSE synth.   NLL real        RMSE real
TIM       2.21±0.0091     2.18          1.47±0.0257     1.01
Kalman    2.07±0.0103     1.91          1.29±0.0273     0.783
ARGP      1.01±0.0170     0.663         1.25±0.0298     0.793
NDFA      2.20±0.00515    2.18          14.6±0.374      1.06
GPDM      3330±386        2.13          N/A             N/A
GPIL      0.917±0.0185    0.654         0.684±0.0357    0.769
UKF       4.55±0.133      2.19          1.84±0.0623     0.938
EKF       1.23±0.0306     0.665         1.46±0.0542     0.905
GP-UKF    6.15±0.649      2.06          3.03±0.357      0.884
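The slide does not say exactly how the two metrics were computed. A common convention, assumed here, is the negative log-likelihood of each test point under a Gaussian predictive distribution, averaged over points, together with the root mean squared error of the predictive mean. The function names and toy numbers are illustrative.

```python
import math

def gaussian_nll(y, mean, var):
    """Negative log-likelihood of y under N(mean, var)."""
    return 0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)

def mean_nll(ys, means, variances):
    """Average per-point NLL over a test set."""
    total = sum(gaussian_nll(y, m, v) for y, m, v in zip(ys, means, variances))
    return total / len(ys)

def rmse(ys, means):
    """Root mean squared error of the predictive means."""
    return math.sqrt(sum((y - m) ** 2 for y, m in zip(ys, means)) / len(ys))

# Toy predictions, not data from the talk.
ys = [1.0, -1.0]
means = [0.0, 0.0]
variances = [1.0, 1.0]
nll_value = mean_nll(ys, means, variances)
rmse_value = rmse(ys, means)
```

Unlike RMSE, the NLL also penalizes predictive variances that are too small or too large, which may explain why two methods with similar RMSE in the table can differ strongly in NLL.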


Conclusions

GPs for flexible distribution over nonlinear dynamical systems.
Filtering and smoothing based on moment matching.
Learning the dynamical system (even without ground-truth latent state).


References

Boyen, X. and Koller, D. (1998). Tractable inference for complex stochastic processes. In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence (UAI 1998), pages 33–42, San Francisco, CA, USA. Morgan Kaufmann.

Deisenroth, M. P., Huber, M. F., and Hanebeck, U. D. (2009). Analytic moment-based Gaussian process filtering. In Bottou, L. and Littman, M. L., editors, Proceedings of the 26th International Conference on Machine Learning, pages 225–232, Montreal, Canada. Omnipress.

Ghahramani, Z. and Roweis, S. (1999). Learning nonlinear dynamical systems using an EM algorithm. In Advances in Neural Information Processing Systems 11, pages 599–605.

Honkela, A. and Valpola, H. (2005). Unsupervised variational Bayesian learning of nonlinear models. In Saul, L. K., Weiss, Y., and Bottou, L., editors, Advances in Neural Information Processing Systems 17, pages 593–600. MIT Press, Cambridge, MA.

Julier, S. J. and Uhlmann, J. K. (1997). A new extension of the Kalman filter to nonlinear systems. In Proceedings of AeroSense: 11th Symposium on Aerospace/Defense Sensing, Simulation and Controls, pages 182–193, Orlando, FL, USA.

Ko, J. and Fox, D. (2009a). GP-BayesFilters: Bayesian filtering using Gaussian process prediction and observation models. Autonomous Robots, 27(1):75–90.

Ko, J. and Fox, D. (2009b). Learning GP-BayesFilters via Gaussian Process Latent Variable Models. In Proceedings of Robotics: Science and Systems, Seattle, USA.

Maybeck, P. S. (1979). Stochastic Models, Estimation, and Control, volume 141 of Mathematics in Science and Engineering. Academic Press, Inc.

Opper, M. (1998).
