The Kalman Filter A brute force derivation Weijie Chen∗ Department of Political and Economic Studies University of Helsinki 6 June, 2012



Email: [email protected]

1 Introduction The Kalman filter algorithm has very high shreshhold for comprehension, and the logic is twisted with sophisticated mathematical notations. In this note, we derive the standard Kalman filter following the logic of Harvey (1990)[1]. All essential steps are presented, readers are required to think through each step carefully. The preliminary knowledge requires some advanced knowledge of matrix algebra and basic knowledge of multivariate normal distribution. The specific advanced matrix algebra techniques used in this notes will be fully explained in the first section. Then the second section will be derivation of the Kalman filter based on previous results.

2 Matrix Partitioning There are three partitioning techniques employed in Kalman filter derivation, they are inverse of partitioned symmetric matrix and determinant calculation of partitioned matrix and quadratic decomposition respectively.

2.1 Inverse of Partitioned Symmetric Matrices We need some basic facts from matrix algebra before we move onwards. The first one is (A + CBD)−1 = A−1 + A−1 C(B −1 + DA−1 C)−1 DA−1 It is easy to show it holds, (A + CBD)[A−1 − A−1 C(B −1 + DA−1 C)−1 DA−1 ] =(A + CBD)A−1 − (A + CBD)A−1 C(B −1 + DA−1 C)−1 DA−1 =I + CBDA−1 − (C + CBDA−1 C)(B −1 + DA−1 C)−1 DA−1 =I + CBDA−1 − CB(B −1 + DA−1 C)(B −1 + DA−1 C)−1 DA−1 =I + CBDA−1 − CBDA−1 = I

2

(1)

With this fact in mind, we move forwards. We have symmetric matrix A, partitioned into four blocks  A=

A11 A12





=

A21 A22

A11 A12 A012

 

A22

A11 has the dimension of p × p, A22 is q × q. With the same partitioned dimension, the inverse matrix B, which is also a symmetric matrix, can be partitioned accordingly  B=

B11 B12 0 B12 B22

  = A−1

The identity matrix In can be written as  In = AB =   =

A11 A12 A012

 

A22

B11 B12 0 B12

 

B22

0 A11 B11 + A12 B12 A11 B12 + A12 B22

A012 B11

+

0 A22 B12

A012 B12

+ A22 B22





=

Ip

0

 0

Iq

Now we have four identities 0 A11 B11 + A12 B12 = Ip

A11 B12 + A12 B22 = 0 0 A012 B11 + A22 B12 =0

A012 B12 + A22 B22 = Iq 0 and B Isolate B11 , B12 , B12 22 on one side, four identities above become −1 0 B11 = A−1 11 − A11 A12 B12

B12 = −A−1 11 A12 B22 0 0 B12 = −A−1 22 A12 B11 −1 0 B22 = A−1 22 − A22 A12 B12

3



0 and B 0 The idea is to represent B11 , B12 , B12 22 completely by blocks of A, we plug B12

into B11 −1 −1 0 B11 = A−1 11 − A11 A12 (−A22 A12 B11 ) −1 −1 0 = A−1 11 + A11 A12 A22 A12 B11

then collect B11 −1 0 −1 B11 − A−1 11 A12 A22 A12 B11 = A11 −1 −1 0 (Ip − A11 A12 A−1 22 A12 )B11 = A11 0 (A11 − A12 A−1 22 A12 )11 = Ip 0 −1 (A11 − A12 A−1 = B11 22 A12 )

Use the formula (1), we can get another form of B11 −1 −1 −1 −1 0 0 B11 = A−1 11 − A11 A12 (A22 + A12 A11 A12 ) A12 A11 0 Similarly for others, substitute B11 into B12 0 0 B12 = −A−1 22 A12 B11 −1 0 0 A22 B12 = −A012 B11 = −A012 (A−1 11 − A11 A12 B12 ) −1 0 0 A22 B12 = −A012 A−1 11 + A12 A11 A12 B12 −1 0 B12 A22 = −A−1 11 A12 + B12 A12 A11 A12 −1 B12 A22 − B12 A012 A−1 11 A12 = −A11 A12 −1 B12 (A22 − A012 A−1 11 A12 ) = −A11 A12 −1 0 −1 B12 = −A−1 11 A12 (A22 − A12 A11 A12 )

4

Next one, substitute B22 into B12 B12 = −A−1 11 A12 B22 −1 0 A11 B12 = −A12 B22 = −A12 (A−1 22 − A22 A12 B12 ) 0 0 0 0 B12 A11 = −A−1 22 A12 + B12 A12 A22 A12 −1 0 0 0 B12 (A11 − A12 A−1 22 A12 ) = −A22 A12 −1 0 −1 0 0 B12 = −A−1 22 A12 (A11 − A12 A22 A12 )

And last one B22 , −1 0 −1 0 B22 = A22 − A−1 22 A12 B12 = A22 − A22 A12 (−A11 A12 B22 ) −1 0 −1 B22 = A−1 22 + A22 A12 A11 A12 B22 −1 0 −1 A−1 22 = B22 − A22 A12 A11 A12 B22 −1 −1 0 A−1 22 = (Iq − A22 A12 A11 A12 )B22 −1 B22 = (A22 − A012 A−1 11 A12 )

In summary, we can represent partitioned matrix B by the blocks of A 0 −1 B11 = (A11 − A12 A−1 22 A12 )

(2)

−1 −1 0 B12 = −A−1 11 A12 (A22 − A12 A11 A12 )

(3)

−1 0 −1 0 0 B12 = −A−1 22 A12 (A11 − A12 A22 A12 )

(4)

−1 B22 = (A22 − A012 A−1 11 A12 )

(5)

2.2 Determinant of Partitioned Matrices Decompose the partitioned matrix A into two matrices as below     A11 A12 A11 0 I A−1 11 A12      A= = A21 A22 A012 I 0 A22 − A012 A−1 11 A12   0 I A12 A11 − A12 A−1 22 A12 0    = 0 A22 A−1 I 22 A21

5

   

Then we could calculate the determinant   A11 A12  = |A11 ||A22 − A012 A−1 |A| =  11 A12 | A21 A22 0 = |A22 ||A11 − A12 A−1 22 A12 |

(6)

(7)

2.3 Quadratic decomposition For any vector u, v and a symmetric matrix A = A0 , we have following theorem u0 Au − 2u0 Av + v 0 Av = u0 Au − u0 Av − u0 Av + v 0 Av = u0 A(u − v) − (u0 − v 0 )Av = (u − v)0 A(u − v)

(8)

= (v − u)0 A(v − u)

(9)

To understand the step from the second to the third equation, you shall be awared of the fact that u0 Av = v 0 Au, which is a scalar. Transpose only change the notation, but not the scalar itself.

3 Conditional Multivariate Guassian Distribution We have a n dimensional random vector  x=

x1

 

x2 x1 and x2 have dimension p and q respectively. x follows a multivariate normal distribution x ∼ N (µ, Σ) where µ = E(x) and Σ = E(x − µ)2 , partitioned accordingly     µ1 Σ11 Σ11    and Σ= µ2 Σ21 Σ22

6

The joint density of x is   1 1 0 −1 p exp − (x − µ) Σ (x − µ) n 2 (2π) 2 |Σ|   1 1 exp − Q(x , x ) = 1 2 2 (2π)n/2 |Σ|1/2

f (x) = f (x1 , x2 ) =

where we define the quadratic form Q(x1 , x2 ) = (x − µ)0 Σ−1 (x − µ) Partition the equation above  Q(x1 , x2 ) = [(x1 − µ1 )0

(x2 − µ2 )0 ] 

Σ11 Σ12 Σ21

Σ22



x1 − µ 1

 

 x2 − µ 2

Multiply the matices out then collect terms,  Q(x1 , x2 ) =

h

(x1 − µ1 )0 Σ11 + (x2 − µ2 )0 Σ21 (x1 − µ1 )0 Σ12 + (x2 − µ2 )0 Σ22

i

x 1 − µ1



 x 2 − µ2

= (x1 − µ1 )0 Σ11 (x1 − µ1 ) + (x2 − µ2 )0 Σ21 (x1 − µ1 ) + (x1 − µ1 )0 Σ12 (x2 − µ2 ) + (x2 − µ2 )0 Σ22 (x2 − µ2 ) Because Σ21 = Σ12 and (x1 − µ1 )0 Σ12 (x2 − µ2 ) is a scalar, thus Q(x1 , x2 ) = (x1 − µ1 )0 Σ11 (x1 − µ1 ) + 2(x1 − µ1 )0 Σ12 (x2 − µ2 ) + (x2 − µ2 )0 Σ22 (x2 − µ2 ) (10) We need to figure out what Σ11 , Σ12 and Σ22 are by using facts presented in Section 2.2, −1 −1 −1 −1 0 0 −1 0 Σ11 = (Σ11 − Σ12 Σ−1 = Σ−1 11 + Σ11 Σ12 (Σ22 − A12 Σ11 Σ12 ) Σ12 Σ11 22 Σ12 )

(11)

−1 0 −1 0 −1 −1 −1 Σ22 = (Σ22 − Σ012 Σ−1 = Σ−1 11 Σ12 ) 22 + Σ22 Σ12 (Σ11 − Σ12 Σ22 Σ12 ) Σ12 Σ22

(12)

−1 0 −1 Σ12 = −Σ−1 = (Σ21 )0 11 Σ12 (Σ22 − Σ12 Σ11 Σ12 )

(13)

7



Substitute (11),(12) and (13) into (10) 0 −1 Q(x1 , x2 ) = (x1 − µ1 )0 (Σ11 − Σ12 Σ−1 22 Σ12 ) (x1 − µ1 ) −1 0 −1 − 2(x1 − µ1 )0 [Σ−1 11 Σ12 (Σ22 − Σ12 Σ11 Σ12 ) ](x2 − µ2 ) −1 + (x2 − µ2 )0 [(Σ22 − Σ012 Σ−1 11 Σ12 ) ](x2 − µ2 ) −1 −1 −1 0 −1 0 = (x1 − µ1 )0 [Σ−1 11 + Σ11 Σ12 (Σ22 − Σ12 Σ11 Σ12 ) Σ12 Σ11 ](x1 − µ1 ) −1 0 −1 − 2(x1 − µ1 )0 [Σ−1 11 Σ12 (Σ22 − Σ12 Σ11 Σ12 ) ](x2 − µ2 ) −1 + (x2 − µ2 )0 [(Σ22 − Σ012 Σ−1 11 Σ12 ) ](x2 − µ2 )

= (x1 − µ1 )0 Σ−1 11 (x1 − µ1 ) −1 −1 0 −1 0 + (x1 − µ1 )0 Σ−1 11 Σ12 (Σ22 − Σ12 Σ11 Σ12 ) Σ12 Σ11 (x1 − µ1 ) −1 0 −1 − 2(x1 − µ1 )0 Σ−1 11 Σ12 (Σ22 − Σ12 Σ11 Σ12 ) (x2 − µ2 ) −1 + (x2 − µ2 )0 (Σ22 − Σ012 Σ−1 11 Σ12 ) (x2 − µ2 )

= (x1 − µ1 )0 Σ−1 11 (x1 − µ1 ) −1 −1 0 0 0 Σ Σ−1 (x1 − µ1 ) + [Σ12 Σ−1 11 (x1 − µ1 )] (Σ22 − Σ12 Σ11 Σ12 ) {z }| {z } | 12 11 {z } | u0

u

A

−1 0 −1 0 −2[Σ12 Σ−1 (x2 − µ2 ) 11 (x1 − µ1 )] (Σ22 − Σ12 Σ11 Σ12 ) | {z }| {z } | {z } −2u0

v

A

Σ12 )−1 (x2 − µ2 ) + (x2 − µ2 )0 (Σ22 − Σ012 Σ−1 | {z } | {z 11 } | {z } v0

v

A

= (x1 − µ1 )0 Σ−1 11 (x1 − µ1 ) + [(x2 − µ2 ) − Σ012 Σ−1 (x1 − µ1 )]0 (Σ22 − Σ012 Σ−1 Σ12 )−1 [(x2 − µ2 ) − Σ012 Σ−1 (x1 − µ1 )] | {z 11 }| {z 11 }| {z 11 } (v−u)0

A

(v−u)

The first term of the second equation uses the fact (1), the last equation is using the fact (9).

8

Make more definitions to assist the derivation b = µ2 + Σ012 Σ−1 11 (x1 − µ1 ) A = Σ22 − Σ012 Σ−1 11 Σ12 Q1 (x1 ) = (x1 − µ1 )0 Σ−1 11 (x1 − µ1 ) −1 −1 0 0 −1 0 Q2 (x1 , x2 ) = [(x2 − µ2 ) − Σ012 Σ−1 11 (x1 − µ1 )] (Σ22 − Σ12 Σ11 Σ12 ) [(x2 − µ2 ) − Σ12 Σ11 (x1 − µ1 )] −1 −1 0 0 −1 0 = {x2 − [µ2 + Σ012 Σ−1 11 (x1 − µ1 )]} (Σ22 − Σ12 Σ11 Σ12 ) {(x2 − [µ2 + Σ12 Σ11 (x1 − µ1 )]} | {z } | {z } | {z } A−1

b

b

= (x2 − b)0 A−1 (x2 − b) The manipulation above is for change of variables in the density function, it will be sooner clear below. And notice that 0 −1 Q(x1 , x2 ) = Q1 (x1 ) + Q2 (x1 , x2 ) = (x1 − µ1 )0 Σ−1 11 (x1 − µ1 ) + (x2 − b) A (x2 − b)

Rewrite the joint distribution f (x1 , x2 )   1 1 f (x1 , x2 ) = exp − Q(x , x ) 1 2 2 (2π)n/2 |Σ|1/2   1 1 = exp − Q(x1 , x2 ) 1/2 2 (2π)n/2 |Σ11 |1/2 |Σ22 − Σ012 Σ−1 11 Σ12 |   1 1 0 −1 = exp − (x1 − µ1 ) Σ11 (x1 − µ1 ) × 2 (2π)p/2 |Σ11 |1/2   1 1 0 −1 (x − b) A (x − b) exp − 2 2 2 (2π)q/2 |A|1/2 = P (x1 , µ1 , Σ11 ) P (x2 , b, A) The second equation uses fact (7) and rest of steps are clear. The conditional distribution of x2 given realisation of x1 is f2|1 (x2 |x1 ) =

f (x1 , x2 ) f (x1 , x2 ) = R∞ f (x1 ) −∞ f (x1 , x2 )dx2

where we could see that Z ∞ Z f (x1 , x2 )dx2 = P (x1 , µ1 , Σ11 ) −∞



−∞

9

P (x2 , b, A)dx2 = P (x1 , µ1 , Σ11 )

because P (x2 , b, A) is a probability density function. Then it is crystal clear that f2|1 (x2 |x1 ) =

P (x1 , µ1 , Σ11 ) P (x2 , b, A) = P (x2 , b, A) P (x1 , µ1 , Σ11 )

And we end the derivation of condition properties   1 1 0 −1 exp − (x2 − b) A (x2 − b) f2|1 (x2 |x1 ) = 2 (2π)q/2 |A|1/2 where mean vector is b = µ2 + Σ012 Σ−1 11 (x1 − µ1 )

(14)

A = Σ22 − Σ012 Σ−1 11 Σ12

(15)

and covariance matrix is

4 Time-Variant State Space Model The Kalman filter is based the recursive state space model. This section will present you the general setting of state space model we need in derivation of the Kalman filter. First, we define the transition equation xt = ct + Ft xt−1 + Bt vt

E(vt ) = 0 and Var (vt ) = Qt

(16)

the coefficient matrix Ft and Bt are labelled with time t, which indicates a time-variant state space model1 . The xt is the state vector which describes the state of the system, and it is an unobservable vector. And ct is a deterministic term. The measurement equation is given by yt = Ht xt + dt + ωt

E(ωt ) = 0 and Var (ωt ) = Rt

yt is observable, namely input data. dt is a deterministic term. Assume that E(ωt vk0 ) = 0 1

If all variables are stationary, the state-space model is time-invariant.

10

(17)

and also E(vt x00 ) = 0 and E(ωt x00 ) = 0

5 Derivation of The Kalman Filter With the Gaussian assumption of vt , x0 has multivariate Gaussian distribution x0 ∼ N (E(x0 ), P0 ) From transition equation x1 = c1 + F1 x0 + B1 v1 Take expectation E(c1 + F1 x0 + B1 v1 ) = c1 + F1 E(x0 ) = E(x1|0 )

(18)

which is a conditional expectation based on information known at time 1. Also take variance Var (c1 + F1 x0 + B1 v1 ) = F1 Var (x0 )F10 + B1 Var (v1 )B10 = Var (x1|0 ) = F1 P0 F10 + B1 Q1 B10 = P1|0

(19)

which is the conditional variance based on information know at time 1. Rewrite the state space model as x1 = E(x1|0 ) + [x1 − E(x1|0 )] y1 = H1 E(x1|0 ) + d1 + H1 [x1 − E(x1|0 )] + ω1 the purpose of rewriting is clear by noticing that [x1 − E(x1|0 )] is one-step ahead forecast error. Then they follow a multivariate normal distribution       x1 E(x1|0 ) P1|0 P1|0 H10   ∼ N  ,  y1 H1 E(x1|0 + d1 ) H1 P1|0 H1 P1|0 H10 + R1

11

We have assumed that the expectation of forecast error E([x1 − E(x1|0 )]) = 0. Use the results from Section 3, (14) and (15), we have2 E(x1 ) = E(x1|0 ) + P1|0 H10 (H1 P1|0 H10 + R1 )−1 { y1 − [H1 E(x1|0 ) + d1 ]} | {z } | {z } | {z } |{z} | {z } Σ021

µ1

x2

Σ−1 22

µ2

P1 = P1|0 − P1|0 H10 (H1 P1|0 H10 + R1 )−1 H1 P1|0 {z } | {z } |{z} | {z } | Σ11

Σ021

Σ−1 22

(20)

(21)

Σ21

All above derivation is just the first iteration of the Kalman filter from period 0 to 1. In general we have prediction equations and updating equations in the Kalman filtering algorithm, from (18) E(xt|t−1 ) = ct + Ft E(xt−1 ) and also from (19) Pt|t−1 = Ft Pt−1 Ft0 + Bt Qt Bt0 The two equations are prediction equations, the philosophy behind the name indicates that we are using the information at time t − 1, and t − 1 only, to predict the variables at t. After prediction, we are waiting for the new observations coming into the system, once a new observation yt comes, we update the system E(xt ) = E(xt|t−1 ) + Pt|t−1 H10 (Ht Pt|t−1 Ht0 + Rt )−1 [yt − Ht E(xt|t−1 ) − xt ] Pt = Pt|t−1 Ht0 (Ht Pt|t−1 Ht0 + Rt )−1 Ht Pt|t−1 these two equations are update equations, they are from (20) and (21). In conclusion, the philosophy behind the Kalman filter is to recursively update the information of the system, this is the most original version of the Kalman filter. The 2

The notation for x2 in the underbrace might be confused with state vector, however this is in line with the notation used in Section 3, specifically equation (14).

12

Kalman filter has very restrict assumptions about the stochastic components of the system, if we relax those assumptions, the Kalman filter will be modified accordingly to handle more general situations.

References [1] Harvey, A. (1990): The Econometric Analysis of Time Series, the MIT Press, 2nd Edition [2] Kalman, R. (1960): ‘A New Approach to Linear Filtering and Prediction Problem,’ Journal of Basic Engineering,pp. 35-45 (March 1960)

13

The Kalman Filter

The joint density of x is f(x) = f(x1,x2) .... The manipulation above is for change of variables in the density function, it will be ... Rewrite the joint distribution f(x1,x2).

226KB Sizes 0 Downloads 266 Views

Recommend Documents

20140304 pengantar kalman filter diskrit.pdf
Department of Computer Science. University of North Carolina Chapel Hill. Chapel Hill, NC 27599-3175. Diperbarui: Senin, 24 Juli 2006. Abstrak. Di tahun 1960 ...

Kalman filter cheat sheet.pdf
Page 1 of 1. Kalman Filter. Kalman filter: a data fusion algorithm - best estimate of current state given: Prediction from last known state pdf (probability density ...

The Kalman Filter - Yiqian Lu 陆奕骞
The Kalman filter algorithm has very high shreshhold for comprehension, and the logic is twisted with sophisticated mathematical notations. In this note, we derive the standard. Kalman filter following the logic of Harvey (1990)[1]. All essential ste

Importance Sampling-Based Unscented Kalman Filter for Film ... - Irisa
Published by the IEEE Computer Society. Authorized ..... degree of penalty dictated by the potential function. ..... F. Naderi and A.A. Sawchuk, ''Estimation of Images Degraded by Film- ... 182-193, http://www.cs.unc.edu/˜welch/kalman/media/.

Effect of Noise Covariance Matrices in Kalman Filter - IJEECS
In the Kalman filter design, the noise covariance matrices. (Q and R) are .... statistical variance matrix of the state error, the optimality of the Kalman filter can be ...

Descalloping of ScanSAR Image using Kalman Filter ...
IJRIT International Journal of Research in Information Technology, Volume 1, ... Therefore such techniques are not suitable to be applied to processed data. A.

Extended Kalman Filter Based Learning Algorithm for ...
Kalman filter is also used to train the parameters of type-2 fuzzy logic system in a feedback error learning scheme. Then, it is used to control a real-time laboratory setup ABS and satisfactory results are obtained. Index Terms—Antilock braking sy

Descalloping of ScanSAR Image using Kalman Filter ...
IJRIT International Journal of Research in Information Technology, Volume 1, Issue 4, ... two major artifacts in processed imges known as scalloping and inter-.

Importance Sampling Kalman Filter for Image Estimation - Irisa
Kalman filtering, provided the image parameters such as au- toregressive (AR) ... For example, if we consider a first-order causal support (com- monly used) for ...

6DOF Localization Using Unscented Kalman Filter for ...
Email: [email protected], {ce82037, ykuroda}@isc.meiji.ac.jp. Abstract—In this ..... of the inliers is high, the accuracy of visual odometry is good, while when the ..... IEEE International Conference on Robotics and Automation, Vol. 1, ... Sur

Kalman Filter for Mobile Robot Localization
May 15, 2014 - Algorithm - This is all you need to get it done! while true do. // Reading robot's pose. PoseR = GET[Pose.x; Pose.y; Pose.th]. // Prediction step. ¯Σt = (Gt ∗ Σt−1 ∗ GT t )+(Vt ∗ Σ∆t ∗ V T t ) + Rt. // Update step featu

Effect of Noise Covariance Matrices in Kalman Filter - IJEECS
#1, *2 Electrical and Computer Engineering Department, Western Michigan ... *3 Electrical Engineering Department, Tafila Technical University, Tafila, Jordan.

What is the Kalman Filter and How can it be used for Data Fusion?
(encoders and visual) to solve this issue. So for my math project, I wanted to explore using the Kalman Filter for attitude tracking using IMU and odometry data.

Unscented Kalman Filter for Image Estimation in film-grain noise - Irisa
May 18, 2009 - exposure domain. Among ... in density domain, noise v is additive white Gaussian with ..... The proposed method performs the best both in.

Unscented Kalman Filter for Image Estimation in film-grain noise - Irisa
May 18, 2009 - exposure domain. Among earlier works ... In the density domain, the degraded image can be modeled as .... An MRF with such a quadratic reg-.

Vessel Enhancement Filter Using Directional Filter Bank
Jul 21, 2008 - reduced compared to that in the original one due to its omni-directional nature. ...... Conference on Signals, Systems and Computers, Vol. 2, 1993, pp. ... matching, IEEE Trans. on Circuits and Systems for Video. Technology 14 ...

The Rational Inattention Filter - Cornell Economics
optimal signal in a special case of the dynamic attention choice problem analyzed here, the AR(1) case. Furthermore, Steiner, Stewart, and Matejka (2015) study a general dynamic model with discrete choice under rational inattention. They show that th

Bilateral Filter - GitHub
where the normalization term. Wp = ∑. xiϵΩ fr(||I(xi) − I(x)||)gs(||xi − x||) ... gs is the spatial kernel for smoothing differences in coordinates. This function can be a ...

[PDF] Fundamentals of Kalman Filtering
addition, both FORTRAN and MATLAB* source code are available electronically for all ... Deep Learning (Adaptive Computation and Machine Learning Series).