EE 396: Lecture 4 Ganesh Sundaramoorthi Feb. 19, 2011 In the last lectures, we constructed by using the additive noise model and MAP estimation, the following energy to be minimized : Z Z 2 (1) E(u) = − log p(u|I) = (I(x) − u(x)) dx + α |∇u(x)|2 dx, Ω
Ω
where E : U → R, I : Ω → R is the observed or measured image, and α > 0. The u ∈ U that minimizes E is our estimate of the denoised image. We also derived the Euler-Lagrange equations for the energy above, and they are ( u(x) − α∆u(x) = I(x) x ∈ int(Ω) . (2) ∂u x ∈ ∂Ω ∂n (x) = 0 Euler-Lagrange equations are in general necessary conditions for a local minimum/maximum. However, last time we have shown that E is convex and thus, we know that if there exists a solution to the EulerLagrange equation then it must be a global minimizer. Today, we are going to understand if a solution to the Euler-Lagrange equations exists, and if so, we are going to develop algorithms to numerically solve (2).
0.1
Analytic Solution to the Euler-Lagrange Equation
The left hand side of (2) is a linear operator acting on u: indeed, we have that u − α∆u = (Id − α∆)u
(3)
where Id : U → U is the identity map, and so if u1 , u2 ∈ U, then (Id − α∆)(u1 + u2 ) = Id(u1 + u2 ) − α∆(u1 + u2 ) = u1 + u2 − α u1 − α
n X ∂ 2 u1 i=1
∂x2i
(x) + u2 − α
n X ∂ 2 u2 i=1
∂x2i
n X ∂ 2 (u1 + u2 ) i=1
∂x2i
(x) =
(x) = (Id − α∆)u1 + (Id − α∆)u2 ,
which implies linearity of the operator. Let us define for convenience the (linear) operator L : U → U:
Then we have the PDE:
L = Id − α∆.
(4)
( (Lu)(x) = I(x) x ∈ int(Ω) ∂u x ∈ ∂Ω ∂n = 0
(5)
1
Let us now try to solve the above PDE. We define a kernel on the Ω; the kernel is a map K : Ω × Ω → R. The arguments of K are going to be x and y, respectively. Let us now apply the Divergence Theorem that we have seen in the last lecture to the vector field U (y) = K(x, y)∇y u(y) − ∇y K(x, y)u(y) defined for each x ∈ Ω : Z Z div [K(x, y)∇y u(y) − ∇y K(x, y)u(y)] dy; (K(x, y)∇y u(y) − ∇y K(x, y)u(y)) · N (y) dS(y) = Ω
∂Ω
(6) note that K and u must be smooth, i.e., have continuous partials, for the above Divergence Theorem to hold. Also note that the above divergence is taken with respect to y. We see that div [K(x, y)∇y u(y)] = ∇y K(x, y) · ∇y u(y) + K(x, y)∆y u(y)
(7)
div [∇y K(x, y)u(y)] = ∆y K(x, y)u(y) + ∇y K(x, y) · ∇y u(y).
(8)
Thus (6) becomes Z Z ∂u ∂K K(x, y) (y) − (x, y)u(y) dS(y) = (K(x, y)∆y u(y) − ∆y K(x, y)u(y)) dy. (9) ∂n ∂n ∂Ω Ω R R Now multiplying the above equation by α and adding 0 = Ω K(x, y)u(y) dy − Ω K(x, y)u(y) dy to the right hand side, we have: ∂K ∂u (x, y)u(y) dS(y) = α K(x, y) (y) − ∂n ∂n ∂Ω Z [K(x, y)(u(y) − α∆u(y)) − u(y)(K(x, y) − α∆y K(x, y))] dy = Ω Z [K(x, y)Lu(y) − u(y)Ly K(x, y)] dy (10) Z
Ω
Let us be a little informal (but later we will see that this informal argument can be justified) : suppose that we can find a kernel K such that the following holds : ( Ly K(x, y) = δ(x − y) for all x, y ∈ int(Ω) (11) ∂K for x ∈ int(Ω), y ∈ ∂Ω ∂n (x, y) = 0 where δ indicates the (two-dimensional) Dirac delta function. Also, suppose u solves (5). Then (10) becomes Z 0= K(x, y)I(y) dy − u(x); (12) Ω
that is
Z u(x) =
Ω
K(x, y)I(y) dy;
(13)
K is known as the Green’s function for L, and we sometimes write (with abuse of notation) K = L−1 . Remark 1. Recall from signal processing, that if we have a linear system that is also shift-invariant (sometimes known as time-invariant), then you may recall that the output of the system is the input signal convolved
2
with the impulse response. In other words, if x : R → R is the input signal, and if h : R → R is the impulse response of a linear shift invariant system, S, i.e., S(h)(t) = δ(t), then Z h(t − τ )x(τ ) dτ = (h ∗ x)(t). (14) S(x)(t) = R
We can verify this by writing
Z x(t) =
R
x(τ )δ(t − τ ) dτ ;
(15)
now applying S to both sides of the above equation and using linearity and shift-invariance of S, we have Z Z x(τ )h(t − τ ) dτ. (16) x(τ )Sδ(t − τ ) dτ = Sx(t) = R
R
Note the resemblance of the formula to (13). Because in general L is not shift-invariant (since Ω is in general not shift-invariant), the kernel will not depend on the difference x − y and so it cannot be in general written as a convolution. In the case Ω = R2 , though, L is shift-invariant and thus, K depends on x − y, and the result is that u is the convolution of K with the image! Remark 2. You may be wondering why we studied all this mathematics of the Bayesian approach, MAP estimation, calculus of variations, and PDE, and then in the end we arrive at result that is well known in signal processing and to all of you : to reduce the noise, convolve the image with a low-pass filter! Why the need for talking about models, noise and prior statistics, etc... ? The answer is that the whole framework we have developed now allows us to make better assumptions about the image formation process, the noise statistics, and the prior class of denoised images, and then we have the techniques already developed will allow us to develop algorithms in the case of different assumptions. As you can imagine, the assumptions we have already made are not very good! As we shall soon see in the subsequent lectures, when we choose different priors on the class of denoised images or the different noise statistics, etc.. we will see that we do not get linear systems, and in particular, the methods of filtering developed in signal processing are simply inadequate! We now need to calculate the Green’s function for the operator L defined above. Unfortunately, there is no closed form solution for K (at least that I know of) for general Ω or even when Ω = R2 . To get some idea of how these kernels look like, we derive the Green’s function for the case when α is large and Ω = Rn . Note that when α is large then Lu(x) ≈ −α∆u(x). Because, Ω = Rn is shift invariant, we know that the kernel will depend only on the difference x − y. Therefore, we solve −α∆K(x) = δ(x).
(17)
Note that ∆ is rotationally invariant, i.e., if K solves the above then K ◦R(x) = K(Rx) (where RT R = Idn is the n × n identitiy matrix) is also a solution to the above. Therefore, K should be radially symmetric, i.e., K(x) = ν(|x|) for some function ν : R → R. Computing the gradient of K, we have that ∇K(x) = ∇ν(|x|) = ν(|x|) and
x |x|
(18)
X n n X ∂ xi x2 |x| − x2i /|x| n−1 0 ∆K(x) = ν(|x|) = ν 00 (|x|) i2 +ν 0 (|x|) = ν 00 (|x|)+ ν (|x|). (19) 2 ∂xi |x| |x| |x| |x| i=1
i=1
3
For x 6= 0, we solve ∆K(x) = 0, that is ν 00 (|x|) +
n−1 0 ν (|x|) = 0 |x|
(20)
which is equivalent to n−1 C d log ν 0 (r) = − =⇒ log ν 0 (r) = −(n − 1) log r + log C =⇒ ν 0 (r) = n−1 dr r r where r = |x| and C > 0 is some constant. Integrating again, we find that ( C log r + B n = 2 ν(r) = C +B n ≥ 2. rn−2 Using the above reasoning, we choose K to be ( 1 log |x| − 2π K(x) = 1
1
n(n−2)a(n) |x|n−2
n=2 n≥3
(21)
(22)
(23)
where a(n) is the volume of the unit ball B1 (0) = {x ∈ Rn : |x| ≤ 1} in Rn . The above is know as the fundamental solution of the Laplace equation. We now prove the following theorem that is taken from [1] : Theorem 1. Let I : R2 → R be an image and further assume that I ∈ C 2 (R2 , R) and I and its derivatives are nonzero only on Ω, which bounded and a closed set (we often write I ∈ Cc2 (R2 , R)). Then Z 1 u(x) = − K(x − y)I(y) dy (24) α R2 solves −α∆u = I. Proof. In class; see handout! Note that showing the existence of the equations (5) for second order operators requires the uses of the theory of Sobolev spaces, and that is out of the scope of this course. But, indeed the solution of (2) exists. The kernel formulation of the solution of the PDE is not particularly useful for numerical implementation since the kernel is typically singular (i.e., it blows up at the origin), and thus we develop numeric solutions directly from the PDE. However, the kernel formulation aids in our understanding of the solution of the PDE.
References [1] Lawrence C. Evans. Partial Differential Equations, volume 19 of Graduate Studies in Mathematics. American Mathematical Society, 1997.
4