EE 396: Lecture 8
Ganesh Sundaramoorthi
March 8, 2011
In the past lectures, we have constructed a framework within which we can construct any denoising algorithm. We saw that the denoising problem is ill-posed, and cannot be solved without knowing

• how the image is produced,
• how the noise is generated,
• and the class of possible true images.

In the last lectures, we have assumed

• an affine camera image generation model,
• an additive noise model with Gaussian iid (spatially) noise,
• and we have explored two different priors on the class of true images, both coming from the observation that images are composed mainly of small patches that have near constant intensity, and therefore most of the derivatives in the image are near zero.

We started with a normal iid assumption on image derivatives, but we later saw, from empirical evidence, that image derivatives are more closely distributed by a Laplacian distribution (though that is still not perfect), which led to TV denoising. Our results for TV indicate that even if we have perfect iid Gaussian noise (that is, our image noise formation model is accurate), we still do not get a denoised result that appears “natural,” although TV denoising better preserves edges than the Gaussian prior assumption. Now, we could try to adjust our prior to get closer to the empirical image derivative distribution, but I claim that you would not do much better than the TV results. Indeed, if you were to generate a random sample from our assumed prior distribution,

p(u) ∝ exp( −(1/(2σ_p²)) ∫_Ω |∇u(x)| dx ),    (1)

then a typical sample would look like Figure 1. A typical sample from that distribution seems to lack the realism of natural images. If we are to get more realistic denoised images, then we must incorporate higher level information about natural scenes. We explore one such piece of high level information today.

In our TV model, we have assumed that u_{x₁}(x), u_{x₂}(x) ∼ Lap(0, λ), iid over x ∈ Ω. Discretizing this, we have that

u(x + ∆x) − u(x) ∼ Lap(0, λ∆x),    (2)

which shows that nearby pixels are correlated, and also that distant pixels show some correlation, but the variance grows as the pixels get further apart.
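To get a feel for why such a prior produces samples like Figure 1, the following is a minimal one-dimensional sketch (an illustration of my own, not the procedure used to generate the figure, which is from [2]): draw iid Laplacian increments and integrate them. The scale parameter lam and the signal length are arbitrary choices. The result has many tiny steps and occasional jumps, but none of the repeated structure of natural images.

```python
import numpy as np
import matplotlib.pyplot as plt

# Draw iid Laplacian increments u(x + dx) - u(x) and integrate them,
# giving a 1-D signal whose derivatives follow the assumed prior.
rng = np.random.default_rng(0)
lam = 0.1          # Laplacian scale parameter (hypothetical value)
n = 256            # number of samples

increments = rng.laplace(loc=0.0, scale=lam, size=n - 1)
u = np.concatenate(([0.0], np.cumsum(increments)))   # integrate increments

plt.plot(u)
plt.title("1-D sample with iid Laplacian increments")
plt.show()
```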
Figure 1: A random image sampled from our prior distribution model of iid Laplacian image derivatives. Image taken from [2].

We know that in a realistic image, there are correlations between pixels even when they are distant from each other. Indeed, we now make the assumption of image self-similarity. That is, an image is composed of a small number of patches that are repeated. This observation is the basis of many works in computer vision. Consider these examples in natural images:

• Trees are composed of many leaves that are similar to each other.
• The classroom is composed of many chairs that are nearly identical.
• A building is composed of tiles that are similar and repeated.

We are going to incorporate this information into our prior. We shall generalize (2) so that even distant pixels can be correlated. Indeed, we assume that

u(x) − u(y) ∼ N(0, σ(x, y)),    (3)
where σ : Ω × Ω → R₊ is a spatially varying standard deviation. The idea is to construct σ so that if the patches of the image, which we take to be the image restricted to the balls B_R(x) = {z ∈ Ω : |z − x| < R} and B_R(y) = {z ∈ Ω : |z − y| < R} around x and y, are visually similar, then x and y should be correlated (σ low)¹. We shall defer the actual definition of σ until later. We consider {u(x)}_{x∈Ω} as a random process and assume that

• u(x) − u(y) ∼ N(0, σ(x, y)) as before,
• u(x₁) − u(y₁) and u(x₂) − u(y₂) are independent provided x₁ ≠ x₂ or y₁ ≠ y₂.

Now if x₁ and x₂ belong to a patch which is similar to a patch that contains y₁ and y₂, then the second assumption is not really true, but nevertheless, for computational tractability, we make this assumption.
¹Note that this does not really give a way to generate a sample from the prior distribution, since to generate a sample one would need to know σ, but σ is defined in terms of the image u itself, so this seems to be a recursive definition! However, given an image, we can still calculate its prior probability.
The two assumptions above resemble a generalization of the classical Wiener process, which is useful in many fields including electrical engineering and financial data prediction. Using the above assumptions, we can calculate the probability

p(u(x₁) − u(y₁), . . . , u(xₙ) − u(yₙ)) = ∏_{i=1}^{n} p(u(xᵢ) − u(yᵢ)) ∝ exp( − ∑_{i=1}^{n} (u(xᵢ) − u(yᵢ))² / (2σ(xᵢ, yᵢ)²) ),    (4)
where {(xᵢ, yᵢ)}_{i=1}^{n} are distinct pairs of points, and we have used the independence assumption on distinct pairs. Applying a limiting argument, we have that

p(u) ∝ exp( − ∫_Ω ∫_Ω (u(x) − u(y))² / (2σ(x, y)²) dx dy ),    (5)

where the double integral arises because we must sum over all pairs of points (x, y) ∈ Ω × Ω. Still assuming the Gaussian noise model, we calculate the MAP estimate, and this requires that we minimize the energy

E(u) = (1/(2σₙ²)) ∫_Ω (I(x) − u(x))² dx + ∫_Ω ∫_Ω (u(x) − u(y))² / (2σ(x, y)²) dx dy;    (6)

choosing w(x, y) = 1/σ(x, y)² and α > 0, we can equivalently minimize the energy

E(u) = ∫_Ω (I(x) − u(x))² dx + α ∫_Ω ∫_Ω w(x, y)(u(x) − u(y))² dx dy,    (7)
where we will assume that

• w is symmetric: w(x, y) = w(y, x) for all x, y ∈ Ω,
• ∫_Ω w(x, y) dx = 1 for all y ∈ Ω,
• w(x, y) ≥ 0, which makes sense since w = 1/σ².

We note that the energy above does not fit into the form

∫_Ω F(u(x), ∇u(x), x) dx,    (8)
and thus we cannot use our earlier formulas to derive the Euler-Lagrange equations and check convexity; we will have to do it directly from the definitions. To check convexity, note that ∫_Ω (I − u)² dx is convex in u, and therefore it remains to check that the second term is convex. To see this, note that

[(1 − t)u₁(x) + tu₂(x) − ((1 − t)u₁(y) + tu₂(y))]² = [(1 − t)(u₁(x) − u₁(y)) + t(u₂(x) − u₂(y))]²    (9)
  ≤ (1 − t)(u₁(x) − u₁(y))² + t(u₂(x) − u₂(y))²    (10)

by convexity of the function f(x) = x². Since w(x, y) ≥ 0, we can multiply both sides by w(x, y) and preserve the inequality, and then integrating both sides over Ω × Ω shows that the second term of the energy is convex. Therefore, E is convex.
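As a sanity check on (7), here is a small discrete sketch of the energy (the function name and the discretization are choices of this sketch, not from the lecture): the image is flattened to a vector, and w becomes a symmetric nonnegative matrix whose rows sum to one. The convexity argument above carries over verbatim, since the discrete energy is a sum of convex terms.

```python
import numpy as np

def energy(u, I, w, alpha):
    """Discrete analogue of (7).

    u, I  : length-n vectors (candidate image and noisy image, flattened).
    w     : n x n symmetric, nonnegative weight matrix with rows summing to 1.
    alpha : regularization weight.
    """
    data_term = np.sum((I - u) ** 2)            # corresponds to the integral of (I - u)^2
    diff = u[:, None] - u[None, :]              # u(x) - u(y) for every pair (x, y)
    pairwise_term = np.sum(w * diff ** 2)       # corresponds to the double integral
    return data_term + alpha * pairwise_term
```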
Let us now compute the Euler-Lagrange equations of E, which we do directly from the definition:

dE(u) · v = d/dt E(u + tv) |_{t=0}    (11)
  = ∫_Ω d/dt (I(x) − u(x) − tv(x))² |_{t=0} dx    (12)
    + α ∫_Ω ∫_Ω d/dt w(x, y)(u(x) + tv(x) − u(y) − tv(y))² |_{t=0} dx dy    (13)
  = ∫_Ω 2(u(x) − I(x))v(x) dx + α ∫_Ω ∫_Ω 2w(x, y)(u(x) − u(y))(v(x) − v(y)) dx dy    (14)
  = ∫_Ω 2(u(x) − I(x))v(x) dx + 4α ∫_Ω ( ∫_Ω w(x, y)(u(x) − u(y)) dy ) v(x) dx    (15)
  = ∫_Ω 2(u(x) − I(x))v(x) dx + 4α ∫_Ω ( u(x) − ∫_Ω w(x, y)u(y) dy ) v(x) dx,    (16)

where we have used the symmetry of w, Fubini's Theorem to interchange the order of integration, and the property that ∫_Ω w(x, y) dx = 1. Hence the Euler-Lagrange equation is

∇E(u)(x) = (1 + 2α)u(x) − I(x) − 2α ∫_Ω w(x, y)u(y) dy = 0,    (17)
and therefore,

u(x) = (1/(1 + 2α)) I(x) + (2α/(1 + 2α)) ∫_Ω w(x, y)u(y) dy,    (18)
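Equation (18) expresses u as a fixed point, which suggests solving it by repeated substitution. Below is a minimal sketch of that iteration in the same discrete setting as the energy sketch above (the weight matrix W and the choice of n_iter are assumptions for illustration). Because the factor 2α/(1 + 2α) is strictly less than one and W is row-stochastic, the update map is a contraction, so the iteration converges to the solution of the discrete analogue of (18).

```python
import numpy as np

def solve_fixed_point(I, W, alpha, n_iter=200):
    """Fixed-point iteration for the discrete analogue of (18).

    I      : length-n vector of noisy pixel values.
    W      : n x n symmetric, nonnegative weight matrix with rows summing to 1.
    alpha  : regularization weight; larger alpha gives stronger averaging.
    n_iter : number of iterations (the update is a contraction, so it converges).
    """
    u = I.copy()
    for _ in range(n_iter):
        # u <- (I + 2*alpha*W u) / (1 + 2*alpha), the discrete form of (18)
        u = (I + 2.0 * alpha * (W @ u)) / (1.0 + 2.0 * alpha)
    return u
```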
and when the regularization α is chosen large, we have that

u(x) = ∫_Ω w(x, y)u(y) dy,    (19)
and this produces what is known as the Non-Local Means algorithm [1]. Indeed, the authors choose the denoised image as

u(x) = ∫_Ω w(x, y) I(y) dy.    (20)
We are left to specify the choice of w, which is chosen to be

w(x, y) = (1/C(x)) exp( − (1/h²) ∫_{B_R(0)} |I(x + z) − I(y + z)|² dz ),    (21)

where h and R are parameters of the algorithm, and

C(x) = ∫_Ω exp( − (1/h²) ∫_{B_R(0)} |I(x + z) − I(y + z)|² dz ) dy;    (22)
note that the similarity measure is simply the L² distance between patches. Note that I is a noisy image, and to combat noise in the patch comparisons, one can use I ∗ G (where G is a Gaussian filter) in the definition of w instead.
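Putting (20)-(22) together, a direct (and deliberately unoptimized) sketch of the algorithm might look like the following; the default parameter values, the reflected boundary handling, and the Gaussian pre-filtering of the patches are choices of this sketch, not prescriptions from [1]. In practice one restricts y to a search window around x, since the double loop over all pairs of pixels is quadratic in the image size.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def nonlocal_means(I, R=3, h=0.1, sigma_pre=1.0):
    """Direct (slow) sketch of NL-means, equations (20)-(22).

    I         : 2-D noisy image (float array).
    R         : patch radius, so patches are (2R+1) x (2R+1).
    h         : decay parameter of the exponential weights.
    sigma_pre : std of the Gaussian filter G used when comparing patches.
    """
    Ip = np.pad(gaussian_filter(I, sigma_pre), R, mode="reflect")  # I * G, padded
    H, W = I.shape
    out = np.zeros_like(I)

    for x0 in range(H):
        for x1 in range(W):
            Px = Ip[x0:x0 + 2 * R + 1, x1:x1 + 2 * R + 1]      # patch around x
            weights = np.zeros((H, W))
            for y0 in range(H):
                for y1 in range(W):
                    Py = Ip[y0:y0 + 2 * R + 1, y1:y1 + 2 * R + 1]
                    d2 = np.sum((Px - Py) ** 2)                # L2 patch distance, eq. (21)
                    weights[y0, y1] = np.exp(-d2 / h ** 2)
            weights /= weights.sum()                           # C(x) normalization, eq. (22)
            out[x0, x1] = np.sum(weights * I)                  # u(x) = sum_y w(x, y) I(y), eq. (20)
    return out
```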
References

[1] A. Buades, B. Coll, and J.-M. Morel. A non-local algorithm for image denoising. In Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, pages 60-65. IEEE, 2005.

[2] B. A. Olshausen and D. J. Field. Vision and the coding of natural images. American Scientist, 88:238, 2000.