L1 Total Variation Primal-Dual Active Set Method with Conjugate Gradients for Image Denoising

Marrick Neri
Institute of Mathematics, University of the Philippines Diliman

ABSTRACT

The L1 TV-PDA method developed by Neri [9] to solve a regularization of the L1 TV model is effective in removing salt-and-pepper noise and random-valued noise. However, despite the convergent behavior of L1 TV-PDA, computer memory limitations prevented the method from denoising images with dimensions n × n, where n is much greater than 2^8. This limitation is due to the cost of inverting a Hessian-type matrix to solve a linear system. This research aims to show that, with the help of numerical methods such as the conjugate gradient method (CGM), L1 TV-PDA can denoise images of higher dimensions, thereby showing that the method is scale independent. The paper ends with a numerical study.
1. INTRODUCTION
Given an observed noisy image d in a domain Ω ⊂ R², a mathematical model to denoise d is

(P)   min_{s ∈ BV(Ω)}  ∫_Ω |s − d| dx + α ∫_Ω |∇s| dx.
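As a concrete aside (ours, not from the paper), a discrete analogue of this objective is easy to evaluate with forward differences; the choice of α and the toy image below are purely illustrative.

```python
import numpy as np

def l1_tv_objective(s, d, alpha):
    """Discrete analogue of the L1-TV objective:
    sum |s_i - d_i| + alpha * sum |(grad s)_i|, with forward differences
    and the isotropic (Euclidean) norm of the gradient. A sketch only."""
    fidelity = np.abs(s - d).sum()
    # forward differences; the last row/column is replicated, giving a
    # zero boundary difference
    gx = np.diff(s, axis=1, append=s[:, -1:])
    gy = np.diff(s, axis=0, append=s[-1:, :])
    tv = np.sqrt(gx ** 2 + gy ** 2).sum()
    return fidelity + alpha * tv

d = np.zeros((4, 4))
d[1:3, 1:3] = 1.0                        # small blocky "image"
print(l1_tv_objective(d, d, alpha=0.5))  # fidelity vanishes, only TV remains
```

With s = d the fidelity term is zero, so the objective is just α times the discrete perimeter-like TV of the block.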
Problem P is a convex L1 total variation model that is effective in denoising images with outlier or impulse noise [10]. The first integral term is the fidelity term, which retains essential features of the image, and the second is the total variation (TV) term, which is effective in removing noise. Chan and Esedoğlu [3] showed that P is contrast invariant in the sense that if s(x) is a solution for the observed image d(x), then cs(x) is a solution for the image cd(x) for c > 0. Moreover, the regularization process using the L1 model depends more on the shapes of image features than on their contrast. The authors of [3] observed that small features in the image maintain their contrast even as α is increased, until, at some threshold for α, these features suddenly disappear.

However, the L1 problem is nondifferentiable in both integral terms, so direct methods like the gradient method and the Newton method cannot be applied. Also, the problem is not strictly convex, thus allowing nonunique solutions. One approach to obtaining a unique solution is to globally regularize both the fidelity and TV terms with a small positive parameter ε (for instance, see [3]). The resulting functional is globally smooth and the regularized L1 model admits a unique solution. With global regularization, however, comes a weakening of the distinctive property of the L1 model to recover noise-free pixels. A way to circumvent the nondifferentiability of a convex function is to exploit the primal and dual forms of the function. This approach was implemented in [4], [2], and [8], to mention a few. In [7], primal-dual active set methods were shown to be semismooth Newton methods with a superlinear convergence rate. Here, we apply the approach of [7] to the L1 problem. A primal-dual active set algorithm, the L1 TV-PDA, was developed in [9], and this method was shown to be convergent. However, the method could only be applied to small-sized images (e.g., 512 × 512) since the implemented MATLAB code utilized the backslash function to solve a linear system. Specifically, to solve for x in Ax = b, where A is positive definite, the solution is x = A\b. This function can be costly in memory usage whenever n is large. The conjugate gradient method (CGM) is a popular tool for solving symmetric positive definite systems. The method generates a sequence of search directions for updating the iterates, together with residuals corresponding to the iterates.
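A rough back-of-the-envelope calculation (ours, not from the paper) shows why memory, rather than arithmetic, is the bottleneck: a dense N × N system matrix for an n × n image with N = n² unknowns is hopeless for large n, while an iterative method like CGM only stores a handful of length-N work vectors.

```python
def dense_matrix_gib(n):
    """Memory (GiB) to store a dense N x N matrix of 8-byte doubles,
    where N = n^2 is the number of pixels of an n x n image."""
    N = n * n
    return N * N * 8 / 2.0 ** 30

def cg_vectors_mib(n, n_vectors=4):
    """Memory (MiB) for a handful of length-N work vectors, as kept by CG
    (iterate, residual, direction, matrix-vector product)."""
    N = n * n
    return n_vectors * N * 8 / 2.0 ** 20

for n in (256, 512, 1024):
    print(n, dense_matrix_gib(n), cg_vectors_mib(n))
```

For n = 1024 the dense matrix would need 8192 GiB, while four CG work vectors fit in 32 MiB; in practice the Hessian-type matrix is sparse, but the contrast explains why factorization-based solvers hit a wall far earlier than matrix-vector-product methods.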
The highlight of this method is that although the sequences may be long, only a small number of vectors needs to be kept in memory. Moreover, the method converges to the solution in at most m iterations, where m is the length of the solution vector. These properties make the CGM well suited to solving the linear system embedded in the L1 TV-PDA. This research applies the CGM within the L1 TV-PDA and investigates the capability of the resulting method in denoising large scale images with salt-and-pepper noise and random-valued noise. In the next section, we look into the dual function of P. Moreover, we regularize P and derive the corresponding dual. We also discuss optimality conditions on the solutions to the problems involved. In section 3, we present the L1 TV-PDA method. The conjugate gradient method (CGM) as it is used in L1 TV-PDA is discussed in section 4. The paper ends with a numerical study.
2. DUAL OF MODEL P

Let ⟨·,·⟩_H = ⟨·,·⟩_{L¹,L∞}. In [9], the Fenchel dual of P was obtained as the following problem:

(P*)   min_{p ∈ H(div)}  ⟨d, −div p⟩_H
       s.t.  |div p|_∞ ≤ 1,   (1)
             |p|_∞ ≤ α.       (2)

Problem P* is convex, but not strictly convex. Also, since the objective is only linear, the solution is not unique because the kernel of the divergence operator is not a singleton. To obtain an approximate unique solution, we smoothen problem P* by adding to it one or more regularizing terms.

In [9], the author proposed solving regularized versions of the primal P and its dual P*. For positive parameters λ and γ, a regularization of P* is

(P*_ϕ)   min_{p ∈ L²(Ω)², |p| ≤ α, |div p|_∞ ≤ 1}  (γ/(2α)) ‖p‖²_{L²} + ⟨d, −div p⟩ + (λ/2) ‖div p‖²_{L²},

where the subscript ϕ represents a given (λ, γ)-pair. The dual of (P*_ϕ) is

(P_ϕ)   min_{s ∈ H¹₀(Ω)}  ∫_Ω Ψ(s) dx + α ∫_Ω Φ(∇s) dx.

For any λ, γ > 0, a solution pair (s̄_ϕ, p̄_ϕ) for problems P_ϕ and P*_ϕ should satisfy

|s̄_ϕ − d| div p̄_ϕ − s̄_ϕ + d = 0,  if |s̄_ϕ − d| ≥ λ,   (3)
λ div p̄_ϕ − s̄_ϕ + d = 0,          if |s̄_ϕ − d| < λ,   (4)

and

|∇s̄_ϕ| p̄_ϕ − α ∇s̄_ϕ = 0,  if |∇s̄_ϕ| ≥ γ,   (5)
γ p̄_ϕ − α ∇s̄_ϕ = 0,        if |∇s̄_ϕ| < γ.   (6)

Combining the equations in (3) and (4), we get

max{λ, |s̄_ϕ − d|} div p̄_ϕ − s̄_ϕ + d = 0.   (7)

Likewise, (5) and (6) may be rewritten as

max{γ, |∇s̄_ϕ|} p̄_ϕ − α ∇s̄_ϕ = 0.   (8)

It was shown in [9] that, in the appropriate function spaces, as λ, γ → 0 the solution of P_ϕ converges strongly to the solution of P, and the solution of P*_ϕ converges weakly to the solution of P*.

3. THE L1 TV-PDA METHOD

Consider a discretization of the image d into an n × n pixel square. We stack the columns of d to re-form it as a vector in R^N, where N = n². In this discrete setting, let s, s_ϕ ∈ R^N and let p, p_ϕ ∈ R^{2N}. The discrete form of problem (P) is given by

min_{s ∈ R^N}  Σ_{i=1}^N |s_i − d_i| + α Σ_{i=1}^N ((D_x s)_i² + (D_y s)_i²)^{1/2},

where D_x and D_y denote the discrete derivatives in the x and y directions, respectively. The discrete version of the regularized model (P_ϕ) is

min_{s ∈ R^N}  Σ_{i=1}^N Ψ_h(s)_i + α Σ_{i=1}^N Φ_h(∇_h s)_i,

where

Ψ_h(s)_i = |s_i − d_i| − λ/2,        if |s_i − d_i| ≥ λ,
           (1/(2λ)) |s_i − d_i|²,    if |s_i − d_i| < λ,

and

Φ_h(∇_h s)_i = |∇_h s|_i − γ/2,      if |∇_h s|_i ≥ γ,
               (1/(2γ)) |∇_h s|_i²,  if |∇_h s|_i < γ.

The optimality conditions (7) and (8) are discretized as follows:

max{λ, |s̄_ϕ − d|} ⋆ div_h p̄_ϕ − s̄_ϕ + d = 0,   (9)
max{γ, |∇_h s̄_ϕ|} ⋆ p̄_ϕ − α ∇_h s̄_ϕ = 0,

where ⋆ denotes the Hadamard product, i.e., for u, v ∈ R^m, (u ⋆ v)_i = u_i v_i for i = 1, ..., m.
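One concrete way to realize ∇_h and div_h (our construction; the paper does not fix a particular stencil) is by forward differences and Kronecker products, with div_h taken as the negative adjoint of ∇_h:

```python
import numpy as np

def forward_diff(n):
    # 1-D forward difference matrix with a zero last row (replicated boundary)
    D = np.zeros((n, n))
    i = np.arange(n - 1)
    D[i, i] = -1.0
    D[i, i + 1] = 1.0
    return D

def grad_h(n):
    # Discrete gradient on an n x n image with columns stacked:
    # maps R^N -> R^{2N}, N = n^2; first N rows give D_x s, last N rows D_y s.
    D, I = forward_diff(n), np.eye(n)
    Dx = np.kron(D, I)   # differences across columns (x-direction)
    Dy = np.kron(I, D)   # differences down each column (y-direction)
    return np.vstack([Dx, Dy])

n = 4
G = grad_h(n)
div_h = -G.T   # discrete divergence as the negative adjoint of grad_h

# check the adjoint identity <grad_h s, p> = -<s, div_h p> on random vectors
rng = np.random.default_rng(0)
s = rng.standard_normal(n * n)
p = rng.standard_normal(2 * n * n)
print(np.allclose((G @ s) @ p, -(s @ (div_h @ p))))
```

For real image sizes these operators would of course be applied matrix-free or stored sparsely; the dense matrices here are only to make the adjoint relation explicit on a tiny example.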
Let m_λ = max{λ, |s − d|} and m_γ = max{γ, |∇_h s|}. Also, for w ∈ R^m, let D(w) be the m × m diagonal matrix whose ith diagonal entry is w_i. Using the system in (9), we formulate a Newton step at the approximate solution (s^k, p^k):

[ A^k         D(m_λ^k) ∇_h^⊤ ] [ δ_s ]   [ −D(m_λ^k) ∇_h^⊤ p^k − s^k + d ]
[ −B^k ∇_h    D(m_γ^k)       ] [ δ_p ] = [ −D(m_γ^k) p^k + α ∇_h s^k     ],   (10)

where

A^k = I_N − D(−∇_h^⊤ p^k) χ_{A_λ^k} J(|s^k − d|),
B^k = α I_{2N} − χ_{A_γ^k} D(p^k) J(|∇_h s^k|),

and J(F) denotes the Jacobian of the functional F. The characteristic matrices χ_{A_λ^k} and χ_{A_γ^k} are defined as

χ_{A_λ^k} = D(t_λ^k) ∈ R^{N×N}   with (t_λ^k)_i := 1 if |s^k − d|_i ≥ λ, 0 else,

and

χ_{A_γ^k} = D(t_γ^k) ∈ R^{2N×2N} with (t_γ^k)_i := 1 if |∇_h s^k|_i ≥ γ, 0 else.

Clearly, D(m_λ^k) is invertible, and scaling (10) by D(m_λ^k)^{-1} we get

[ D(m_λ^k)^{-1} A^k   ∇_h^⊤     ] [ δ_s ]   [ −∇_h^⊤ p^k − D(m_λ^k)^{-1}(s^k − d) ]
[ −B^k ∇_h            D(m_γ^k)  ] [ δ_p ] = [ −D(m_γ^k) p^k + α ∇_h s^k           ].   (11)

Since D(m_γ^k) is invertible, we can solve for δ_p:

δ_p = −p^k + D(m_γ^k)^{-1} (α ∇_h s^k + B^k ∇_h δ_s).   (12)

Equation (12) gives us

p^{k+1} := p^k + δ_p = D(m_γ^k)^{-1} (α ∇_h s^k + B^k ∇_h δ_s).   (13)

The remaining equation in (11) for δ_s is written as

H^k δ_s = g^k,

where we define H^k and g^k by

H^k = D(m_λ^k)^{-1} A^k + ∇_h^⊤ D(m_γ^k)^{-1} B^k ∇_h,
g^k = −D(m_λ^k)^{-1}(s^k − d) − α ∇_h^⊤ D(m_γ^k)^{-1} ∇_h s^k.

The matrix H^k is in general not symmetric because its last term is not. However, it was shown in [8] that the matrix B^k is symmetric at the solution (s^k, p^k) = (s̄, p̄). In fact, B^k is positive semidefinite under the following two conditions, for i = 1, ..., N:

|(p_i^k, p_{i+N}^k)| ≤ α   (14)

and

(b_i^k + c_i^k)² ≤ 4 a_i^k d_i^k,   (15)

where

a_i^k = α − p_i^k (∇_h s^k)_i / |∇_h s^k|_i,
b_i^k = −p_i^k (∇_h s^k)_{i+N} / |∇_h s^k|_i,
c_i^k = −p_{i+N}^k (∇_h s^k)_i / |∇_h s^k|_i,
d_i^k = α − p_{i+N}^k (∇_h s^k)_{i+N} / |∇_h s^k|_i.   (16)

Condition (14) and the dual constraint

|(div_h p)_j| ≤ 1, for j ∈ {1, ..., N},   (17)

are feasibility conditions on the dual variable. When p^k is feasible and B^k is positive semidefinite, we obtain the following lemma.

LEMMA 1. Suppose the conditions in (14), (15), and (17) hold for all i ∈ {1, ..., N}, j ∈ {1, ..., N}, and k ∈ N. Then for all k ∈ N, the matrix H^k is positive semidefinite.

PROOF. See [9].

Whenever conditions (14) and (17) are not satisfied, we apply the projection of p^k onto the feasible region. For indices i which violate (14), we replace (p_i^k, p_{i+N}^k) by β_i^k (p_i^k, p_{i+N}^k), where

β_i^k := α / max{α, |(p_i^k, p_{i+N}^k)|}.

If, in addition, condition (17) is not met at some indices j, we multiply p_j^k by some scalar ν_j^k. For instance, we can take

ν_j^k := min{β_j^k, 1/δ^k},

with δ^k := max_{i ∈ {1,...,N}} {1, |(div_h p^k)_i|}, and replace p_j^k with ν_j^k p_j^k. This projection makes p^k feasible and transforms the matrix H^k into a positive semidefinite matrix H_+^k.

Whenever condition (15) is not satisfied, a dampening of the entries b̃_i^k and c̃_i^k by

µ_i^k = 2 √(ã_i^k d̃_i^k) / |b̃_i^k + c̃_i^k|

is made, i.e., b̃_i^k → µ_i^k b̃_i^k and c̃_i^k → µ_i^k c̃_i^k. Here the entries ã, b̃, c̃, d̃ are the variables in (16) after applying the projection on p [8].

These modifications result in a matrix H_+^k which will replace H^k. We obtain a result similar to Lemma 3.4 in [8]; the proof is similar to the proof of Lemma 1.

LEMMA 2. The matrix H_+^k is positive semidefinite.

Although positive semidefiniteness does not guarantee nonsingularity of H_+^k, in our numerical implementations the proposed active set method did not encounter any problem with ill-conditioning.
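In code, the projection of p^k onto the feasible set might look like the following numpy sketch (ours; for brevity it uses a single global rescaling by the worst divergence violation in place of the per-index ν_j scaling described in the text):

```python
import numpy as np

def project_dual(p, alpha, div_h):
    """Bring a dual iterate p = (p_1..p_N, p_{N+1}..p_{2N}) into the
    feasible region: pointwise |(p_i, p_{i+N})| <= alpha (cf. (14)) and
    |div_h p|_inf <= 1 (cf. (17)). Illustrative sketch only."""
    N = p.size // 2
    px, py = p[:N], p[N:]
    # per-pixel shrink factor beta_i = alpha / max(alpha, |(p_i, p_{i+N})|)
    beta = alpha / np.maximum(alpha, np.hypot(px, py))
    p_new = np.concatenate([beta * px, beta * py])
    # simplified global version of the nu-scaling: divide by the worst
    # violation of |div_h p|_inf <= 1 (a factor delta >= 1)
    delta = max(1.0, np.abs(div_h @ p_new).max())
    return p_new / delta

# demo with a generic linear operator standing in for div_h (N = 4)
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 8))
q = project_dual(5 * rng.standard_normal(8), alpha=0.1, div_h=A)
print(np.hypot(q[:4], q[4:]).max(), np.abs(A @ q).max())
```

Since both scalings only shrink p, the pointwise bound survives the divergence rescaling, so the output satisfies both feasibility conditions.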
3.0.1 Proposed algorithm
We now propose an approximate generalized Newton method for solving the system of equations (9). This algorithm makes use of a primal-dual active set method.

Algorithm L1 TV-PDA

1. Initialize (s⁰, p⁰) ∈ R^N × R^{2N}. Set k = 0.
2. Determine the active and inactive sets, i.e., evaluate χ_{A_λ^k} and χ_{A_γ^k}.
3. If (14), (15), and (17) are not satisfied for all indices, compute H_+^k. Otherwise, set H_+^k = H^k.
4. For δ_s, solve
       H_+^k δ_s = g^k.   (18)
5. Update p^{k+1} according to (13), and set s^{k+1} := s^k + δ_s.
6. Stop, or set k := k + 1 and return to step 2.

Note that in step 4 of the algorithm, since there is no guarantee that H^k or H_+^k is positive definite, the solution δ_s may not be unique.

In the numerical implementations in [9], the backslash (\) function in MATLAB was used to obtain δ_s, i.e., δ_s = H_+^k \ g^k. This function worked well for image sizes up to 512 × 512, and even slightly higher. However, the method encounters a problem when the size is large, for instance 1024 × 1024. An effective method for solving large scale linear systems at a lower memory cost is the conjugate gradient method. This research adapted the CGM to generate the step δ_s.

4. CGM

The conjugate gradient method was developed and popularized by M. R. Hestenes and E. Stiefel [6]. The method finds the solution to the nonsingular linear system

Qx = b.   (19)

When Q is symmetric positive definite, the solution x̄ ∈ R^n to the system (19) is also the solution to the quadratic problem

min f(x) = (1/2) x^⊤ Q x − b^⊤ x.   (20)

Provided no rounding error occurs, the method obtains x̄ in at most n iterations. Using a Gram-Schmidt procedure, the CGM generates Q-conjugate directions from the gradient vectors; recall that vectors d_i and d_j are Q-conjugate if d_i^⊤ Q d_j = 0. Successive solution iterates of the CGM minimize f over a progressively expanding linear manifold. Eventually, after at most n iterates, the CGM obtains a solution which minimizes f over R^n. When the method is stopped early, the algorithm still gives a useful approximate solution. For more detailed discussions of the development and convergence of the CGM, see [1], [5], and [12]. We write here the algorithm of conjugate gradients as it is presented in [5].

Algorithm CGM

If Q ∈ R^{n×n} is symmetric positive definite, b ∈ R^n, and x₀ ∈ R^n is an initial guess, then the following algorithm computes x ∈ R^n so that Qx = b.

1. k = 0
2. r₀ = b − Q x₀
3. while r_k ≠ 0
       k = k + 1
       if k = 1
           d₁ = r₀
       else
           β_k = r_{k−1}^⊤ r_{k−1} / r_{k−2}^⊤ r_{k−2}
           d_k = r_{k−1} + β_k d_{k−1}
       end
       α_k = r_{k−1}^⊤ r_{k−1} / d_k^⊤ Q d_k
       x_k = x_{k−1} + α_k d_k
       r_k = r_{k−1} − α_k Q d_k
4. end
5. x = x_k

We modify L1 TV-PDA to implement the CGM in solving for δ_s in (18). From Lemma 6.6 in [9], the matrix H_+^k is (at least) positive semidefinite. In our computer runs, no problem with ill-conditioned matrices occurred.

With CGM, we were able to reconstruct an image with dimensions up to 1024 × 1024. This was not possible with the original L1 TV-PDA, which utilizes the backslash function in MATLAB.
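For reference, the conjugate gradient recursion can be transcribed directly into numpy as follows (our sketch, with a tolerance-based stopping test in place of the exact condition r_k ≠ 0):

```python
import numpy as np

def cgm(Q, b, x0, tol=1e-10, max_iter=None):
    """Conjugate gradients for symmetric positive definite Q,
    following the Hestenes-Stiefel recursion."""
    x = np.array(x0, dtype=float)
    r = b - Q @ x
    d = r.copy()
    if max_iter is None:
        max_iter = len(b)          # exact convergence in <= n steps in theory
    for _ in range(max_iter):
        rr = r @ r
        if np.sqrt(rr) <= tol:     # practical replacement for r_k != 0
            break
        Qd = Q @ d
        a = rr / (d @ Qd)          # step length alpha_k
        x = x + a * d
        r = r - a * Qd             # updated residual
        d = r + ((r @ r) / rr) * d # beta_k = new ||r||^2 / old ||r||^2
    return x

# small symmetric positive definite test system
rng = np.random.default_rng(2)
M = rng.standard_normal((6, 6))
Q = M @ M.T + 6.0 * np.eye(6)
b = rng.standard_normal(6)
x = cgm(Q, b, np.zeros(6))
print(np.linalg.norm(Q @ x - b))
```

Only the current iterate, residual, direction, and one matrix-vector product are kept, which is exactly the low-memory property exploited above; Q never needs to be formed explicitly as long as Q @ d can be evaluated.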
5. NUMERICAL RESULTS
In the use of the CGM, we set a tolerance of 10⁻² and a maximum iteration count of 1000. Since the CGM guarantees the exact solution only when the number of iterations may reach the system size, here 1024², the output δ_s of the CGM is an approximation of the global solution. Note that the number of inner iterations in the CGM subroutine is small compared with this maximum possible number of iterations. The initial guess for the CGM is the zero vector. The algorithm was implemented in MATLAB on a 1.60 GHz dual processor computer with 1.12 GB of RAM.

An attempt to use a preconditioned CGM with H_+^k H_+^{k⊤} as preconditioning matrix was not pursued further, for the moment, because of a step in the method in which a scaled vector z_k is computed that solves M z_k = r_k, where M is a positive definite preconditioning matrix. A direct solution would require an embedded CGM, which may result in a huge amount of computation time overall.

Our test image, figure 1, is a 1024 × 1024 blocky image with several grayscale levels. There were attempts to use higher dimensions such as 1300 × 1300, but the machine ran out of memory.

Figure 1: Test image

We add noise to the image before applying L1 TV-PDA with CGM. Figure 2(a) is the image with 20% salt-and-pepper noise, i.e., 20% of the pixels are assigned a 0 or a 1. The range of grayscale values is scaled to [0, 1]. In this example, the method was stopped at seven outer iterations because the reconstruction was by then visually good enough (see figure 3(a)). The image with 20% random-valued noise is figure 2(b). The noise is assigned by selecting 20% of the pixels at random, and then assigning a random number between 0 and 1 to each such pixel. As with the salt-and-pepper noised image, the method returned a good reconstruction, figure 3(b), after seven iterations.

Figure 2: Noisy images. (a) With 20% salt and pepper noise. (b) With 20% random valued noise.

Table 1 contains some statistics on the computational output. KKT stands for the KKT residual, which is the sum of the squares of the left-hand sides of (9) evaluated at the current primal and dual iterates. The residual of the reconstruction with respect to the known clean image is sres, ctime is the running time in seconds, and iter is the number of outer iterations. By S&P and RV are meant salt-and-pepper noise and random-valued noise, respectively.

Table 1: Results for the L1 TV model on a 1024 × 1024 image example with 20% noise.

            iter    KKT      sres    ctime
With S&P    7       266.18   15.44   7340
With RV     7       112.13   11.82   7090

Despite the ability of L1 TV-PDA with CGM to solve a large scale image, the time spent doing so was prohibitively large. This author tried to use a smaller maximum number of iterations, e.g., 700 CGM iterations, but the CGM did not return a good approximate solution and the output image retained most of the noise. The slow convergence may be due to machine constraints or to a need for preconditioning. More advanced numerical methods, such as the biconjugate gradient stabilized algorithm, are recommended to obtain faster convergence [11].
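The two noise models used in the experiments can be generated as follows (our sketch, not the author's code; the noise fraction and seed are illustrative):

```python
import numpy as np

def add_salt_and_pepper(img, frac, rng):
    # set a fraction `frac` of pixels, chosen at random, to 0 or 1
    out = img.copy()
    idx = rng.choice(out.size, size=int(frac * out.size), replace=False)
    out.flat[idx] = rng.integers(0, 2, size=idx.size).astype(float)
    return out

def add_random_valued(img, frac, rng):
    # replace a fraction `frac` of pixels by uniform random values in [0, 1)
    out = img.copy()
    idx = rng.choice(out.size, size=int(frac * out.size), replace=False)
    out.flat[idx] = rng.random(idx.size)
    return out

rng = np.random.default_rng(3)
clean = np.full((32, 32), 0.5)           # flat mid-gray stand-in image
noisy_sp = add_salt_and_pepper(clean, 0.2, rng)
noisy_rv = add_random_valued(clean, 0.2, rng)
print((noisy_sp != 0.5).mean(), (noisy_rv != 0.5).mean())
```

Both corrupt the same fraction of pixels; they differ only in whether the corrupted values are the extremes {0, 1} or arbitrary values in [0, 1), which is exactly the S&P vs. RV distinction in Table 1.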
Figure 3: Reconstructed images. (a) Salt and pepper denoised image. (b) Random valued denoised image.

6. REFERENCES

[1] D. Bertsekas. Nonlinear Programming. Athena Scientific, 1995.
[2] A. Chambolle. An algorithm for total variation minimization and applications. Journal of Mathematical Imaging and Vision, 20:89–97, 2004.
[3] T. Chan and S. Esedoğlu. Aspects of total variation regularized L1 function approximation. SIAM Journal on Applied Mathematics, 65(5):1817–1837, 2005.
[4] T. Chan, G. Golub, and P. Mulet. A nonlinear primal-dual method for total variation-based image restoration. SIAM Journal on Scientific Computing, 20:1964–1977, 1999.
[5] G. Golub and C. Van Loan. Matrix Computations. The Johns Hopkins University Press, 3rd edition, 1996.
[6] M. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards, 49(6):409–436, 1952.
[7] M. Hintermüller, K. Ito, and K. Kunisch. The primal-dual active set strategy as a semismooth Newton method. SIAM Journal on Optimization, 13(3):865–888, 2003.
[8] M. Hintermüller and G. Stadler. An infeasible primal-dual algorithm for total bounded variation-based inf-convolution-type image restoration. SIAM Journal on Scientific Computing, 28(1):1–23, 2006.
[9] M. Neri. Primal-dual methods in total variation regularized L1 and L2 functionals arising in image denoising. PhD thesis, University of the Philippines Diliman, 2008.
[10] M. Nikolova. Minimizers of cost-functions involving nonsmooth data-fidelity terms. Application to the processing of outliers. SIAM Journal on Numerical Analysis, 40(3):965–994, 2002.
[11] Y. Saad. Iterative Methods for Sparse Linear Systems. SIAM, Philadelphia, 2nd edition, 2003.
[12] L. Trefethen and D. Bau, III. Numerical Linear Algebra. SIAM, 1997.
(b) Random valued denoised image Figure 3: Reconstructed images [7] M. Hintermüller, K. Ito, and K. Kunisch. A primal-dual active set strategy as a semismooth newton method. SIAM Journal on Optimization, 13(3):865–888, 2003. [8] M. Hintermüller and G. Stadler. An infeasible primal-dual algorithm for total bounded variation-based inf-convolution-type image restoration. SIAM Journal on Scientific Computing, 28(1):1–23, 2006. [9] M. Neri. Primal-dual methods in total variation regularized L1 and L2 functionals arising in image denoising. PhD thesis, University of the Philippines Diliman, 2008. [10] M. Nikolova. Minimizers of cost-functions involving nonsmooth data-fidelity terms. application to the processing of outliers. SIAM Journal on Numerical Analysis, 40(3):965–994, 2002. [11] Y. Saad. Iterative methods for sparse linear systems. SIAM, Philadelphia, 2nd edition, 2003. [12] L. Trefethen and D. Bau III. Numerical Linear Algebra. SIAM, 1997.