Super Greedy Type Algorithms and Applications in Compressed Sensing

By Entao Liu

Bachelor of Science Shandong University, 2005

Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Mathematics
College of Arts and Sciences
University of South Carolina
2011

Accepted by:

Vladimir Temlyakov Major Professor

Colin Bennett Committee Member

Daniel Dix Committee Member

Vladimir Gudkov Committee Member

Pencho Petrushev Committee Member

Tim Mousseau Interim Dean of the Graduate School

© Copyright by Entao Liu, 2011

All Rights Reserved.


Dedication

This work is dedicated to my wife, Xi Chen, who is the motivation for me to work for a better future. It is also dedicated to my parents, Feng-Shan Liu and Gui-Zhen Cheng, who have been the mainstay of my life.


Acknowledgments

I am heartily grateful to my academic advisor, Professor Vladimir N. Temlyakov, whose encouragement, guidance, and support from the beginning of my doctoral candidacy to today enabled me to complete this dissertation. Next, I would like to thank Professors Colin Bennett, Daniel Dix, Vladimir Gudkov, and Pencho Petrushev for serving on my advisory committee. In addition, I appreciate Professor George F. McNulty's help with the LaTeX format for the dissertation.


Abstract

In this manuscript we study greedy-type algorithms in which, at each greedy step, we pick several dictionary elements instead of the single element picked by standard greedy-type algorithms. We call such algorithms super greedy type algorithms. In the general setting, we propose several new greedy algorithms: the Super Greedy Algorithm (SGA), the Orthogonal Super Greedy Algorithm (OSGA), and the Orthogonal Super Greedy Algorithm with Thresholding (OSGAT), as well as their weak versions. The central question is what advantages, if any, super greedy type algorithms have over the standard greedy type algorithms. This question is answered by studying their performance (rate of convergence) with respect to M-coherent dictionaries. Some new phenomena are found. For instance, the Orthogonal Super Greedy Algorithm has the same convergence rate as the Orthogonal Greedy Algorithm (OGA) with respect to incoherent dictionaries, yet OSGA is computationally simpler than the standard OGA. Greedy approximation is already in serious numerical use, for example in image/video processing and the solution of operator equations; in particular, it serves as one of the fundamental tools in sparse signal recovery. Using the super greedy idea, we build new recovery algorithms in Compressed Sensing (CS): the Orthogonal Multi Matching Pursuit (OMMP) and the Orthogonal Multi Matching Pursuit with Thresholding Pruning (OMMPTP). The performance of these two algorithms is analyzed under Restricted Isometry Property (RIP) conditions.


Contents

Dedication
Acknowledgments
Abstract
Chapter 1  Introduction
Chapter 2  Weak Super Greedy Algorithm
    2.1  Introduction and Preliminaries
    2.2  The Algorithm
    2.3  Proof of the Rate of Convergence
Chapter 3  Weak Orthogonal Super Greedy Algorithm
    3.1  Introduction and Preliminaries
    3.2  The Algorithm
    3.3  Proof of the Rate of Convergence
Chapter 4  Weak Orthogonal Super Greedy Algorithm with Thresholding
    4.1  Introduction and the Algorithm
    4.2  Proof of the Rate of Convergence
Chapter 5  Applications in Compressed Sensing
    5.1  Introduction of Compressed Sensing
    5.2  Orthogonal Matching Pursuit
    5.3  Orthogonal Multi Matching Pursuit
    5.4  Orthogonal Multi Matching Pursuit with Thresholding Pruning
Bibliography

Chapter 1
Introduction

It is well known that nonlinear sparse representations are needed to significantly increase our ability to process (compress, denoise, etc.) large data sets. Sparse representations of a function are not only a powerful analytic tool but are also widely used in many application areas such as signal/image processing and numerical computation. The key to finding a sparse representation of a target function is the concept of best $m$-term approximation (to be discussed shortly) of the target function with respect to a given system of functions (a dictionary). The fundamental question in this regard is how to construct good methods (algorithms) of $m$-term approximation. It turns out that greedy algorithms provide an efficient approach to building good $m$-term approximants.

Greedy approximation theory touches several areas of research: approximation theory, numerical analysis, and compressed sensing. In the last two decades many fundamental results of greedy approximation were developed. The major issue in greedy approximation is how to find, for a given target function in a certain space, a representation with respect to a basis or a dictionary. There is a big gap in difficulty between these two scenarios (representation with respect to a basis versus a dictionary). As an illustration, we consider the expansion with respect to a given basis of a Hilbert space $H$; throughout this thesis we only consider greedy approximation in Hilbert spaces. For an orthonormal basis $\mathcal{B} := \{b_n\}_{n=1}^{\infty}$ of $H$ with inner product $\langle\cdot,\cdot\rangle$, any element $f \in H$ has a unique expansion
\[
f \sim \sum_{n=1}^{\infty} \langle f, b_n \rangle b_n,
\]
and the expansion converges to $f$ in $H$. If we are allowed to use only $m$ terms of the expansion, what is the best approximation with this budget? To answer this question, we define the best $m$-term approximation with regard to $\mathcal{B}$ as follows:
\[
\sigma_m(f) := \sigma_m(f, \mathcal{B})_H := \inf_{c_k,\,\Lambda} \Big\| f - \sum_{k \in \Lambda} c_k b_k \Big\|_H,
\]

where the infimum is taken over coefficients $c_k$ and sets of indices $\Lambda$ with cardinality $|\Lambda| = m$. Since we only restrict the cardinality of $\Lambda$, not the locations of its indices, finding the best $m$-term approximation is a nonlinear problem: the approximant does not come from a linear space but rather from a nonlinear manifold. It is known that greedy approximation provides the ideal best $m$-term approximation with respect to an orthonormal basis. However, if $\mathcal{B}$ is an arbitrary redundant dictionary, in which orthogonality is no longer available, the performance of greedy approximation is not so transparent.

In this thesis we discuss greedy approximation with regard to a redundant dictionary which has some specific features that can be used to our advantage. The specific structure of the dictionary can help either to improve the rate of convergence of known algorithms or to build more efficient algorithms with the same rate of convergence as known general algorithms. A specific feature of a dictionary ($M$-coherence in our case) allows us to build greedy algorithms, the super greedy type algorithms, that are more efficient than known greedy algorithms. In super greedy type algorithms, instead of picking only one element of the dictionary at each greedy step, we pick several good elements simultaneously. It is a natural idea that was used before, for instance, in [34].

In Chapter 2 a super greedy type algorithm called the Weak Super Greedy Algorithm (WSGA) is proposed. We address its rate of convergence. Then a comparison between the Super Greedy Algorithm (SGA) and its analog, the Pure Greedy Algorithm (PGA), is given. In the computational sense, SGA is a little more expensive than PGA, because SGA makes an orthogonal projection at each iteration. However, there is a big difference between SGA and some of the other greedy algorithms discussed in Chapter 3: the projections SGA makes are restricted to spans of a fixed number of elements, and this fixed number is independent of the number of iterations. As a result, with regard to $M$-coherent dictionaries, SGA sacrifices a little in computational complexity at each iteration and gains a better convergence rate.

In Chapter 3 we build a super greedy type algorithm called the Orthogonal Super Greedy Algorithm (OSGA) and its weak version, the Weak Orthogonal Super Greedy Algorithm (WOSGA). We study their rates of convergence for elements from the closure of the convex hull of the symmetrized dictionary, which is the standard setting in the theory of greedy approximation. It turns out that with regard to $M$-coherent dictionaries the performance of OSGA is as good as that of the Orthogonal Greedy Algorithm (OGA). However, OSGA needs fewer iterations than OGA to obtain this rate, which makes OSGA computationally simpler.

It is known that a thresholding step is easier to implement numerically than a greedy step. Motivated by this observation, in Chapter 4 we combine the super greedy idea with a popular thresholding technique. The Weak Orthogonal Super Greedy Algorithm with Thresholding (WOSGAT) is built and studied. Its rate of convergence is given for elements from the convex hull of the symmetrized dictionary, where the dictionary satisfies a condition more general than $M$-coherence.

The last chapter, Chapter 5, is devoted to a new research area, Compressed Sensing (CS), where a family of greedy algorithms is utilized as an important reconstruction approach. Recently, CS has attracted considerable attention from both the mathematical and the engineering communities. We discuss the performance of the well-known Orthogonal Matching Pursuit (OMP) and of the Orthogonal Multi Matching Pursuit (OMMP), which is proposed in this thesis. In Compressed Sensing the space is $H = \mathbb{R}^N$ and the dictionary is finite. Therefore, instead of the convergence rate we adopt a more natural criterion, exact recovery, to discuss the performance of OMP and OMMP under Restricted Isometry Property (RIP) conditions. Many greedy recovery algorithms in Compressed Sensing require knowledge of the sparsity in advance. In this chapter we build a new algorithm, the Orthogonal Multi Matching Pursuit with Thresholding Pruning (OMMPTP), which requires only threshold information rather than the sparsity of the original signal. Notice that we do not discuss numerical experiments, since this dissertation is mainly devoted to theoretical results. In practice, both OMMP and OMMPTP may perform much better than the theorems guarantee.

Chapter 2
Weak Super Greedy Algorithm

2.1 Introduction and Preliminaries

Before proceeding to the major issues, we recapitulate some notation and definitions from the theory of greedy approximation. Let $H$ be a real Hilbert space with an inner product $\langle\cdot,\cdot\rangle$ and the norm $\|x\| := \langle x, x\rangle^{1/2}$. We say that a set $\mathcal{D}$ of functions (elements) from $H$ is a dictionary if each element $g \in \mathcal{D}$ has unit norm ($\|g\| = 1$) and $\overline{\operatorname{span}}\,\mathcal{D} = H$. Sometimes it will be convenient for us to consider, along with $\mathcal{D}$, the symmetrized dictionary $\mathcal{D}^{\pm} := \{\pm g,\ g \in \mathcal{D}\}$. Let
\[
M(\mathcal{D}) := \sup_{\substack{\varphi \neq \psi \\ \varphi, \psi \in \mathcal{D}}} |\langle \varphi, \psi \rangle|
\]
be the coherence parameter of a dictionary $\mathcal{D}$. We say that a dictionary $\mathcal{D}$ is $M$-coherent if $M(\mathcal{D}) \leq M$. The coherence parameter measures how far the dictionary is from an orthonormal basis, whose coherence parameter equals $0$. For a general dictionary $\mathcal{D}$ we define the class of functions (elements)
\[
\mathcal{A}_1^{o}(\mathcal{D}, B) := \Big\{ f \in H : f = \sum_{k \in \Lambda} c_k g_k,\ g_k \in \mathcal{D},\ |\Lambda| < \infty,\ \sum_{k \in \Lambda} |c_k| \leq B \Big\}
\]

and we define A1 (D, B) to be the closure (in H) of A01 (D, B). For the case B = 1, we denote A1 (D) := A1 (D, 1). Furthermore, we define A1 (D) to be the union of the classes A1 (D, B) over all B > 0. We define the norm |f |A1 (D) to be the smallest B such that f ∈ A1 (D, B). For more results, we refer the reader to [37] as a survey of the theory of greedy approximation.
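For a finite dictionary stored as the columns of a matrix, the coherence parameter is straightforward to evaluate numerically. The sketch below is an illustration added to this edition, not part of the original dissertation; the function name `coherence`, the column layout, and the random example are assumptions. It computes $M(\mathcal{D})$ as the largest off-diagonal entry, in absolute value, of the Gram matrix of the normalized columns.

```python
import numpy as np

def coherence(D):
    """Coherence parameter M(D) of a finite dictionary.

    D is a d x n array whose columns are the dictionary elements;
    the columns are normalized here so that each has unit norm.
    """
    D = D / np.linalg.norm(D, axis=0, keepdims=True)
    G = np.abs(D.T @ D)        # |<phi_i, phi_j>| for all pairs
    np.fill_diagonal(G, 0.0)   # ignore the diagonal <phi, phi> = 1
    return G.max()

# Example: a random dictionary of 20 unit vectors in R^10.
rng = np.random.default_rng(0)
D = rng.standard_normal((10, 20))
print(coherence(D))
```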


For the purpose of obtaining an $m$-term approximant, the most natural algorithm that comes to mind is the following: find the element of the dictionary which maximizes the inner product (in magnitude) with the target function, then subtract its contribution to obtain a residual. In the literature, Huber [19] was probably the first to study this algorithm.

Pure Greedy Algorithm (PGA). Initially, we define $f_0 := f$. Then for each $m \geq 1$ we inductively define:

(1) $\varphi_m \in \mathcal{D}$ is an element of the dictionary $\mathcal{D}$ satisfying
\[
|\langle f_{m-1}, \varphi_m \rangle| = \sup_{g \in \mathcal{D}} |\langle f_{m-1}, g \rangle|.
\]

(2) Define the residual after the $m$th iteration of the algorithm
\[
f_m := f_{m-1} - \langle f_{m-1}, \varphi_m \rangle \varphi_m.
\]

(3) Find the approximant
\[
G_m(f) := G_m(f, \mathcal{D}) := \sum_{j=1}^{m} \langle f_{j-1}, \varphi_j \rangle \varphi_j.
\]
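For a finite dictionary in $\mathbb{R}^d$, the three steps above can be written directly in code. The sketch below is an illustration added here, not code from the thesis; the function name `pga` and the convention that the dictionary is given by unit-norm columns of a matrix are assumptions.

```python
import numpy as np

def pga(f, D, m):
    """Pure Greedy Algorithm: m iterations for a finite dictionary.

    f : target vector in R^d.
    D : d x n array with unit-norm columns (the dictionary).
    Returns the approximant G_m(f) and the residual f_m.
    """
    residual = f.astype(float).copy()
    approx = np.zeros_like(residual)
    for _ in range(m):
        inner = D.T @ residual            # <f_{m-1}, g> for all g in D
        k = int(np.argmax(np.abs(inner))) # greedy choice of phi_m
        step = inner[k] * D[:, k]         # <f_{m-1}, phi_m> phi_m
        approx += step
        residual -= step
    return approx, residual
```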

In the greedy step (1) above we make the additional assumption that the maximizer always exists; if the dictionary is finite, such a maximizer obviously exists. The following modification of PGA, the Weak Greedy Algorithm, addresses the problem of the existence of $\varphi_m$ in step (1). We now proceed to the corresponding definition of the weak version, given by Temlyakov [35] in 2000. Let a weakness sequence $\tau := \{t_k\}_{k=1}^{\infty}$, $t_k \in (0, 1]$, $k = 1, 2, \ldots$, be given.

Weak Greedy Algorithm (WGA). Initially, we define $f_0 := f_0^{\tau} = f$. Then for each $m \geq 1$ we inductively define:

(1) $\varphi_m^{\tau} \in \mathcal{D}$ is an element of the dictionary $\mathcal{D}$ satisfying the inequality
\[
|\langle f_{m-1}, \varphi_m^{\tau} \rangle| \geq t_m \sup_{g \in \mathcal{D}} |\langle f_{m-1}, g \rangle|.
\]

(2) Define the residual after the $m$th iteration of the algorithm
\[
f_m := f_m^{\tau} := f_{m-1} - \langle f_{m-1}, \varphi_m^{\tau} \rangle \varphi_m^{\tau}.
\]

(3) Find the approximant
\[
G_m(f) := G_m^{\tau}(f, \mathcal{D}) := \sum_{j=1}^{m} \langle f_{j-1}, \varphi_j^{\tau} \rangle \varphi_j^{\tau}.
\]

At the greedy step (1) of WGA one can pick an element that is merely close to the maximizer instead of the maximizer itself. This flexibility guarantees the existence of $\varphi_m$ provided $t_m \in (0, 1)$. In this thesis we concentrate on the convergence rate of the algorithms; for necessary and sufficient conditions for the convergence of WGA we refer the interested reader to the survey [37]. We now recall some known upper estimates for the rate of convergence of PGA and WGA with regard to general redundant dictionaries. The following theorem was proved by DeVore and Temlyakov [11] in 1996.

Theorem 2.1. Let $\mathcal{D}$ be an arbitrary dictionary in $H$. Then, for each $f \in \mathcal{A}_1(\mathcal{D}, B)$, we have
\[
\|f_m\| = \|f - G_m(f, \mathcal{D})\| \leq B m^{-1/6}.
\]

The following theorem, proved by Temlyakov [35] in 2000, provides an upper estimate for WGA.

Theorem 2.2. Let $\mathcal{D}$ be an arbitrary dictionary in $H$. Assume $\tau := \{t_k\}_{k=1}^{\infty}$ is a non-increasing sequence. Then, for $f \in \mathcal{A}_1(\mathcal{D}, B)$, we have
\[
\|f - G_m^{\tau}(f, \mathcal{D})\| \leq B \Big( 1 + \sum_{k=1}^{m} t_k^2 \Big)^{-t_m/(2(2+t_m))}. \tag{2.1}
\]

In the particular case $t_k = 1$, $k = 1, 2, \ldots$, this theorem coincides with Theorem 2.1 in the sense of order. In another particular case, $t_k = t$ for $k = 1, 2, \ldots$, (2.1) gives
\[
\|f - G_m^{t}(f, \mathcal{D})\| \leq B (1 + m t^2)^{-t/(4+2t)}, \qquad t \in (0, 1].
\]
This estimate implies the inequality $\|f - G_m^{t}(f, \mathcal{D})\| \leq B_1(t)\, m^{-a_t} |f|_{\mathcal{A}_1(\mathcal{D})}$, with the exponent $a_t$ approaching $0$ linearly in $t$. The following theorem, proved by Livshitz and Temlyakov [27] in 2003, shows that the upper bound in Theorem 2.2 is sharp in a certain sense, i.e. this exponent cannot decrease to $0$ at a slower rate than linear.

Theorem 2.3. There exists an absolute constant $b > 0$ such that, for any $t > 0$, we can find a dictionary $\mathcal{D}_t$ and a function $f_t \in \mathcal{A}_1(\mathcal{D}_t)$ such that, for some realization $G_m^{t}(f_t, \mathcal{D}_t)$ of WGA, we have
\[
\liminf_{m \to \infty} \|f_t - G_m^{t}(f_t, \mathcal{D}_t)\|\, m^{bt} / |f_t|_{\mathcal{A}_1(\mathcal{D}_t)} > 0.
\]

2.2 The Algorithm

In this section we define a super greedy type algorithm such that at a greedy step we pick several dictionary elements, in contrast to the single dictionary element picked in WGA. We now proceed to the definition of the Weak Super Greedy Algorithm with parameter $s$. Let a natural number $s$ and a weakness sequence $\tau := \{t_k\}_{k=1}^{\infty}$, $t_k \in (0, 1]$, $k = 1, 2, \ldots$, be given.

Weak Super Greedy Algorithm (WSGA(s, $\tau$)). Initially, we define $f_0 := f_0^{s,\tau} := f$. Then for each $m \geq 1$ we inductively define:

(1) $\varphi_{(m-1)s+1}, \ldots, \varphi_{ms} \in \mathcal{D}$ are elements of the dictionary $\mathcal{D}$ satisfying the following inequality: denote $I_m := [(m-1)s+1, ms]$ and assume that
\[
\min_{i \in I_m} |\langle f_{m-1}, \varphi_i \rangle| \geq t_m \sup_{g \in \mathcal{D},\, g \neq \varphi_i,\, i \in I_m} |\langle f_{m-1}, g \rangle|.
\]

(2) Let $F_m := F_m(f_{m-1}) := \operatorname{span}(\varphi_i,\ i \in I_m)$ and let $P_{F_m}$ denote the operator of orthogonal projection onto $F_m$. Define the residual after the $m$th iteration of the algorithm
\[
f_m := f_m^{s,\tau} := f_{m-1} - P_{F_m}(f_{m-1}).
\]

(3) Find the approximant
\[
G_m^{s}(f) := G_m^{s,\tau}(f, \mathcal{D}) := \sum_{j=1}^{m} P_{F_j}(f_{j-1}).
\]
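The following sketch, added here as an illustration and not taken from the thesis, realizes WSGA(s, t) for a finite dictionary by simply taking the $s$ largest inner products at each step (the strongest admissible choice, so it satisfies the weak selection condition for any $t \leq 1$; with $t = 1$ this is SGA(s), and with $s = 1$ it reduces to PGA). The function name `wsga` and the column layout of the dictionary are assumptions.

```python
import numpy as np

def wsga(f, D, s, m):
    """Weak Super Greedy Algorithm with the strongest selection rule (SGA(s)).

    At each iteration the s dictionary elements with the largest inner
    products (in magnitude) are selected, and the residual is projected
    onto the span of ONLY those s elements, as in step (2) above.
    f : target vector in R^d;  D : d x n array with unit-norm columns.
    """
    residual = f.astype(float).copy()
    approx = np.zeros_like(residual)
    for _ in range(m):
        inner = np.abs(D.T @ residual)
        idx = np.argsort(inner)[-s:]      # indices of the s largest inner products
        F = D[:, idx]                     # spanning set of F_m (not orthonormal)
        coef, *_ = np.linalg.lstsq(F, residual, rcond=None)
        p = F @ coef                      # P_{F_m}(f_{m-1})
        approx += p
        residual -= p
    return approx, residual
```

Note that, unlike the orthogonal variants of Chapter 3, each projection here involves only $s$ elements, independently of the iteration count.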

In the case tk = t, k = 1, 2, . . ., we write t instead of τ in the notations. If t = 1, WSGA(s, 1) is just the Super Greedy Algorithm with parameter s (SGA(s)). For s = 1 the Super Greedy Algorithm coincides with the Pure Greedy Algorithm and the Weak Super Greedy Algorithm coincides with the Weak Greedy Algorithm. The following open problem (see [36], p.65, Open Problem 3.1) on the rate of convergence of PGA for the A1 (D) is a central theoretical problem in greedy approximation in Hilbert spaces. Open problem. Find the order of decay of the sequence γ(m) :=

sup (kf − G1m (f, D)k|f |−1 A1 (D) ),

f,D,{G1m }

where the supremum is taken over all dictionaries D, all elements f ∈ A1 (D) \ {0} and all possible choices of {G1m }. In 2004, Sil’nichenko [33] developed a method from Konyagin and Temlyakov [22] and improved the best known upper bound γ(m) ≤ Cm−0.182... . There is also some progress in the lower estimates. In 2007, Livshitz [26] using the technique from Livshitz and Temlyakov [27] proved the best known lower bound γ(m) ≥ Cm−0.1898 .

9

Furthermore, we introduce the following generalization of the quantity γ(m) to the case of the Weak Greedy Algorithms γ(m, τ ) :=

sup

−1 (kf − G1,τ m (f, D)k|f |A1 (D) ),

f,D,{G1,τ m }

where the supremum is taken over all dictionaries D, all elements f ∈ A1 (D) \ {0}, and all possible choices of {G1,τ m }. We prove here the following theorem. Theorem 2.4. Let D be a dictionary with coherence parameter M := M (D). Then, for s ≤ (2M )−1 , the algorithm WSGA(s, t) provides, after m iterations, an approximation of f ∈ A1 (D) with the following upper bound on the error: 1

−1 − 2 kf − Gs,t γ(m − 1, rt), m (f, D)k ≤ Ct s



where r :=

1−M s 1+M s

1

2

and C is an absolute constant.

Theorem 2.4 with t = 1 gives the following statement on SGA(s). Corollary 2.5. Let D be a dictionary with coherence parameter M := M (D). Then, for s ≤ (2M )−1 , the algorithm SGA(s) provides, after m iterations, an approximation of f ∈ A1 (D) with the following upper bound on the error: 1

kf − Gsm (f, D)k ≤ Cs− 2 γ(m − 1, r), 

where r :=

1−M s 1+M s

1

2

and C is an absolute constant.

It is interesting to note that even in the case of the SGA(s), when the weakness parameter is 1, we have the upper bound of the error in terms of γ(m − 1, r) not in terms of γ(m − 1). For estimating γ(m, r) we use the above Theorem 2.2. For f ∈ A1 (D), we apply PGA and SGA(s). Then after sm iterations of PGA and m iterations of SGA(s), both algorithms provide sm-term approximants. For illustration purposes, take s = N 1/2 and m = N 1/2 . Then PGA gives kfN k ≤ (sm)−1/6 = N −1/6 10

and SGA(s) provides r

1

1

kfm k ≤ Cs−1/2 m− 2(2+r) = N − 4 − 2 θ , where θ :=

r 2(2+r)



and r =

1−M s 1+M s

1/2

,

1 4

+ 12 θ > 16 . Thus, in this particular case, the

SGA(s) has a better upper bound for the error than the PGA.

2.3

Proof of the Rate of Convergence

First we state a known result of Donoho, Elad, and Temlyakov [16] which will be used frequently in this thesis.

Lemma 2.6. Assume a dictionary $\mathcal{D}$ is $M$-coherent. Then for any distinct $\psi_j \in \mathcal{D}$, $j = 1, \ldots, N$, and any coefficients $a_j$, $j = 1, \ldots, N$, we have the inequalities
\[
\Big( \sum_{j=1}^{N} |a_j|^2 \Big) \big(1 - M(N-1)\big) \;\leq\; \Big\| \sum_{j=1}^{N} a_j \psi_j \Big\|^2 \;\leq\; \Big( \sum_{j=1}^{N} |a_j|^2 \Big) \big(1 + M(N-1)\big).
\]
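Lemma 2.6 is easy to check numerically. The snippet below is a quick sanity check added for illustration only (the sizes and the random seed are assumptions); it draws a random unit-norm dictionary, computes its coherence, and verifies the two-sided bound for a random linear combination of $N$ of its elements.

```python
import numpy as np

rng = np.random.default_rng(1)

# Random unit-norm dictionary and its coherence parameter M.
D = rng.standard_normal((50, 200))
D /= np.linalg.norm(D, axis=0)
G = np.abs(D.T @ D)
np.fill_diagonal(G, 0.0)
M = G.max()

# A random subset of N distinct elements and random coefficients.
N = 5
idx = rng.choice(D.shape[1], size=N, replace=False)
a = rng.standard_normal(N)

lhs = np.sum(a**2) * (1 - M * (N - 1))   # lower bound (informative only if M(N-1) < 1)
mid = np.linalg.norm(D[:, idx] @ a) ** 2
rhs = np.sum(a**2) * (1 + M * (N - 1))
print(lhs <= mid <= rhs)                 # always True, as Lemma 2.6 asserts
```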

Proof of Theorem 2.4. Let f=

∞ X

cj gj ,

gj ∈ D,

∞ X

|cj | ≤ 1,

|c1 | ≥ |c2 | ≥ . . .

.

(2.2)

j=1

j=1

Every element of A1 (D) can be approximated arbitrarily well by elements of the form (2.2). It will be clear from the below argument that it is sufficient to consider elements f of the form (2.2). Suppose ν is such that |cν | ≥ a/s ≥ |cν+1 |, where a=

t+3 . 2t

Then the above assumption on the sequence {cj } implies that ν ≤ bs/ac

and |cs+1 | < 1/s. We claim that elements g1 , . . . , gν will be chosen among ϕ1 , . . . , ϕs at the first iteration. Indeed, for j ∈ [1, ν] we have |hf, gj i| ≥ |cj | − M

∞ X

|ck | ≥ a/s − M (1 − a/s) > a/s − M.

k6=j

For all g distinct from g1 , . . . , gs we have |hf, gi| ≤ M + 1/s. 11

Our assumption s ≤ 1/(2M ) implies that M + 1/s ≤ t(a/s − M ). Thus, we do not pick any of g ∈ D distinct from g1 , . . . , gs until we have chosen all g1 , . . . , gν . Denote f 0 := f −

ν X

∞ X

cj gj =

j=1

cj g j .

j=ν+1

It is clear from the above argument that f1 = f − PH1 (f ) = f 0 − PH1 (f 0 ) = f 0 − Gs1 (f 0 ).

(2.3)

Define a new dictionary s

( P

D :=

k

)

j∈Λ cj gj j∈Λ cj gj k

P

|Λ| = s,

:

gj ∈ D, cj ∈ R .

Let Jl = [(l − 1)s + ν + 1, ls + ν]. We write f0 =

∞ X X

cj gj =

l=1 j∈Jl

where ψl =

P

j∈Jl

cj gj . Apparently

ψl kψl k

∞ X

kψl k

l=1

ψl , kψl k

(2.4)

∈ Ds for l ≥ 1. Equation (2.4) implies that

f 0 ∈ A1 (Ds ,

∞ X

kψl k).

l=1

By Lemma 2.6 we bound (1 − M s)

X

c2j ≤ kψl k2 ≤ (1 + M s)

j∈Jl

X

c2j .

(2.5)

j∈Jl

Then we obtain ∞ X

kψl k ≤ (1 + M s)1/2

∞ X X

(

c2j )1/2 .

(2.6)

l=1 j∈Jl

l=1

Since the sequence {cj } has the property |cν+1 | ≥ |cν+2 | ≥ . . . ,

∞ X

|cj | ≤ 1,

|cν+1 | ≤ a/s

j=ν+1

we may apply the simple inequality, (

X

c2j )1/2 ≤ s1/2 |c(l−1)s+ν+1 |,

j∈Jl

12

(2.7)

so that we bound the sum in the right side of (2.6) ∞ X X

(

c2j )1/2 ≤ s1/2

l=1 j∈Jl

∞ X

|c(l−1)s+ν+1 |

l=1

≤ s1/2 (a/s +

∞ X

s−1

l=2

X

|cj |) ≤ (a + 1)s−1/2 .

(2.8)

j∈Jl−1

Using the above inequality in (2.6), we obtain that 1

f 0 ∈ A1 (Ds , (3/2)1/2 (a + 1)s− 2 ). Assume u :=

P

(2.9)

ai hi ∈ A1 (Ds , B), where B is an absolute constant. Then for any

ψ ∈ Ds we have v := u − hu, ψiψ ∈ A1 (D, 2B), since |hu, ψi| = |h

P

ai gi , ψi| ≤ kψk

P

|ai | ≤ B. Along with (2.3) and (2.9) the above

argument shows that f1 ∈ A1 (Ds , 2(3/2)1/2 (a + 1)s−1/2 ).

(2.10)

Consider the following quantity qs := qs (fm−1 ) := sup kPH(s) (fm−1 )k, hi ∈D i∈[1,s]

where H(s) := span(h1 , . . . , hs ). It is clear that qs = sup

max

H(s) ψ∈H(s),kψk≤1

Let ψ =

Ps

i=1

|hfm−1 , ψi| = sup |hfm−1 , g s i|. g s ∈Ds

ai hi . Again by Lemma 2.6 we bound (1 − M s)

s X

a2i ≤ kψk2 ≤ (1 + M s)

i=1

s X

a2i .

(2.11)

i=1

Therefore, (1 + M s)−1

s X

s X

i=1

i=1

hfm−1 , hi i2 ≤ kPH(s) (fm−1 )k2 ≤ (1 − M s)−1

13

hfm−1 , hi i2 .

(2.12)

Let pm := PHm (fm−1 ). In order to relate qs2 to kpm k2 we begin with the fact that qs2 ≤ sup (1 − M s)−1 hi ∈D i∈[1,s]

s X

hfm−1 , hi i2 .

i=1

Consider an arbitrary set {hi }si=1 of distinct elements of the dictionary D. Let V be a set of all indices i ∈ [1, s] such that hi = ϕk(i) , k(i) ∈ Im . Denote V 0 := {k(i), i ∈ V }. Then s X

hfm−1 , hi i2 =

i=1

X

hfm−1 , hi i2 +

i∈V

X

hfm−1 , hi i2 .

(2.13)

i∈[1,s]\V

From the definition of {ϕk }k∈Im we get max |hfm−1 , hi i| ≤ t−1 min 0 |hfm−1 , ϕk i|. k∈Im \V

i∈[1,s]\V

(2.14)

Using (2.14) we continue (2.13) ≤

X

hfm−1 , ϕk i2 + t−2

k∈V 0

hfm−1 , ϕk i2 ≤ t−2

X k∈Im \V 0

X

hfm−1 , ϕk i2 .

k∈Im

Therefore, qs2 ≤ (1 − M s)−1 t−2

X

hfm−1 , ϕk i2 ≤

k∈Im

1 + Ms kpm k2 . − M s)

t2 (1

This results in the following inequality kpm k2 ≥

t2 (1 − M s) 2 q . 1 + Ms s

(2.15)

Thus we can interpret WSGA(s, t) as WGA(rt) with respect to the dictionary Ds , 

where r =

1−M s 1+M s

1

2

. Using (2.10), we get

s,t 1/2 k = k(f1 )s,t (a + 1)s−1/2 γ(m − 1, rt). kfm m−1 k ≤ 6

This completes the proof of Theorem 2.4.

14

2

Chapter 3
Weak Orthogonal Super Greedy Algorithm

3.1 Introduction and Preliminaries

Let us begin this chapter with a simple question. Assume we have a collection of elements {ϕj }m j=1 from a dictionary D. For a given f ∈ H, how can we minimize the error of approximation from F := span{ϕj }m j=1 ? It is not difficult to see that kf − PF (f )k = inf kf − gk, g∈F

where PF is the orthogonal projection operator onto F . If {ϕj }m j=1 are obtained from a realization of a greedy algorithm, apparently one can take the orthogonal projection onto the span of {ϕj }m j=1 as the approximant in order to minimize the residual. In 1996, DeVore and Temlyakov [11] studied the following algorithm motivated by the above fact. Orthogonal Greedy Algorithm (OGA). Initially, we define f0o := f . Then for each m ≥ 1 we inductively define: (1) ϕom ∈ D is any element satisfying o o , ϕom i| = sup |hfm−1 , gi|. |hfm−1 g∈D

(2) Let Hm := span(ϕo1 , . . . , ϕom ) and let PHm (f ) denote an operator of orthogonal projection onto Hm . Define Gom (f, D) := PHm (f ).

15

(3) Define the residual after mth iteration of the algorithm o fm := f − Gom (f, D).
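For a finite dictionary, the Orthogonal Greedy Algorithm defined above differs from PGA only in that the approximant is recomputed at each step as the orthogonal projection of $f$ onto the span of all chosen elements. The sketch below is an illustration added here, not code from the thesis; the function name `oga`, the least-squares realization of the projection, and the column layout of the dictionary are assumptions.

```python
import numpy as np

def oga(f, D, m):
    """Orthogonal Greedy Algorithm: m iterations for a finite dictionary.

    f : target vector in R^d;  D : d x n array with unit-norm columns.
    Each step adds the element with the largest |<residual, g>| and then
    re-projects f onto the span of all elements chosen so far.
    """
    residual = f.astype(float).copy()
    chosen = []
    for _ in range(m):
        inner = np.abs(D.T @ residual)
        inner[chosen] = -np.inf              # residual is already orthogonal to these
        chosen.append(int(np.argmax(inner)))
        coef, *_ = np.linalg.lstsq(D[:, chosen], f, rcond=None)
        residual = f - D[:, chosen] @ coef   # f_m = f - P_{H_m}(f)
    return f - residual, residual, chosen
```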

In the above greedy step (1), we make an additional assumption that the maximizer always exists. If the dictionary is finite, obviously there exists such a maximizer. Comparing OGA with PGA that we discussed in Chapter 2, easy to observe that PGA subtracts the contribution of only one element in step (2), and OGA subtracts the contribution of a subspace in the corresponding step. In 2000, Temlyakov [35] introduced the following weak version corresponding to OGA. Let a sequence of weakness parameters τ = {tk }∞ k=1 , tk ∈ (0, 1], k = 1, 2, . . . be given. Weak Orthogonal Greedy Algorithm (WOGA). Initially, we define f0o,τ := f . Then for each m ≥ 1 we inductively define: (1) ϕo,τ m ∈ D is any element satisfying the following inequality o,τ o,τ |hfm−1 , ϕo,τ m i| ≥ tm sup |hfm−1 , gi|. g∈D

o,τ τ τ (f ) denote an operator of orthogonal (2) Let Hm := span(ϕo,τ 1 , . . . , ϕm ) and let PHm τ projection onto Hm . Define

τ (f ). Go,τ m (f, D) := PHm

(3) Define the residual after mth iteration of the algorithm o,τ fm := f − Go,τ m (f, D).

Easy to see, the algorithm can pick some element which is close to the maximizer instead of the maximizer itself in the greedy step (1) of WOGA. By analysis analogous to that we did in chapter 2, this idea brings flexibility for choosing ϕm also guarantees

16

the existence providing tm ∈ (0, 1). In the case tk = 1, k = 1, 2, . . . , WOGA coincides with OGA. The convergence rate of OGA obtained by DeVore and Temlyakov [11] in 1996 is as follows. Theorem 3.1. Let D be an arbitrary dictionary in H. Then, for each f ∈ A1 (D, B), we have kf − Gom (f, D)k ≤ Bm−1/2 .

(3.1)

In 2000, Temlyakov [35] generalized this result by proving the convergence rate of WOGA. Theorem 3.2. Let D be an arbitrary dictionary in H. Then, for each f ∈ A1 (D, B), we have kf − Go,τ m (f, D)k ≤ B(1 +

m X

t2k )−1/2 .

(3.2)

k=1

We note that in a particular case tk = t, k = 1, 2, . . . , the right hand side takes form B(1 + mt2 )−1/2 . When t = 1, the result is equivalent to the Theorem 3.1 in the sense of order.

3.2

The Algorithm

In this section we define a super greedy type algorithm using the idea of picking several elements at the greedy step and making orthogonal projection to minimize the residual. We now propose the following algorithm which is modified from WOGA. Let a natural number s and a sequence τ := {tk }∞ k=1 , tk ∈ (0, 1], k = 1, 2, . . . be given. Consider the following Weak Orthogonal Super Greedy Algorithm with parameter s. Weak Orthogonal Super Greedy Algorithm (WOSGA(s, τ )). Initially, we define f0 := f . Then for each m ≥ 1 we inductively define:

17

(1) ϕ(m−1)s+1 , . . . , ϕms ∈ D are elements of the dictionary D satisfying the following inequality. Denote Im := [(m − 1)s + 1, ms] and assume that min |hfm−1 , ϕi i| ≥ tm i∈Im

sup

|hfm−1 , gi|.

g∈D,g6=ϕi ,i∈Im

(2) Let Hm := Hm (f ) := span(ϕ1 , . . . , ϕms ) and let PHm denote an operator of orthogonal projection onto Hm . Define Gm (f ) := Gm (f, D) := Gsm (f, D) := PHm (f ). (3) Define the residual after mth iteration of the algorithm s fm := fm := f − Gm (f, D).

In this chapter we study the rate of convergence of WOSGA(s, τ ) in the case tk = t, k = 1, 2, . . . . In this case we write t instead of τ in the notations. We assume that the dictionary D is M -coherent and that f ∈ A1 (D). We begin with the case t = 1. In this case we impose an additional assumption that ϕi ’s in the step (1) exist. Clearly, it is the case if D is finite. We call the algorithm WOSGA(s, 1) the Orthogonal Super Greedy Algorithm with parameter s (OSGA(s)). In Section 3.3 we prove the following upper estimate on the error for OSGA(s). Theorem 3.3. Let D be a dictionary with coherence parameter M := M (D). Then, for s ≤ (2M )−1 , the algorithm OSGA(s) provides, after m iterations, an approximation of f ∈ A1 (D) with the following upper bound on the error: kfm k ≤ A(sm)−1/2 , with A =



m = 1, 2, . . .

,

(3.3)

40.5.

In Theorem 3.3 we are interested in the order in terms of parameters s and m. It follows from the known results that the order (sm)−1/2 obtained in the Theorem 18

3.3 cannot be improved. The constant

√ 40.5 is a specific constant that our proof

provides. Clearly, it is not the best constant. However, we did not try to make it smaller and tried to give a simplest version of the proof that gives the right order. Remark 3.4. For the case f ∈ A1 (D, B), we can also run OSGA. By a similar proof, √ the upper bound on the error has the same order as in (3.3) and 40.5B is a specific constant. We note that OSGA(s) adds s new elements of the dictionary at each iteration and makes one orthogonal projection at each iteration. For comparison, OGA adds one new element of the dictionary at each iteration and makes one orthogonal projection at each iteration. After m iterations of OSGA(s) and after ms iterations of OGA both algorithms provide ms-term approximants with a guaranteed error bound for f ∈ A1 (D) of the same order: O((ms)−1/2 ). Both algorithms use the same number ms of elements of the dictionary. However, OSGA(s) makes m orthogonal projections and OGA makes ms (s times more) orthogonal projections. Additionally, at every greedy step OGA picks the largest inner product (in magnitude), which basically requires ordering of all the inner products. OSGA also needs ordering at each iteration. However, the number of iterations of OSGA is s times smaller than the number of iterations of OGA. In this dissertation, we do not discuss numerical experiments since it is mostly devoted to theoretical results. Based on the analysis of the number of orthogonal projections and the number of orderings we observe that OSGA(s) is s times simpler (more efficient) than OGA. We gain this simplicity of OSGA(s) under an extra assumption of D being M -coherent and s ≤ (2M )−1 . Therefore, if our dictionary D is M -coherent then OSGA(s) with small enough s approximates with error whose guaranteed upper bound for f ∈ A1 (D) is of the same order as that for OGA. We now proceed to the case of WOSGA(s, t) with t ∈ (0, 1).

19

Theorem 3.5. Let D be a dictionary with coherence parameter M := M (D). Then, for s ≤ (2M )−1 , the algorithm WOSGA(s, t) provides, after m iterations, an approximation of f ∈ A1 (D) with the following upper bound on the error: kfm k ≤ A(t)(sm)−1/2 ,

3.3

m = 1, 2, . . .

A(t) := (81/8)1/2 (1 + t)t−2 .

Proof of the Rate of Convergence

Proof of Theorem 3.3. Denote Fm := span(ϕi , i ∈ Im ). Then Hm−1 , Fm ⊂ Hm . Therefore, fm = f − PHm (f ) = fm−1 + Gm−1 (f ) − PHm (fm−1 + Gm−1 (f )) = fm−1 − PHm (fm−1 ). It is clear that the inclusion Fm ⊂ Hm implies kfm k ≤ kfm−1 − PFm (fm−1 )k.

(3.4)

Using the notation pm := PFm (fm−1 ), we continue kfm−1 k2 = kfm−1 − pm k2 + kpm k2 and by (3.4) kfm k2 ≤ kfm−1 k2 − kpm k2 .

(3.5)

To estimate kpm k2 from below for f ∈ A1 (D), we first make some auxiliary observations. Let f=

∞ X j=1

cj gj ,

gj ∈ D,

∞ X

|cj | ≤ 1,

|c1 | ≥ |c2 | ≥ . . .

.

(3.6)

j=1

Every element of A1 (D) can be approximated arbitrarily well by elements of the form (3.6). It will be clear from the below argument that it is sufficient to consider 20

elements f of the form (3.6). Suppose ν is such that |cν | ≥ 2/s ≥ |cν+1 |. Then the above assumption on the sequence {cj } implies that ν ≤ s/2 and |cs+1 | < 1/s. We claim that elements g1 , . . . , gν will be chosen among ϕ1 , . . . , ϕs at the first iteration. Indeed, for j ∈ [1, ν] we have |hf, gj i| ≥ |cj | − M

∞ X

|ck | ≥ 2/s − M (1 − 2/s) > 2/s − M.

k6=j

For all g distinct from g1 , . . . , gs we have |hf, gi| ≤ M + 1/s. Our assumption s ≤ 1/(2M ) implies that M + 1/s ≤ 2/s − M . Thus, we do not pick any of g ∈ D distinct from g1 , . . . , gs until we have chosen all g1 , . . . , gν . Denote f 0 := f −

ν X

cj gj =

∞ X

cj g j .

j=ν+1

j=1

It is clear from the above argument that f1 = f − PH1 (f ) (f ) = f 0 − PH1 (f ) (f 0 ); fm = f − PHm (f ) (f ) = f 0 − PHm (f ) (f 0 ). We now estimate kpm k2 . For fm−1 consider the following quantity qs := qs (fm−1 ) := sup kPH(s) (fm−1 )k, hi ∈D i∈[1,s]

where H(s) := span(h1 , . . . , hs ). Then kPH(s) (fm−1 )k = Let ψ =

Ps

i=1

max

ψ∈H(s),kψk≤1

|hfm−1 , ψi|.

ai hi . Then by Lemma 2.6 we bound (1 − M s)

s X

a2i ≤ kψk2 ≤ (1 + M s)

i=1

s X

a2i .

(3.7)

i=1

Therefore, −1

(1 + M s)

s X

2

2

hfm−1 , hi i ≤ kPH(s) (fm−1 )k ≤ (1 − M s)

i=1

−1

s X

hfm−1 , hi i2 .

i=1

21

Thus kpm k2 ≥

1 − Ms 2 q . 1 + Ms s

(3.8)

Using the notation Jl := [(l − 1)s + ν + 1, ls + ν] we write for m ≥ 2 kfm−1 k2 = hfm−1 , f 0 i =

∞ X

hfm−1 ,

l=1 ∞ X 1/2

≤ qs (1 + M s)

X

cj gj i

j∈Jl

(

X

c2j )1/2 .

(3.9)

l=1 j∈Jl

Since the sequence {cj } has the property |cν+1 | ≥ |cν+2 | ≥ . . . ,

∞ X

|cj | ≤ 1,

|cν+1 | ≤ 2/s

(3.10)

j=ν+1

we may apply the simple inequality, (

X

c2j )1/2 ≤ s1/2 |c(l−1)s+ν+1 |,

j∈Jl

so that we bound the sum in the right side of (3.9) ∞ X X

(

c2j )1/2 ≤ s1/2

l=1 j∈Jl

∞ X

|c(l−1)s+ν+1 |

l=1

≤ s1/2 (2/s +

∞ X

s−1

l=2

|cj |) ≤ 3s−1/2 .

X

(3.11)

j∈Jl−1

Inequalities (3.9) and (3.11) imply qs ≥ (s1/2 /3)(1 + M s)−1/2 kfm−1 k2 By (3.8) we have kpm k2 ≥

s(1 − M s) kfm−1 k4 . 2 9(1 + M s)

(3.12)

Our assumption M s ≤ 1/2 implies 1 − Ms ≥ 2/9 (1 + M s)2 and, therefore, (3.12) gives kpm k2 ≥ (s/A2 )kfm−1 k4 , 22

A :=



40.5.

Thus, by (3.5) we get kfm k2 ≤ kfm−1 k2 (1 − (s/A2 )kfm−1 k2 ).

(3.13)

Using (3.10) we get for kf 0 k kf 0 k ≤

∞ X l=1

k

X

cj gj k ≤

j∈Jl

∞ X

(1 + M s)1/2 (

X

c2j )1/2 ≤ (1 + M s)1/2 3s−1/2 ,

j∈Jl

l=1

and kf1 k2 ≤ kf 0 k2 ≤ 27/(2s) ≤ A2 /s. We need the following known lemma (see, for example, [11]). Lemma 3.6. Let {am }∞ m=1 be a sequence of non-negative numbers satisfying the inequalities a1 ≤ D,

am+1 ≤ am (1 − am /D),

m = 1, 2, . . . .

Then we have for each m am ≤ D/m. By Lemma 3.6 with am := kfm k2 , D := A2 /s, we obtain kfm k2 ≤ A2 (sm)−1 ,

m = 1, 2, . . .

. 2

This completes the proof of Theorem 3.3.

Proof of Theorem 3.5. Proof of this theorem mimics the proof of Theorem 3.3, except that in the auxiliary observations we choose a threshold B/s with B := (3 + t)/(2t) instead of 2/s: |cν | ≥ B/s ≥ |cν+1 |, so that our assumption M s ≤ 1/2 implies that M + 1/s ≤ t(B/s − M ). This, in turn, implies that all g1 , . . . , gν will be chosen at the first iteration. As a result, the sequence {cj } satisfies the following conditions |cν+1 | ≥ |cν+2 | ≥ . . . ,

∞ X

|cj | ≤ 1,

|cν+1 | ≤ B/s.

j=ν+1

To find an analog of inequality (3.8), we begin with the fact that qs2 ≤ sup (1 − M s)−1 hi ∈D i∈[1,s]

s X i=1

23

hfm−1 , hi i2 .

(3.14)

Now, in order to relate qs2 to kpm k2 , consider an arbitrary set {hi }si=1 of distinct elements of the dictionary D. Let V be a set of all indices i ∈ [1, s] such that hi = ϕk(i) , k(i) ∈ Im . Denote V 0 := {k(i), i ∈ V }. Then s X

X

i=1

i∈V

hfm−1 , hi i2 =

hfm−1 , hi i2 +

X

hfm−1 , hi i2 .

(3.15)

i∈[1,s]\V

From the definition of {ϕk }k∈Im we get max |hfm−1 , hi i| ≤ t−1 min 0 |hfm−1 , ϕk i|. k∈Im \V

i∈[1,s]\V

(3.16)

Using (3.16) we continue (3.15) ≤

X

hfm−1 , ϕk i2 + t−2

k∈V 0

X

hfm−1 , ϕk i2 ≤ t−2

k∈Im \V 0

X

hfm−1 , ϕk i2 .

k∈Im

Therefore, qs2 ≤ (1 − M s)−1 t−2

X

hfm−1 , ϕk i2 ≤

k∈Im

1 + Ms kpm k2 . − M s)

t2 (1

This results in the following analog of (3.8) t2 (1 − M s) 2 kpm k ≥ q . 1 + Ms s 2

(3.17)

The use of (3.14) instead of (3.10) gives us the following version of (3.11) ∞ X X

(

c2j )1/2 ≤ (B + 1)s−1/2 .

l=1 j∈Jl

The rest of the proof repeats the corresponding part of the proof of Theorem 3.3 with A :=

3(B+1) √ 2t

2

= (81/8)1/2 (1 + t)t−2 .

24

Chapter 4
Weak Orthogonal Super Greedy Algorithm with Thresholding

4.1 Introduction and the Algorithm

Thresholding, which has a long history, is widely used in approximation theory and many other areas. For instance, in many popular compression techniques, such as JPEG, JPEG-2000, MPEG, and MP3, one looks for an expansion of the original signal in an appropriate basis or dictionary and then applies thresholding to the coefficients of the expansion, keeping only the large ones. If the chosen basis or dictionary is suitable, the thresholded representation, with far fewer terms, provides a good approximant of the original signal. Thresholding is preferred by computer scientists and engineers because it avoids the ordering required by a greedy step, which makes it easier and more efficient to implement. In greedy approximation we can replace the greedy step by a thresholding step to construct new algorithms. We now proceed to the definition of the algorithm. Let a natural number $s$ and a weakness sequence $\tau := \{t_k\}_{k=1}^{\infty}$, $t_k \in (0,1]$, $k = 1, 2, \ldots$, be given.

Weak Orthogonal Super Greedy Algorithm with Thresholding (WOSGAT(s, $\tau$)). Initially, $f_0 := f_0^{s,\tau} := f$. Then for each $m \geq 1$ we inductively define:

(1) $\varphi_i \in \mathcal{D}$, $i \in I_m$, are elements of the dictionary $\mathcal{D}$ satisfying the following


inequalities $s_m := |I_m| \leq s$ and
\[
\min_{i \in I_m} |\langle f_{m-1}, \varphi_i \rangle| \geq t_m \|f_{m-1}\|^2. \tag{4.1}
\]


(2) Let Hm := Hm (f ) := span(ϕi , i ∈ I1 ∪ . . . ∪ Im ) and let PHm denote an operator of orthogonal projection onto Hm . Denote Gm (f ) := Gs,τ m (f, D) := PHm (f ). (3) Define the residual after mth iteration of the algorithm s,τ := f − PHm (f ). fm := fm
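The thresholding step avoids a full ordering of the inner products: any batch of up to $s$ elements exceeding the threshold $t_m\|f_{m-1}\|^2$ is admissible. The sketch below is an illustration added here, not code from the thesis; the function name `wosgat`, the stopping rule `max_iter`, and the (deterministic) choice of the $s$ largest admissible elements are assumptions.

```python
import numpy as np

def wosgat(f, D, s, t, max_iter=50):
    """Weak Orthogonal Super Greedy Algorithm with Thresholding, WOSGAT(s, t).

    At iteration m, up to s dictionary elements whose inner products with
    the residual are at least t * ||f_{m-1}||^2 are selected (the definition
    allows ANY such elements; for determinism we take the largest ones),
    and f is projected onto the span of all elements selected so far.
    Intended for targets f in A_1(D), for which the threshold set is nonempty.
    """
    residual = f.astype(float).copy()
    selected = []
    for _ in range(max_iter):
        inner = np.abs(D.T @ residual)
        thresh = t * float(residual @ residual)      # t_m ||f_{m-1}||^2
        candidates = [int(i) for i in np.argsort(-inner)
                      if inner[i] >= thresh and i not in selected][:s]
        if not candidates:                           # nothing passes the threshold: stop
            break
        selected.extend(candidates)
        coef, *_ = np.linalg.lstsq(D[:, selected], f, rcond=None)
        residual = f - D[:, selected] @ coef
    return f - residual, residual, selected
```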

For s = 1, WOSGAT coincides with the Modified Weak Orthogonal Greedy Algorithm (MWOGA) (see [36], p. 61). Recalling the discussions in previous chapters, we note that a realization of WOSGA and WOGA is guaranteed for any f ∈ H. It is proved in [36] that a realization of MWOGA is guaranteed for f ∈ A1 (D). In the same way one can also prove that a realization of WOSGAT is guaranteed for f ∈ A1 (D). Let us make a remark on the implementation of WOSGAT for f ∈ H. In [3] the general procedure called “Basis Pursuit” was proposed for finding a representation of f with a minimal `1 norm of coefficients. It is pointed out in [3] that Basis Pursuit is a linear programming problem. Thus, one can use the following strategy: • For a given f find (or estimate) |f |A1 (D) using Basis Pursuit; • Then consider f /|f |A1 (D) and use WOSGAT. We note that in step (1) of WOSGAT, if there are more than s ϕi ’s satisfying |hfm−1 , ϕi i| ≥ tm kfm−1 k2 , the algorithm may pick any s of them and then make the projection. We will prove an upper bound for the rate of convergence of WOSGAT for f ∈ A1 (D) for a more general dictionary than the M -coherent dictionary. 26

Definition 4.1. We say that a dictionary D is (N, β)-Bessel if for any n ≤ N distinct elements ψ1 , . . . , ψn of the dictionary D we have for any f ∈ H 2

kPΨ(n) (f )k ≥ β

n X

|hf, ψi i|2 ,

i=1

where Ψ(n) := span{ψ1 , . . . , ψn }. Theorem 4.2. Let D be an (N, β)-Bessel dictionary. Then, for s ≤ N , the algorithm WOSGAT(s, τ ) provides, after m iterations, an approximation of f ∈ A1 (D) with the following upper bound of the error: kf − Gm (f )k ≤ (1 + β

m X

t2j sj )−1/2 .

j=1

For the case tk = t, k = 1, 2, . . ., we replace τ with t in the notations and give an upper bound for the rate of convergence in the following direct Corollary. Corollary 4.3. Let D be an (N, β)-Bessel dictionary. Then, for s ≤ N , the algorithm WOSGAT(s, t) provides, after m iterations, an approximation of f ∈ A1 (D) with the following upper bound of the error: kf − Gm (f )k ≤ (1 + βt

2

m X

sj )−1/2 .

j=1

We point out that

Pm

j=1

sj is the number of elements that the algorithm picked

up from D after m iterations. Therefore, WOSGAT offers the same error bound (in the sense of order) in terms of the number of elements of the dictionary, used in the approximant, as WOGA or MWOGA. We now give some sufficient conditions for a dictionary D to be (N, β)-Bessel. We begin with a simple lemma useful in that regard. Lemma 4.4. Let a dictionary D have the following property of (N, A)-stability. For any n ≤ N distinct elements ψ1 , . . . , ψn of the dictionary D, we have for any coefficients c1 , . . . , cn k

n X

2

ci ψi k ≤ A

i=1

n X i=1

27

|ci |2 .

Then D is (N, A−1 )-Bessel. Proof. Let f ∈ H. We have for n ≤ N kPΨ(n) (f )k =

|hPΨ(n) (f ), ψi|

sup ψ∈Ψ(n),kψk≤1

= ≥

|

sup

n X

hf, ψi ici |

(c1 ,...,cn ):kc1 ψ1 +···+cn ψn k≤1 i=1 n X

|

sup

(c1 ,...,cn ):|c1 |2 +···+|cn |2 ≤A−1

hf, ψi ici | = A

−1/2

n X

(

i=1

|hf, ψi i|2 )1/2 .

i=1

2 Proposition 4.5. An M -coherent dictionary is (N, (1 + M (N − 1))−1 )-Bessel. Proof. By Lemma 2.6 for an M -coherent dictionary we have k

n X

2

ci ψi k ≤ (1 + M (n − 1))

i=1

n X

2

|ci | ≤ (1 + M (N − 1))

i=1

n X

|ci |2 .

i=1

Applying the above Lemma 4.4, we obtain the statement of the proposition.

2

The following proposition is a direct corollary of Lemma 4.4. Proposition 4.6. Let a dictionary D have the Restricted Isometry Property with parameters N, δ: for any n ≤ N distinct elements ψ1 , . . . , ψn (1 − δ)

n X

|ci |2 ≤ k

i=1

n X

ci ψi k2 ≤ (1 + δ)

i=1

n X

|ci |2 .

i=1

Then D is (N, (1 + δ)−1 )-Bessel.

4.2

Proof of the Rate of Convergence

Proof of Theorem 4.2. For the {ϕi } from the definition of WOSGAT, denote Fm := span(ϕi , i ∈ Im ). It is easy to see that Hm−1 , Fm ⊂ Hm . Therefore, fm = f − PHm (f ) = fm−1 + Gm−1 (f ) − PHm (fm−1 + Gm−1 (f )) = fm−1 − PHm (fm−1 ). 28

Then the inclusion Fm ⊂ Hm implies kfm k ≤ kfm−1 − PFm (fm−1 )k.

(4.2)

Using the notation pm := PFm (fm−1 ), we continue kfm−1 k2 = kfm−1 − pm k2 + kpm k2 and by (4.2) kfm k2 ≤ kfm−1 k2 − kpm k2 .

(4.3)

We now prove a lower bound for kpm k. By our assumption that the dictionary is (N, β)-Bessel we get kPFm (fm−1 )k2 ≥ β

|hfm−1 , ϕi i|2 .

X i∈Im

Then, by the thresholding condition of the greedy step (1), we obtain kPFm (fm−1 )k2 ≥ βt2m sm kfm−1 k4 .

(4.4)

Substituting this bound in (4.3), we get 2

kfm k ≤ kfm−1 k

2



1−

βt2m sm kfm−1 k2



.

(4.5)

We now apply the following lemma from [35]. Lemma 4.7. Let {am }∞ m=0 be a sequence of nonnegative numbers satisfying the inequalities a0 ≤ A,

am ≤ am−1 (1 − λ2m am−1 /A),

m = 1, 2, . . .

Then we have for each m am ≤ A(1 +

m X

λ2k )−1 .

k=1

It gives us kfm k ≤ (1 + β

m X

t2j sj )−1/2 .

j=1

2 29

Chapter 5
Applications in Compressed Sensing

5.1 Introduction of Compressed Sensing

The conventional approach to reconstructing signals exactly from measured data follows the famous Shannon/Nyquist sampling theorem [40], which states that the sampling rate must be at least twice the highest frequency. In recent years a new method of sampling and recovering sparse or compressible signals has been developed, called Compressed Sensing or Compressive Sensing (CS). CS predicts that reconstruction of a sparse signal is possible from what was previously believed to be incomplete information. CS has attracted much attention from both mathematicians and computer scientists. At present it is a very active area of research because of its extensive connections to, and applications in, coding and information theory, operations research, signal/image processing, and machine learning. We refer the reader to [17] or [7] for a survey of both theoretical and numerical results in compressed sensing. In the case of a discrete, finite-dimensional domain we can view our signals as vectors in $\mathbb{R}^n$. When dealing with vectors in $\mathbb{R}^n$ we will make frequent use of the $\ell_p$ norms, which are defined for $p \in [1, \infty]$ as

kxkp =

      

Pn

(

i=1

1

|xi |p ) p p ∈ [1, ∞);

(5.1)

maxi=1,...,n |xi | p = ∞.

For convenience, we denote k · k := k · k2 . Compressed Sensing refers to a problem of recovery (or approximate recovery) of an unknown column vector (signal) x ∈ RN from the information provided by linear

30

measurements y ∈ Rm , where yj = hψj , xi, ψj ∈ RN , j = 1, 2, . . . , m. We can rewrite this relation between x and y with the equation y = Φx, where Φ := [ψ1 . . . ψN ]T is a m × N matrix. The recovery of x from given Φ and observed y is the major concern of CS. Typically m  N , which makes y = Φx a highly under-determined system of linear equations. Therefore, if there is no extra structure of x, it is impossible to solve this reconstruction problem. Later we will see the sparse structure of x plays a crucial role in the recovery. If x is sparse or can be approximated well by a sparse expansion in terms of a suitable basis, that is, only a small number of non-zero coefficients, the recovery is feasible in a certain sense. In the setting of approximation theory, we treat the collection of columns in Φ as a finite dictionary D = {ϕi }N i=1 , where ϕi ’s are column vectors in Φ. The condition y ∈ A1 (D) is equivalent to existence of x ∈ RN such that y = Φx and kxk1 :=

N X

|xi | ≤ 1.

(5.2)

i=1

As a direct corollary of Theorem 3.2, we get for any y ∈ A1 (D) that the Orthogonal Greedy Algorithm guarantees the following upper bound for the error ky − Gon (y, D)k ≤ (n + 1)−1/2 ,

(5.3)

where the bound (5.3) holds for any D (any Φ). A naive approach of a recovery procedure that probably comes first to mind is to find the sparsest vector u := argmin kvk0

subject to Φv = y,

(P0 )

where kvk0 := | supp(v)|. Unfortunately, this combinational minimization is NP-hard in general [28, 29].

31

Another method proposed by Donoho and co-authors (see, for instance, [3, 13]) is the `1 -minimization (also called Basis Pursuit) which is proven to work well in CS and gives the following error bound as one of fundamental results of CS (see [1], [13] and also [37] for further results and a discussion). Denote by AΦ (y) := argmin kvk1

subject to Φv = y,

(P1 )

the result of application of the `1 -minimization algorithm AΦ to the data vector y. Then under some conditions on the matrix Φ (RIP(2n, δ) with small enough δ, which will be discussed momentarily) one has for x satisfying (5.2) kx − AΦ (Φx)k ≤ Cn−1/2 .

(5.4)

The inequalities (5.3) and (5.4) look alike. However, they provide the error bounds in different spaces: (5.3) in Rm (the measurements space) and (5.4) in RN (the signal space). Besides the `1 -minimization, a family of greedy algorithms is also utilized as fundamental tools for CS. To discuss the performance of greedy approximation in CS, let us begin with the Restricted Isometry Property of Φ introduced by Candès and Tao [2] in 2005 which is very useful in analysis of the performance of recovery algorithms. A vector x ∈ RN has the support T = supp(x) := {i ∈ N : 1 ≤ i ≤ N and xi 6= 0}. For a set of indices T , denote by |T | the cardinality of the set. If |T | ≤ K, x is called a K-sparse signal. Definition 5.1 (Restricted Isometry Property). A m × N matrix Φ satisfies the Restricted Isometry Property with parameters K, δ (we say Φ satisfies RIP(K, δ) for simplicity) if there exist δ ∈ (0, 1) such that (1 − δ)kxk2 ≤ kΦxk2 ≤ (1 + δ)kxk2


holds for every $K$-sparse $x$. Moreover, define $\delta_K := \inf\{\delta : \text{(5.5) holds for every } K\text{-sparse } x\}$.
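Computing $\delta_K$ exactly requires examining every support of size $K$, so it is only feasible for toy sizes; this combinatorial cost is one reason RIP constants are usually bounded probabilistically rather than computed. The sketch below is an illustration added here (the function name `delta_K_exact` and the example sizes are assumptions); it uses the fact that RIP($K, \delta$) is equivalent to all eigenvalues of $\Phi_S^{*}\Phi_S$ lying in $[1-\delta, 1+\delta]$ for every support $S$ with $|S| = K$.

```python
import numpy as np
from itertools import combinations

def delta_K_exact(Phi, K):
    """Exact restricted isometry constant delta_K of Phi (small sizes only).

    delta_K is the largest deviation from 1 of the eigenvalues of
    Phi_S^T Phi_S over all column subsets S with |S| = K.
    """
    N = Phi.shape[1]
    delta = 0.0
    for S in combinations(range(N), K):
        cols = Phi[:, list(S)]
        eig = np.linalg.eigvalsh(cols.T @ cols)   # ascending eigenvalues
        delta = max(delta, abs(eig[0] - 1.0), abs(eig[-1] - 1.0))
    return delta

# Toy example: 20 x 40 Gaussian matrix with variance-normalized entries, K = 2.
rng = np.random.default_rng(0)
Phi = rng.standard_normal((20, 40)) / np.sqrt(20)
print(delta_K_exact(Phi, 2))
```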

In order to measure the efficiency of different recovery algorithms, we can consider either the rate of convergence, as in (5.4), or another criterion, namely exact recovery of the sparse signal $x$. The theoretical results in this chapter mostly concern the exact recovery aspect. The following theorem was proved by Candès and Tao [2] in 2005.

Theorem 5.2. If $\Phi \in \mathbb{R}^{m \times N}$ satisfies RIP of order $3K$ with $\delta_{3K} + \delta_{2K} + \delta_{K} < 1$, then $A_{\Phi}(\Phi x) = x$ for any $K$-sparse $x$.

In addition, the exact recovery conditions for greedy algorithms will be discussed in the next section. To avoid confusion, let us clarify the notation used in the rest of this chapter. For $x \in \mathbb{R}^N$, $x_{\Gamma} \in \mathbb{R}^{|\Gamma|}$ is the vector whose entries are the entries of $x$ with indices in $\Gamma$. For an $m \times N$ matrix $\Phi$, $\Phi_{\Gamma}$ is the $m \times |\Gamma|$ submatrix of $\Phi$ with columns indexed by $\Gamma$. Given $y \in \mathbb{R}^m$, the orthogonal projection of $y$ onto $\operatorname{span}(\Phi_{\Gamma}) := \operatorname{span}\{\varphi_i : i \in \Gamma\}$ is denoted by $P_{\Gamma}(y) :=$

argmin

ky − y 0 k.

y 0 :y 0 ∈span(ΦΓ )

It is known and easy to check that if Φ∗Γ ΦΓ is invertible then PΓ (y) = ΦΓ Φ†Γ y, where Φ†Γ := (Φ∗Γ ΦΓ )−1 Φ∗Γ is the Moore-Penrose pseudoinverse of ΦΓ and Φ∗ is the transpose of Φ. We complete this section by presenting one observation and three lemmas those will be used in the proofs later. They are all consequences of RIP. The observation, derived directly from the definition of RIP, is that if Φ satisfies RIP of both orders K and K 0 , with K < K 0 , then δK ≤ δK 0 . In addition, the following inequalities (see [30] and [5]) which are used frequently in this thesis.

33

Lemma 5.3. Suppose Φ satisfies RIP of order s. Then, for any set of indices Γ such that |Γ| ≤ s and any x ∈ RN and y ∈ R|Γ| , we bound (1 − δ|Γ| )kxΓ k ≤ kΦ∗Γ ΦΓ xΓ k ≤ (1 + δ|Γ| )kxΓ k

(5.6)

kΦ∗Γ yk ≤ (1 + δ|Γ| )1/2 kyk.

(5.7)

and

Lemma 5.4. Assume Γ and Λ are two disjoint sets of indices. If Φ satisfies RIP of order |Γ ∪ Λ| with constant δ|Γ∪Λ| , then for any vector x ∈ R|Λ| kΦ∗Γ ΦΛ xk ≤ δ|Γ∪Λ| kxk.

(5.8)

Lemma 5.5. Suppose Φ satisfies RIP of order s. Then, for any set of indices Γ such that |Γ| ≤ s, we have k(Φ∗Γ ΦΓ − I)xk ≤ δs kxk.

5.2

(5.9)

Orthogonal Matching Pursuit

The Orthogonal Matching Pursuit (OMP) (see [32, 8]) in the CS setting is the counterpart of OGA (see Chapter 3) in the general setting. Comparing to the Basis Pursuit, the major advantages of this algorithm are its speed and its ease of implementation. Considering its applications in signal processing, let us present it in an algorithmic way.

34

Algorithm: Orthogonal Matching Pursuit (OMP)
Input: $\Phi$ and $y$.
Initialization: $r^0 := y$, $x^0 := 0$, $\Lambda^0 := \emptyset$, $\ell := 0$.
Iterations: Define $\Lambda^{\ell+1} := \Lambda^{\ell} \cup \{\arg\max_i |\langle r^{\ell}, \varphi_i \rangle|\}$. Then $x^{\ell+1} := \arg\min_z \|y - \Phi_{\Lambda^{\ell+1}} z\|$ and $r^{\ell+1} := y - P_{\Lambda^{\ell+1}}(y) = y - \Phi_{\Lambda^{\ell+1}} x^{\ell+1}$. If $r^{\ell+1} = 0$, stop. Otherwise let $\ell := \ell + 1$ and begin a new iteration.
Output: If the algorithm stops at the $k$th iteration, the output $\hat{x}$ is such that $\hat{x}_{\Lambda^k} = x^k$ and $\hat{x}_{(\Lambda^k)^C} = 0$.
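The following sketch is an illustration of the OMP loop above added for this edition, not code from the thesis; the function name `omp`, the tolerance-based stopping test, and the small demo at the end are assumptions.

```python
import numpy as np

def omp(Phi, y, max_iter=None, tol=1e-12):
    """Orthogonal Matching Pursuit for y = Phi x with x sparse.

    Phi : m x N measurement matrix;  y : measurement vector in R^m.
    Returns the recovered vector x_hat in R^N.
    """
    m, N = Phi.shape
    if max_iter is None:
        max_iter = m
    r = y.astype(float).copy()
    support = []
    x_hat = np.zeros(N)
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol:              # r = 0: stop
            break
        i = int(np.argmax(np.abs(Phi.T @ r)))     # index with largest |<r, phi_i>|
        if i not in support:
            support.append(i)
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        r = y - Phi[:, support] @ coef            # new residual
    if support:
        x_hat[support] = coef
    return x_hat

# Tiny demo: a 3-sparse vector and 20 Gaussian measurements (error is typically ~0).
rng = np.random.default_rng(0)
Phi = rng.standard_normal((20, 50)) / np.sqrt(20)
x = np.zeros(50); x[[3, 17, 41]] = [1.0, -2.0, 0.5]
print(np.linalg.norm(omp(Phi, Phi @ x) - x))
```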

q

s . 2K

Define Λ := {i1 , . . . , is } such that

|hy, ϕi1 i| ≥ . . . ≥ |hy, ϕis i| ≥

sup ϕ∈Φ,ϕ6=ϕi ,i∈Λ

Then Λ ∩ T 6= ∅, where T is the support of x.

35

|hy, ϕi|.

Proof. We proceed by contradiction. Let T 0 := {n1 , . . . , ns } be a subset of T which contains the s largest (in magnitude) entries of x. From the definition of Λ it follows that kΦ∗Λ Φxk ≥ kΦ∗T 0 Φxk.

(5.10)

We now assume Λ ∩ T = ∅, so that xΛ = 0. Then (5.10) gives kΦ∗Λ Φxk = kΦ∗Λ Φx − xΛ k ≥ kxT 0 k − kΦ∗T 0 Φx − xT 0 k.

(5.11)

kxT 0 k ≤kΦ∗Λ Φx − xΛ k + kΦ∗T 0 Φx − xT 0 k √ √ ≤ 2kΦ∗Λ∪T 0 Φx − xΛ∪T 0 k ≤ 2k(Φ∗Λ∪T ΦΛ∪T − I)xk √ ≤ 2δkxk,

(5.12)

It follows that

where the last inequality is obtained by using Lemma 5.5. Next, using the definition of T 0 we get r

kxT 0 k ≥ Clearly, we have (s/K)1/2 >

s kxk. K

√ s 1/2 2δ, if δ < ( 2K ) . This contradiction completes the 2

proof.

This proof recommended by an anonymous referee improves the previous constant √ 1/(1 + 2) in [23]. When s = 1 in Lemma 5.6, we have the following corollary. Corollary 5.7. Suppose Φ satisfies RIP(K + 1, δK+1 ) with δK+1 <

√1 . 2K

If i ∈ [1, N ]

such that |hy, ϕi i| ≥

sup

|hy, ϕi|.

ϕ∈Φ,ϕ6=ϕi

Then i ∈ T , where T is the support of x. This corollary tells us that under the above conditions on Φ the OMP chooses the right index at each step. We now proceed to the main result of this section. 36

Theorem 5.8. Assume Φ satisfies RIP(K + 1, δ) with δ := δK+1 <

√1 . 2K

Then OMP

recovers any K-sparse x ∈ RN exactly in K iterations. Theorem 5.8 is a follow up to the corresponding main result of [6] by Davenport and Wakin in 2010. They proved an analog of Theorem 5.8 under assumption δK+1 < √ √1 . Thus, our contribution is in improving the constant from 1/3 to 1/ 2. It is 3 K conjectured in [5] that there exist K-sparse x and Φ satisfying RIP of order K +1 with √1 K

such that OMP can not recover x in K iterations. If this conjecture is √ true, then we cannot increase the constant 1/ 2 to 1. The question of best constant

δK+1 =

in Theorem 5.8 is an interesting open question. Proof of Theorem 5.8. Take any ` < K. From the definition of OMP, we know that Λ` contains the indices selected at the first ` iterations. All indices contained in Λ` are distinct because we make an orthogonal projection at each iteration. Assume Λ` ⊆ T , which is equivalent to the statement that OMP has selected ` correct indices (all selected indices are contained in T ) after ` iterations. As a consequence of this assumption we get that r` = y − PΛ` (y) is at most K-sparse and supported on T . If i is selected by the (` + 1)th iteration, it satisfies |hr` , ϕi i| ≥

sup

|hr` , ϕi|.

ϕ∈Φ,ϕ6=ϕi

By Corollary 5.7, if Φ satisfies RIP of order K + 1 with constant δ := δK+1 <

√1 , 2K

we know that i ∈ T . Since r` = y − PΛ` (y), the inner product of ϕi and any column indexed in Λ` is zero. This implies i 6∈ Λ` . Therefore, i ∈ T \Λ` . If Φ satisfies RIP of order K + 1 with constant δ <

√1 , 2K

the above argument works for any ` ∈ [0, K − 1].

By induction from ` = 0 to ` = K − 1, we get the conclusion that ΛK = T , rK = y − PT (y) = y − ΦT xT = 0, and OMP stops after K iterations. The output xˆ satisfies xˆΛK = xK = argmin ky − ΦΛK zk z

= argmin ky − ΦT zk = xT , z

37

and xˆ(ΛK )C = xT C = 0. 2

Hence, xˆ = x. This completes the proof.

5.3

Orthogonal Multi Matching Pursuit

Theorem 5.8 requires RIP condition with δK+1 <

√1 2K

so that OMP recovers a K-

sparse signal exactly in K iterations. For doing this OMP must select a correct index at each iteration. In this section we study an algorithm that may pick wrong indices in its iterations yet finally recovers the signal exactly. We study here the Orthogonal Multi Matching Pursuit with parameter s (OMMP(s)), which is the Orthogonal Super Greedy Algorithm with parameter s adjusted to the compressed sensing setting. Next, let us establish the algorithm. Algorithm: Orthogonal Multi Matching Pursuit (OMMP) Input: Φ, y and s Initialization: r0 := y, x0 := 0, Λ0 := ∅, ` := 0 Iterations: Define Λ`+1 := Λ` ∪ {i1 , . . . , is } such that |hr` , ϕi1 i| ≥ . . . ≥ |hr` , ϕis i| ≥

sup

|hr` , ϕi|.

ϕ∈Φ ϕ6=ϕi ,k=1,...,s k

Then x`+1 := argminz ky − ΦΛ`+1 zk and r`+1 := y − ΦΛ`+1 x`+1 . If r`+1 = 0, stop. Otherwise let ` := ` + 1 and begin a new iteration. Output: If algorithm stops at kth iteration, output xˆ such that xˆΛk = xk and xˆ(Λk )C = 0. Theorem 5.9. For s ∈ N and 1 < s ≤ K, assume Φ satisfies RIP of order sK with a constant δ := δsK <
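OMMP(s) differs from OMP only in the selection step, where $s$ indices are added per iteration before the least-squares re-projection. The sketch below is an illustration added here, not code from the thesis; the function name `ommp` and the explicit exclusion of already-selected columns (whose inner products with the residual are zero anyway) are implementation assumptions.

```python
import numpy as np

def ommp(Phi, y, s, max_iter=None, tol=1e-12):
    """Orthogonal Multi Matching Pursuit OMMP(s).

    Identical to OMP except that each iteration adds the s indices with
    the largest |<r, phi_i>| instead of a single index.
    """
    m, N = Phi.shape
    if max_iter is None:
        max_iter = m
    r = y.astype(float).copy()
    support = []
    x_hat = np.zeros(N)
    for _ in range(max_iter):
        if np.linalg.norm(r) <= tol:
            break
        inner = np.abs(Phi.T @ r)
        inner[support] = -np.inf                  # skip already-selected columns
        new = np.argsort(inner)[-s:]
        support.extend(int(i) for i in new)
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        r = y - Phi[:, support] @ coef
    if support:
        x_hat[support] = coef
    return x_hat
```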

√ √ s√ . (2+ 2) K

Then, for any K-sparse x ∈ RN , OMMP(s) recovers

x exactly within at most K iterations.

38

Proof. We will prove that OMMP(s) picks at least one correct index at each iteration. After running OMMP for ℓ iterations, we obtain a set of indices Λ^ℓ. We prove that at the (ℓ+1)th iteration OMMP selects at least one correct index. We carry out this proof without the assumption that OMMP selected correct indices at previous iterations. Denote T_1 := Λ^ℓ ∩ T, K_1 := |T_1| and T_2 := T\Λ^ℓ, K_2 := |T_2|. We continue our proof by contradiction. Assume that at the next iteration OMMP(s) does not select any correct indices. This means Λ′ ∩ T = ∅, where Λ′ := Λ^{ℓ+1}\Λ^ℓ. From the definition of OMMP, we easily derive that |Λ′| = s. The residual after ℓ iterations can be written in the following form

r^ℓ = y − P_{Λ^ℓ}(y) = Φ_{T_1} x_{T_1} + Φ_{T_2} x_{T_2} − P_{Λ^ℓ}(Φ_{T_1} x_{T_1} + Φ_{T_2} x_{T_2}) = Φ_{T_2} x_{T_2} − P_{Λ^ℓ}(Φ_{T_2} x_{T_2}).   (5.13)

Let x_p ∈ R^{|Λ^ℓ|} give the coefficients of the projection P_{Λ^ℓ}(Φ_{T_2} x_{T_2}). Then we continue (5.13):

r^ℓ = Φ_{T_2} x_{T_2} − Φ_{Λ^ℓ} x_p.   (5.14)

We now focus on the second term of (5.14). Since Φ satisfies RIP of order sK, it also satisfies RIP of order |Λ^ℓ|. By (5.6), Φ*_{Λ^ℓ} Φ_{Λ^ℓ} is invertible. By the property of the Moore-Penrose pseudoinverse we get x_p = (Φ*_{Λ^ℓ} Φ_{Λ^ℓ})^{−1} Φ*_{Λ^ℓ} Φ_{T_2} x_{T_2}. Since Λ^ℓ ∩ T_2 = ∅, we apply (5.6) and Lemma 5.4:

‖x_p‖ = ‖(Φ*_{Λ^ℓ} Φ_{Λ^ℓ})^{−1} Φ*_{Λ^ℓ} Φ_{T_2} x_{T_2}‖ ≤ (δ_{sℓ+K_2}/(1 − δ_{sℓ})) ‖x_{T_2}‖ ≤ (δ/(1−δ)) ‖x_{T_2}‖.   (5.15)

By (5.14) we have

‖Φ*_{Λ′} r^ℓ‖ = ‖Φ*_{Λ′}(Φ_{T_2} x_{T_2} − Φ_{Λ^ℓ} x_p)‖ ≤ ‖Φ*_{Λ′} Φ_{T_2} x_{T_2}‖ + ‖Φ*_{Λ′} Φ_{Λ^ℓ} x_p‖.   (5.16)

By our assumption Λ′ ∩ T = ∅ and, therefore, Λ′ ∩ T_2 = ∅. Moreover, the definition of Λ′ indicates that Λ′ and Λ^ℓ are also disjoint. Using Lemma 5.4 we continue (5.16):

‖Φ*_{Λ′} r^ℓ‖ ≤ δ_{s+K_2} ‖x_{T_2}‖ + δ_{s(ℓ+1)} ‖x_p‖ ≤ δ ‖x_{T_2}‖ + δ ‖x_p‖ ≤ δ ‖x_{T_2}‖ + (δ²/(1−δ)) ‖x_{T_2}‖.   (5.17)

In the last inequality we used (5.15). The remainder of the proof consists of two cases depending on the size of T_2.

First, we consider the case K_2 > s. Denote T_3 := {n_1, ..., n_s} such that T_3 ⊆ T_2 and

|x_{n_1}| ≥ ... ≥ |x_{n_s}| ≥ sup_{n∈T_2, n∉T_3} |x_n|.

The definition of Λ′ implies

‖Φ*_{Λ′} r^ℓ‖ ≥ ‖Φ*_{T_3} r^ℓ‖ = ‖Φ*_{T_3}(Φ_{T_2} x_{T_2} − Φ_{Λ^ℓ} x_p)‖ = ‖Φ*_{T_3}(Φ_{T_3} x_{T_3} + Φ_{T_2\T_3} x_{T_2\T_3} − Φ_{Λ^ℓ} x_p)‖ ≥ ‖Φ*_{T_3} Φ_{T_3} x_{T_3}‖ − ‖Φ*_{T_3} Φ_{T_2\T_3} x_{T_2\T_3}‖ − ‖Φ*_{T_3} Φ_{Λ^ℓ} x_p‖.   (5.18)

By the definition of T_3,

‖x_{T_3}‖ ≥ (s/K_2)^{1/2} ‖x_{T_2}‖,   (5.19)

and

‖x_{T_2\T_3}‖ ≤ (1 − s/K_2)^{1/2} ‖x_{T_2}‖.   (5.20)

We estimate the last three terms in (5.18). Applying (5.6) and Lemma 5.4 we obtain

‖Φ*_{T_3} Φ_{T_3} x_{T_3}‖ ≥ (1 − δ_s)‖x_{T_3}‖ ≥ (1 − δ)‖x_{T_3}‖ ≥ (1 − δ)(s/K_2)^{1/2} ‖x_{T_2}‖,   (5.21)

‖Φ*_{T_3} Φ_{T_2\T_3} x_{T_2\T_3}‖ ≤ δ_{K_2} ‖x_{T_2\T_3}‖ ≤ δ ‖x_{T_2\T_3}‖ ≤ δ (1 − s/K_2)^{1/2} ‖x_{T_2}‖,   (5.22)

and

‖Φ*_{T_3} Φ_{Λ^ℓ} x_p‖ ≤ δ_{(ℓ+1)s} ‖x_p‖ ≤ (δ²/(1−δ)) ‖x_{T_2}‖.   (5.23)

Substituting (5.21), (5.22) and (5.23) in (5.18) we obtain

‖Φ*_{Λ′} r^ℓ‖ ≥ (1 − δ)(s/K_2)^{1/2} ‖x_{T_2}‖ − δ(1 − s/K_2)^{1/2} ‖x_{T_2}‖ − (δ²/(1−δ)) ‖x_{T_2}‖.   (5.24)

Comparing (5.17) and (5.24) we get

δ[(s/K_2)^{1/2} + (1 − s/K_2)^{1/2}] + δ + 2δ²/(1−δ) ≥ (s/K_2)^{1/2},

which implies

(1 + √2)δ + 2δ²/(1−δ) ≥ (s/K_2)^{1/2}.

If δ < √s/((2+√2)√K) < 1/3, then

(1 + √2)δ + 2δ²/(1−δ) < (2 + √2)δ < (s/K)^{1/2} ≤ (s/K_2)^{1/2}.   (5.25)

This contradiction implies Λ′ ∩ T ≠ ∅.

Second, we consider the case K_2 ≤ s. In this case |T_2| ≤ |Λ′|. Using this and the definition of Λ′ we get

‖Φ*_{Λ′} r^ℓ‖ ≥ ‖Φ*_{T_2} r^ℓ‖ = ‖Φ*_{T_2}(Φ_{T_2} x_{T_2} − Φ_{Λ^ℓ} x_p)‖ ≥ ‖Φ*_{T_2} Φ_{T_2} x_{T_2}‖ − ‖Φ*_{T_2} Φ_{Λ^ℓ} x_p‖ ≥ (1 − δ_{K_2})‖x_{T_2}‖ − δ_{s+K_2}‖x_p‖ ≥ (1 − δ)‖x_{T_2}‖ − (δ²/(1−δ)) ‖x_{T_2}‖.   (5.26)

Comparing (5.17) and (5.26), we obtain

δ‖x_{T_2}‖ + (δ²/(1−δ)) ‖x_{T_2}‖ > (1 − δ)‖x_{T_2}‖ − (δ²/(1−δ)) ‖x_{T_2}‖,

or equivalently

2δ + 2δ²/(1−δ) > 1.   (5.27)

The above inequality does not hold for δ ≤ 1/3. This contradiction implies that in the second case we also have Λ′ ∩ T ≠ ∅.

These two cases show that if δ < √s/((2+√2)√K), then Λ′ ∩ T ≠ ∅. Therefore, OMMP picks at least one correct index at every iteration. This means T ⊆ Λ^K. Thus

x̂_{Λ^K} = x^K = argmin_z ‖y − Φ_{Λ^K} z‖ = x_{Λ^K},

and x̂_{(Λ^K)^C} = x_{(Λ^K)^C} = 0. Then we conclude that x̂ = x. □

Using the idea of induction in the above proof, we proved that each iteration of OMMP picks at least one correct index, so that OMMP recovers any K-sparse x in at most K iterations. In practice, however, OMMP is usually more efficient: since at each greedy step OMMP is likely to pick more than one correct index, the algorithm may recover x in fewer than K iterations. A small numerical illustration of this behavior is sketched below.
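The following toy experiment (illustrative dimensions; it assumes the hypothetical ommp sketch given after the algorithm description above and a random Gaussian Φ with unit-norm columns) counts the iterations OMMP(3) actually uses:

```python
import numpy as np

rng = np.random.default_rng(0)
m, N, K, s = 128, 512, 10, 3

Phi = rng.standard_normal((m, N))
Phi /= np.linalg.norm(Phi, axis=0)          # unit-norm columns

x = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
x[support] = rng.choice([-1.0, 1.0], size=K)
y = Phi @ x

x_hat, n_iter = ommp(Phi, y, s)             # hypothetical sketch from above
print("iterations used:", n_iter, "(at most K =", K, ")")
print("exact recovery:", np.allclose(x_hat, x, atol=1e-8))
```

In such experiments one typically observes far fewer than K iterations, in line with the remark above.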

5.4 Orthogonal Multi Matching Pursuit with Thresholding Pruning

In this section, we construct a new CS recovery algorithm called Orthogonal Multi Matching Pursuit with Thresholding Pruning (OMMPTP). Many CS greedy recovery algorithms, such as ROMP, CoSaMP, and SP (see [31, 30, 5]), require prior knowledge of the sparsity of the unknown signal as a parameter needed for their implementation. However, in practice the sparsity of the unknown signal is not always available. In contrast, OMMPTP does not require this information in advance. Assume the sparse signal x satisfies a condition on the magnitudes of its components, say, |x_i| ≥ a for all nonzero x_i. We define Orthogonal Multi Matching Pursuit with Thresholding Pruning with parameters s and a (OMMPTP(s, a)) as follows; a schematic sketch is given after the description.

Algorithm: OMMPTP(s, a)
Input: a, Φ, y, and s.
Initialization Stage: Let Λ^0 := Λ^0_a := ∅, r^0 := y, and j := 1.
Iteration Stage:
Step 1: Let m := 1.
Step 2: Denote Λ^j := Λ^{j−1} ∪ {i_1, ..., i_k}, where k = ms and

|⟨r^{j−1}, φ_{i_1}⟩| ≥ ... ≥ |⟨r^{j−1}, φ_{i_k}⟩| ≥ sup_{φ∈Φ, φ≠φ_{i_k}, k=1,2,...,ms} |⟨r^{j−1}, φ⟩|.

Then find x^j such that x^j_{Λ^j} := argmin_z ‖y − Φ_{Λ^j} z‖ and x^j_{[1,N]\Λ^j} = 0.
Step 3: Denote Λ^j_a := {i ∈ [1, N] : i ∈ Λ^j and |x^j_i| ≥ a/2}.
Step 4: If Λ^j_a ⊆ Λ^{j−1}_a, update Λ^j_a := Λ^{j−1}_a, r^j := r^{j−1}, j := j + 1, and m := m + 1. Then go to Step 2. Otherwise, update r^j := y − P_{Λ^j_a}(y). If r^j = 0, stop. Otherwise, update j := j + 1 and go to Step 1.
Output: If the algorithm stops at the ℓth iteration, the output x̂ satisfies x̂_{[1,N]\Λ^ℓ_a} = 0 and x̂_{Λ^ℓ_a} = Φ†_{Λ^ℓ_a} y.
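The following NumPy sketch is one possible reading of OMMPTP(s, a). The function name ommptp, the use of least squares for the projections, the choice to join each new candidate batch with the previously pruned set Λ_a in Step 2, and the caps on the loops are assumptions made for this illustration.

```python
import numpy as np

def ommptp(Phi, y, s, a, tol=1e-10):
    """One reading of OMMPTP(s, a), sketched with NumPy (illustrative only).

    Candidates are taken in batches of size m*s from the columns most
    correlated with the current residual; after a least-squares fit on the
    enlarged set, only coefficients with magnitude >= a/2 survive the
    thresholding pruning, and the residual is recomputed on the pruned set.
    """
    _, N = Phi.shape
    Lambda_a = set()                      # pruned index set Lambda^j_a
    r = y.astype(float).copy()            # current residual
    idx, coef = [], np.array([])
    for _ in range(N):                    # cap on outer iterations (sketch-level guard)
        if np.linalg.norm(r) <= tol:
            break
        progressed = False
        m = 1
        while m * s <= N:                 # Steps 1-4
            corr = np.abs(Phi.T @ r)
            top = set(np.argsort(corr)[-m * s:].tolist())
            # one reading of Step 2: new batch joined with the pruned set
            Lambda = sorted(Lambda_a | top)
            z, *_ = np.linalg.lstsq(Phi[:, Lambda], y, rcond=None)
            # Step 3: thresholding pruning at level a/2
            survivors = {i for i, zi in zip(Lambda, z) if abs(zi) >= a / 2}
            if survivors <= Lambda_a:     # Step 4: nothing new kept, enlarge batch
                m += 1
                continue
            Lambda_a = survivors
            progressed = True
            break
        if not progressed:                # sketch-level guard against stalling
            break
        idx = sorted(Lambda_a)
        coef, *_ = np.linalg.lstsq(Phi[:, idx], y, rcond=None)
        r = y - Phi[:, idx] @ coef        # residual w.r.t. the pruned support
    x_hat = np.zeros(N)
    x_hat[idx] = coef
    return x_hat
```

Note that, unlike ROMP, CoSaMP, or SP, this sketch takes the magnitude level a rather than the sparsity K as input, in line with the discussion above.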


Analysis of OMMPTP

In this subsection we address the performance of OMMPTP by proving a theorem which provides a condition of exact recovery. For convenience, we denote Ω(r, j) := {i_1, ..., i_j} ⊂ [1, N] such that

|⟨r, φ_{i_1}⟩| ≥ ... ≥ |⟨r, φ_{i_j}⟩| ≥ sup_{φ∈Φ, φ≠φ_{i_k}, k=1,2,...,j} |⟨r, φ⟩|.

(A schematic computation of Ω(r, j) is sketched below.)
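As a small illustrative aid (the function name omega and the NumPy usage are choices made here), Ω(r, j) can be computed as the j columns most correlated with r:

```python
import numpy as np

def omega(Phi, r, j):
    """Return Omega(r, j): the indices of the j columns of Phi whose inner
    products with r are largest in absolute value."""
    corr = np.abs(Phi.T @ r)
    return set(np.argsort(corr)[-j:].tolist())   # argsort is ascending
```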

To be consistent, we follow the notations of Section 5.1, where T := supp(x) and |T| ≤ K.

Lemma 5.10. Let a natural number s be given. Assume Φ satisfies RIP of order sn + K with constant δ := δ_{sn+K} < 1/2, where n is the smallest integer satisfying sn ≥ K. Then we have Ω(y, sn) ∩ T ≠ ∅.

Proof. We prove by contradiction. Assume Ω := Ω(y, sn) and Ω ∩ T = ∅. Since Ω maximizes the inner products, by (5.6) we have

‖Φ*_Ω y‖ ≥ ‖Φ*_T y‖ = ‖Φ*_T Φ_T x_T‖ ≥ (1 − δ_K)‖x‖.

In addition, since y = Φ_T x_T and Ω ∩ T = ∅, Lemma 5.4 gives ‖Φ*_Ω y‖ = ‖Φ*_Ω Φ_T x_T‖ ≤ δ_{sn+K}‖x‖ = δ‖x‖. Thus

δ‖x‖ ≥ (1 − δ_K)‖x‖ ≥ (1 − δ)‖x‖.

Clearly, if δ < 1/2 the above inequality yields a contradiction. This implies Ω ∩ T ≠ ∅ for δ < 1/2. □

The interpretation of the above lemma is that, under some RIP condition, Ω(y, js) includes some correct indices provided j is big enough. Next, let us consider the case where the true support of x is partially known. Assume a subset of the true support of x, say Γ ⊂ T, is given, and let r := y − P_Γ(y). For a given natural number L, denote Ω := Ω(r, L) and Ω′ := Ω′(r, L, Γ) := Ω(r, L) ∪ Γ. Then define Ω′_a := Ω′_a(r, L, Γ) as follows. First, find w ∈ R^N such that w_{Ω′} = argmin_z ‖y − Φ_{Ω′} z‖ and w_{[1,N]\Ω′} = 0. Next, we define Ω′_a := {i ∈ Ω′ : |w_i| ≥ a/2}.

Lemma 5.11. Assume Φ satisfies RIP of order L + K with constant δ := δ_{L+K} < b and ‖x‖ < ((1−b)/(2b)) a. If Γ ⊂ T and r = y − P_Γ(y), then Ω′_a(r, L, Γ) = Ω′(r, L, Γ) ∩ T.

Proof. It is sufficient to consider two cases. First, we denote Ω := Ω(r, L) and assume Ω ∩ T = ∅. From the definition of Ω′ := Ω′(r, L, Γ), we obtain

Φ_{Ω′} w_{Ω′} = P_{Ω′}(y) = P_{Ω′}(Φ_T x_T) = P_{Ω′}(Φ_Γ x_Γ + Φ_{T\Γ} x_{T\Γ}) = Φ_Γ x_Γ + Φ_{Ω′} Φ†_{Ω′} Φ_{T\Γ} x_{T\Γ} = Φ_Γ x_Γ + Φ_{Ω′} x_p,

where x_p = Φ†_{Ω′} Φ_{T\Γ} x_{T\Γ}. Therefore, by (5.6) and Lemma 5.4,

‖x_p‖ = ‖(Φ*_{Ω′} Φ_{Ω′})^{−1} Φ*_{Ω′} Φ_{T\Γ} x_{T\Γ}‖ ≤ (δ_{L+K}/(1 − δ_{L+K})) ‖x_{T\Γ}‖ ≤ (δ/(1−δ)) ‖x_{T\Γ}‖.

Since δ < b, a simple calculation shows that if ‖x_{T\Γ}‖ ≤ ‖x‖ < a(1−b)/(2b), then ‖x_p‖ < a/2. Accordingly, ‖x_p‖_∞ ≤ ‖x_p‖ < a/2. It is not difficult to see that the magnitude of every component of x_Γ is at least a. Therefore, the components of w supported on Γ have magnitudes greater than or equal to a/2, and the components supported on Ω have magnitudes less than a/2. This implies that Ω′_a = Γ = Ω′ ∩ T.

In the second case, we denote T′ := Ω ∩ T and assume T′ ≠ ∅. Furthermore, we denote T″ := T\(Γ ∪ T′). It is clear that

Φ_{Ω′} w_{Ω′} = P_{Ω′}(y) = P_{Ω′}(Φ_T x_T) = P_{Ω′}(Φ_Γ x_Γ + Φ_{T′} x_{T′} + Φ_{T″} x_{T″}) = Φ_Γ x_Γ + Φ_{T′} x_{T′} + Φ_{Ω′} Φ†_{Ω′} Φ_{T″} x_{T″} = Φ_Γ x_Γ + Φ_{T′} x_{T′} + Φ_{Ω′} x̄_p,

where x̄_p = Φ†_{Ω′} Φ_{T″} x_{T″}. Therefore, by (5.6) and Lemma 5.4,

‖x̄_p‖ = ‖(Φ*_{Ω′} Φ_{Ω′})^{−1} Φ*_{Ω′} Φ_{T″} x_{T″}‖ ≤ (δ_{L+K}/(1 − δ_{L+K})) ‖x_{T″}‖ ≤ (δ/(1−δ)) ‖x_{T″}‖.

Since δ < b, a simple calculation shows that if ‖x_{T″}‖ ≤ ‖x‖ < a(1−b)/(2b), then ‖x̄_p‖ < a/2, and therefore ‖x̄_p‖_∞ ≤ ‖x̄_p‖ < a/2. It is not difficult to see that the magnitude of every component of x_Γ and x_{T′} is at least a. Therefore, the components of w supported on Γ and T′ have magnitudes greater than or equal to a/2, and the components supported on Ω\T′ have magnitudes less than a/2. This implies that Ω′_a = T′ ∪ Γ = Ω′ ∩ T, which completes the proof. □

We now proceed to a sufficient condition for exact recovery by OMMPTP.

Theorem 5.12. Assume the K-sparse signal x ∈ R^N satisfies |x_i| ≥ a for all i ∈ supp(x). If Φ satisfies RIP of order sL + K with δ := δ_{sL+K} < b < 1/2, where L is the smallest integer such that sL ≥ K, then OMMPTP(s, a) recovers x exactly for all x satisfying ‖x‖ < ((1−b)/(2b)) a.

Proof. Let us prove by induction. At the very beginning, initialize the estimated support Γ := ∅ and residual r := y. Without loss of generality, assume Ω(r, js) ∩ T = ∅ for j = 1, ..., n − 1 and Ω(r, ns) ∩ T ≠ ∅. Using Lemma 5.10, we can claim that n ≤ L. Applying Lemma 5.11, we derive Ω′_a(r, js, ∅) = Ω′(r, js, ∅) ∩ T = ∅ for j = 1, ..., n − 1 and Ω′_a(r, ns, ∅) = Ω(r, ns) ∩ T. This implies that Λ^j_a = ∅ for j = 1, ..., n − 1 and Λ^n_a = Λ^n ∩ T ≠ ∅. Then the algorithm updates r^n := y − P_{Λ^n_a}(y) = Φ x^n. If r^n = 0, we are done. Otherwise, since Λ^n_a is a subset of T, x^n is at most K-sparse, and the algorithm begins a new iteration. We can repeat an argument similar to the one above; the only modification needed is to update Γ := Λ^n_a and r := r^n. Each time we repeat this argument, the algorithm captures a further subset of the true support T; eventually all of T is obtained. Consequently, projecting onto that estimated support recovers the signal x exactly. □
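As a toy usage illustration (the dimensions are arbitrary, and ommptp refers to the sketch given after the algorithm description), the magnitude condition |x_i| ≥ a can be exercised as follows:

```python
import numpy as np

rng = np.random.default_rng(1)
m, N, K, s, a = 128, 512, 10, 3, 1.0

Phi = rng.standard_normal((m, N))
Phi /= np.linalg.norm(Phi, axis=0)                  # unit-norm columns

x = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
x[support] = a * rng.choice([-1.0, 1.0], size=K)    # |x_i| >= a on the support
y = Phi @ x

x_hat = ommptp(Phi, y, s, a)                        # sketch from earlier in this section
print("exact recovery:", np.allclose(x_hat, x, atol=1e-6))
```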


Bibliography

[1] E. Candès, J. Romberg, and T. Tao, Stable signal recovery from incomplete and inaccurate measurements, Commun. Pure Appl. Math. 59 (2006), 1207–1223.
[2] E. Candès and T. Tao, Decoding by linear programming, IEEE Trans. Inform. Theory 51 (2005), 4203–4215.
[3] S. S. Chen, D. L. Donoho, and M. A. Saunders, Atomic decomposition by basis pursuit, SIAM Rev. 43 (2001), 129–159.
[4] A. Cohen, W. Dahmen, and R. A. DeVore, Compressed sensing and k-term approximation, J. Amer. Math. Soc. 22 (2009), 211–231.
[5] W. Dai and O. Milenkovic, Subspace pursuit for compressive sensing: Closing the gap between performance and complexity, IEEE Trans. Inform. Theory 55 (2009), 2230–2249.
[6] M. Davenport and M. Wakin, Analysis of orthogonal matching pursuit using the restricted isometry property, IEEE Trans. Inform. Theory 56 (2010), 4395–4401.
[7] M. A. Davenport, M. F. Duarte, Y. C. Eldar, and G. Kutyniok, Introduction to compressed sensing, Compressed Sensing: Theory and Applications, Cambridge University Press, 2011.
[8] G. Davis, S. Mallat, and M. Avellaneda, Greedy adaptive approximation, Constr. Approx. 13 (1997), 57–98.
[9] R. A. DeVore, Nonlinear approximation, Acta Numerica 7 (1998), 51–150.
[10] R. A. DeVore and G. G. Lorentz, Constructive approximation, Springer, Berlin, 1993.
[11] R. A. DeVore and V. N. Temlyakov, Some remarks on greedy algorithms, Adv. Comput. Math. 5 (1996), 173–187.
[12] R. A. DeVore and V. N. Temlyakov, Nonlinear approximation in finite-dimensional spaces, J. Complexity 13 (1997), 489–508.
[13] D. L. Donoho, Compressed sensing, IEEE Trans. Inform. Theory 52 (2006), 1289–1306.
[14] D. L. Donoho, M. Elad, and V. N. Temlyakov, Stable recovery of sparse overcomplete representations in the presence of noise, IMI Preprint (2004).
[15] D. L. Donoho, M. Elad, and V. N. Temlyakov, Stable recovery of sparse overcomplete representations in the presence of noise, IEEE Trans. Inform. Theory 52 (2006), 6–18.
[16] D. L. Donoho, M. Elad, and V. N. Temlyakov, On Lebesgue-type inequalities for greedy approximation, J. Approx. Theory 147 (2007), 185–195.
[17] M. Fornasier and H. Rauhut, Compressive sensing, Handbook of Mathematical Methods in Imaging (O. Scherzer, ed.), Springer, 2011, pp. 187–228.
[18] A. C. Gilbert, S. Muthukrishnan, and S. Strauss, Approximation of functions over redundant dictionaries using coherence, 2002, pp. 243–252.
[19] P. J. Huber, Projection pursuit, Ann. Statist. 13 (1985), 435–475.
[20] L. Jones, On a conjecture of Huber concerning the convergence of projection pursuit regression, Ann. Statist. 15 (1987), 880–882.
[21] L. Jones, A simple lemma on greedy approximation in Hilbert space and convergence rates for projection pursuit regression and neural network training, Ann. Statist. 20 (1992), 608–613.
[22] S. V. Konyagin and V. N. Temlyakov, Rate of convergence of pure greedy algorithms, East J. Approx. 5 (1999), 493–499.
[23] E. Liu and V. N. Temlyakov, Orthogonal super greedy algorithm and applications in compressed sensing, IMI Preprint, available at http://imi.cas.sc.edu/IMI/reports/2010/reports/1001.pdf, January 2010.
[24] E. Liu and V. N. Temlyakov, Super greedy type algorithms, IMI Preprint, available at http://imi.cas.sc.edu/IMI/reports/2010/reports/2010_07_Preprint.pdf, October 2010.
[25] E. Livshitz, On efficiency of orthogonal matching pursuit, Preprint, 2010.
[26] E. D. Livshitz, On lower bounds for the rate of convergence of greedy algorithms, Izv. Math. 73 (2009), 1197–1215.
[27] E. D. Livshitz and V. N. Temlyakov, Two lower estimates in greedy approximation, Constr. Approx. 19 (2003), 509–523.
[28] S. G. Mallat and Z. Zhang, Matching pursuits with time-frequency dictionaries, IEEE Trans. Signal Process. 41 (1993), 3397–3415.
[29] B. K. Natarajan, Sparse approximate solutions to linear systems, SIAM J. Comput. 24 (1995), 227–234.
[30] D. Needell and J. Tropp, CoSaMP: Iterative signal recovery from incomplete and inaccurate samples, Appl. Comput. Harmon. Anal. 26 (2009), 301–321.
[31] D. Needell and R. Vershynin, Signal recovery from inaccurate and incomplete measurements via regularized orthogonal matching pursuit, IEEE Journal of Selected Topics in Signal Processing 4 (2010), 310–316.
[32] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition, Proc. 27th Annu. Asilomar Conf. Signals, Systems, and Computers, vol. 1, November 1993, pp. 40–44.
[33] A. V. Sil'nichenko, Rate of convergence of greedy algorithms, Mat. Zametki 76 (2004), 628–632.
[34] V. N. Temlyakov, Greedy algorithms and m-term approximation with regard to redundant dictionaries, J. Approx. Theory 98 (1999), 117–145.
[35] V. N. Temlyakov, Weak greedy algorithms, Adv. Comput. Math. 12 (2000), 213–227.
[36] V. N. Temlyakov, Nonlinear methods of approximation, Found. Comput. Math. 3 (2003), 33–107.
[37] V. N. Temlyakov, Greedy approximation, Acta Numerica (2008), 235–409.
[38] V. N. Temlyakov and P. Zheltov, On performance of greedy algorithms, IMI Preprint, available at http://imi.cas.sc.edu/IMI/reports/2010/reports/1002.pdf, January 2010.
[39] J. Tropp, Greed is good: Algorithmic results for sparse approximation, IEEE Trans. Inform. Theory 50 (2004), 2231–2242.
[40] M. Unser, Sampling – 50 years after Shannon, Proceedings of the IEEE 88 (2000), 569–587.


... the apps below to open or edit this item. pdf-1399\event-mining-algorithms-and-applications-ch ... ledge-discovery-series-from-chapman-and-hall-crc.pdf.