Appl. Comput. Harmon. Anal. 30 (2011) 402–406
Letter to the Editor
The null space property for sparse recovery from multiple measurement vectors

Ming-Jun Lai¹, Yang Liu*

Department of Mathematics, The University of Georgia, Athens, GA 30602, United States
Article history: Communicated by Naoki Saito on 3 January 2010. Available online 10 November 2010.
We prove a null space property for the uniqueness of the sparse solution vectors recovered from a minimization in the ℓp quasi-norm subject to multiple systems of linear equations, where p ∈ (0, 1]. Furthermore, we show that this null space property is equivalent to the null space property for the standard ℓp minimization subject to a single linear system. This answers the questions raised in Foucart and Gribonval (2010) [17]. © 2010 Elsevier Inc. All rights reserved.
Keywords: Sparse recovery; Optimization
1. Introduction

Recently, one of the central problems in compressed sensing, the recovery of the sparse solution of an under-determined linear system, has been extended to the recovery of sparse solution vectors from multiple measurement vectors (MMV). That is, letting A be a sensing matrix of size m × N with m ≪ N and given multiple measurement vectors b^(k), k = 1, ..., r, we look for solution vectors x^(k), k = 1, ..., r, such that
\[
A x^{(k)} = b^{(k)}, \quad k = 1, \ldots, r, \tag{1}
\]
and the vectors x^(k), k = 1, ..., r, are jointly sparse, i.e. they have nonzero entries at the same locations and have as few nonzero entries as possible. Such problems arise in source localization (cf. [24]), neuromagnetic imaging (cf. [12]), and equalization of sparse communication channels (cf. [13,15]). A popular approach to finding the sparse solution for multiple measurement vectors (MMV) is to solve the following optimization problem:
\[
\min_{x^{(k)} \in \mathbb{R}^N,\ k=1,\ldots,r} \left\{ \left( \sum_{j=1}^{N} \big\| (x_{1,j}, \ldots, x_{r,j}) \big\|_q^p \right)^{1/p} : \text{subject to } A x^{(k)} = b^{(k)},\ k = 1, \ldots, r \right\} \tag{2}
\]

for q ≥ 1 and p ≥ 1, where x^(k) = (x_{k,1}, ..., x_{k,N})^T for all k = 1, ..., r, and ‖(x_1, ..., x_r)‖_q = (Σ_{j=1}^r |x_j|^q)^{1/q} is the standard ℓq norm. Clearly, (2) is a generalization of the standard ℓ1 minimization approach for the sparse solution. That is, when r = 1, one finds the sparse solution x by solving the following minimization problem:
* Corresponding author. E-mail addresses: [email protected] (M.-J. Lai), [email protected] (Y. Liu).
¹ This author is partly supported by the National Science Foundation under grant DMS-0713807.

doi:10.1016/j.acha.2010.11.002

\[
\min_{x \in \mathbb{R}^N} \left\{ \|x\|_1 : \text{subject to } Ax = b \right\}, \tag{3}
\]
where ‖x‖_1 = Σ_{j=1}^N |x_j| is the standard ℓ1 norm. The minimization problem (3) has been actively studied recently; see, e.g., [5–9,3,16,19,25] and the references therein. In the literature, there are also several studies of various combinations of p ≥ 1 and q ≥ 1 in (2); see, e.g., [13,11,24,27–29]. In particular, the well-known null space property (cf. [14] and [20]) for the standard ℓ1 minimization has been extended to the setting (2) for multiple measurement vectors. In [4], the following result is proved.

Theorem 1.1. Let A be a real matrix of size m × N and S ⊂ {1, 2, ..., N} be a fixed index set. Denote by S^c the complement of S in {1, 2, ..., N}. Let ‖·‖ be any norm. Then all x^(k) with support of x^(k) in S for k = 1, ..., r can be uniquely recovered using the following minimization
\[
\min_{x^{(k)} \in \mathbb{R}^N,\ k=1,\ldots,r} \left\{ \sum_{j=1}^{N} \big\| (x_{1,j}, \ldots, x_{r,j}) \big\| : \text{subject to } A x^{(k)} = b^{(k)},\ k = 1, \ldots, r \right\} \tag{4}
\]

if and only if all vectors (u^(1), ..., u^(r)) ∈ (N(A))^r \ {(0, 0, ..., 0)} satisfy

\[
\sum_{j \in S} \big\| (u_{1,j}, \ldots, u_{r,j}) \big\| < \sum_{j \in S^c} \big\| (u_{1,j}, \ldots, u_{r,j}) \big\|, \tag{5}
\]
where N(A) stands for the null space of A.

In [17], Foucart and Gribonval studied the MMV setting when r = 2, q = 2 and p = 1. They gave another nice interpretation of the MMV problem. When r = 2, one can view the sparse solutions x^(1) and x^(2) as the real and imaginary parts of a complex solution y = x^(1) + i x^(2) of Ay = c with c = b^(1) + i b^(2). They then showed that the null space property for Ay = c with a complex solution vector is the same as the null space property for Ax = b with a real solution vector. That is, they proved the following.

Theorem 1.2. Let A be a real matrix of size m × N and S ⊂ {1, ..., N} be the support of the sparse vector y. The complex null space property: for any u ∈ N(A), w ∈ N(A) with (u, w) ≠ 0,
\[
\sum_{j \in S} \sqrt{u_j^2 + w_j^2} < \sum_{j \in S^c} \sqrt{u_j^2 + w_j^2}, \tag{6}
\]
where u = (u_1, u_2, ..., u_N)^T and w = (w_1, w_2, ..., w_N)^T, is equivalent to the following standard null space property: for any u in the null space N(A) with u ≠ 0,
\[
\sum_{j \in S} |u_j| < \sum_{j \in S^c} |u_j|. \tag{7}
\]
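The complex reformulation behind Theorem 1.2 is easy to check numerically. The sketch below (with an arbitrary random matrix of our own choosing, not an example from the paper) verifies that two real systems sharing the sensing matrix A are one complex system, and that |y_j| is exactly the mixed quantity appearing in (6):

```python
import numpy as np

# Two real MMV systems A x1 = b1, A x2 = b2 with one sensing matrix A
# are the single complex system A y = c with y = x1 + i x2, c = b1 + i b2.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 8))
x1 = rng.standard_normal(8)
x2 = rng.standard_normal(8)
b1, b2 = A @ x1, A @ x2

y = x1 + 1j * x2                      # complex solution vector
c = b1 + 1j * b2                      # complex measurement vector
print(np.allclose(A @ y, c))          # True: the two viewpoints agree

# The modulus |y_j| is the mixed term sqrt(u_j^2 + w_j^2) from (6).
print(np.allclose(np.abs(y), np.hypot(x1, x2)))
```

This is why the complex null space property (6) involves sqrt(u_j² + w_j²): it is simply the entrywise modulus of a complex null space vector u + iw.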
Furthermore, the researchers in [17] raised two questions. One is how to extend their result from r = 2 to any r ≥ 3, and the other is what happens when q = 2 and p < 1. These questions motivate us to study the joint sparse solution recovery. The study of the ℓ1 minimization in (3) has been generalized to the following ℓp setting:
\[
\min_{x \in \mathbb{R}^N} \|x\|_p^p \quad \text{subject to } Ax = b \tag{8}
\]

for a fixed number p ∈ (0, 1] (see, for instance, [20,9] and [18]), in which ‖x‖_p = (Σ_{j=1}^N |x_j|^p)^{1/p} is the standard ℓp quasi-norm when p ∈ (0, 1). Therefore, we may consider a joint recovery from multiple measurement vectors via
\[
\min_{x^{(k)} \in \mathbb{R}^N,\ k=1,\ldots,r} \left\{ \sum_{j=1}^{N} \Big( \sqrt{x_{1,j}^2 + \cdots + x_{r,j}^2} \Big)^p : \text{subject to } A x^{(1)} = b^{(1)}, \ldots, A x^{(r)} = b^{(r)} \right\} \tag{9}
\]
for a given 0 < p ≤ 1, where x^(k) = (x_{k,1}, ..., x_{k,N})^T ∈ R^N for all k = 1, ..., r; this is exactly (2) with q = 2. Note that when p → 0^+, we have (√(x²_{1,j} + ··· + x²_{r,j}))^p → 1 whenever any of x_{1,j}, ..., x_{r,j} is nonzero, and hence

\[
\sum_{j=1}^{N} \Big( \sqrt{x_{1,j}^2 + \cdots + x_{r,j}^2} \Big)^p \to s,
\]

which is the joint sparsity of the solution vectors x^(k), k = 1, ..., r. Thus, the minimization in (9) makes sense. In fact, the minimization (9) has one advantage over the minimization (2) with p = q = 1: fewer measurements are needed for exact recovery using the ℓp minimization with p < 1 than using the standard ℓ1 convex minimization. Indeed, in [9], Chartrand demonstrated this fact by numerical examples with Gaussian random matrices, and in [10], Chartrand and Staneva showed in theory that the ℓp minimization (8) can recover the exact sparse solution using fewer measurements when
p → 0^+ than the standard convex ℓ1 minimization. It is well known that the ℓ1 minimization is convex and is equivalent to a standard linear programming problem, for which there are two mature computational approaches: the interior point method and the simplex method, which requires polynomial time for most practical problems (cf. [26]). However, when p < 1, the ℓp minimization is a nonconvex minimization, and its computation is not as well understood as that of the ℓ1 minimization. Nevertheless, one can find local minimizers of some regularized constrained or unconstrained ℓp minimizations as in [18] and [22], and some local minimizers have been empirically observed to be global minimizers; see the numerical experiments in [18] and [22]. However, the computation takes more time than that of the ℓ1 minimization. Thus, if one can find an efficient computational algorithm, the ℓp minimization for sparse solutions provides a better sparse recovery approach than the ℓ1 minimization for practical applications. In this paper we mainly prove the following.

Theorem 1.3. Let A be a real matrix of size m × N and S ⊂ {1, 2, ..., N} be a fixed index set. Fix p ∈ (0, 1] and r ≥ 1. Then the following conditions are equivalent:

(a) All x^(k) with support in S for k = 1, ..., r can be uniquely recovered using (9);

(b) For all vectors (u^(1), ..., u^(r)) ∈ (N(A))^r \ {(0, 0, ..., 0)},
\[
\sum_{j \in S} \Big( \sqrt{u_{1,j}^2 + \cdots + u_{r,j}^2} \Big)^p < \sum_{j \in S^c} \Big( \sqrt{u_{1,j}^2 + \cdots + u_{r,j}^2} \Big)^p; \tag{10}
\]
(c) For all vectors z ∈ N(A) with z ≠ 0,

\[
\sum_{j \in S} |z_j|^p < \sum_{j \in S^c} |z_j|^p, \tag{11}
\]
where z = (z_1, ..., z_N)^T ∈ R^N.

That is, it is enough to check (11) for all z ∈ N(A) in order to establish the uniqueness of the joint sparse solution vectors. This significantly reduces the complexity of verifying (10). Also, our results extend Theorem 1.1 from a norm to a quasi-norm. These results completely answer the questions raised in [17].

2. The proof of the main theorem

To show the equivalences in Theorem 1.3, we divide the proof into three parts: we prove (1) (b) ⇒ (a); (2) (a) ⇒ (c); (3) (c) ⇒ (b).

The first part, (b) ⇒ (a), shows that (10) is a sufficient condition for the uniqueness of the joint sparse solution vectors. The proof is a straightforward generalization of the arguments in [21]; we spell out the details as follows. Let x^(k), k = 1, ..., r, be the joint sparse solution vectors of the minimization (9), with the assumption that the support of each x^(k) is contained in S. For any vectors u^(1), ..., u^(r) in N(A) that are not simultaneously zero, we easily have, for 0 < p ≤ 1,
\[
\sum_{j \in S} \big\| (x_{1,j}, \ldots, x_{r,j}) \big\|_2^p \le \sum_{j \in S} \big\| (u_{1,j}, \ldots, u_{r,j}) \big\|_2^p + \sum_{j \in S} \big\| (x_{1,j} + u_{1,j}, \ldots, x_{r,j} + u_{r,j}) \big\|_2^p. \tag{12}
\]
By the property (10), we have
\[
\sum_{j \in S} \big\| (x_{1,j}, \ldots, x_{r,j}) \big\|_2^p < \sum_{j \in S^c} \big\| (u_{1,j}, \ldots, u_{r,j}) \big\|_2^p + \sum_{j \in S} \big\| (x_{1,j} + u_{1,j}, \ldots, x_{r,j} + u_{r,j}) \big\|_2^p. \tag{13}
\]
But the support of the vectors x^(k), k = 1, ..., r, is contained in S. Hence,

\[
\begin{aligned}
\sum_{j=1}^{N} \big\| (x_{1,j}, \ldots, x_{r,j}) \big\|_2^p
&= \sum_{j \in S} \big\| (x_{1,j}, \ldots, x_{r,j}) \big\|_2^p \\
&< \sum_{j \in S^c} \big\| (u_{1,j}, \ldots, u_{r,j}) \big\|_2^p + \sum_{j \in S} \big\| (x_{1,j} + u_{1,j}, \ldots, x_{r,j} + u_{r,j}) \big\|_2^p \\
&= \sum_{j \in S^c} \big\| (x_{1,j} + u_{1,j}, \ldots, x_{r,j} + u_{r,j}) \big\|_2^p + \sum_{j \in S} \big\| (x_{1,j} + u_{1,j}, \ldots, x_{r,j} + u_{r,j}) \big\|_2^p,
\end{aligned}
\]

where the last equality holds since x_{k,j} = 0 for j ∈ S^c and k = 1, ..., r.
So x^(k), k = 1, ..., r, are the unique solution to the minimization problem (9). □
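Condition (c) is a statement about every vector in N(A), so it cannot be proved by finite sampling, but sampling can falsify it. The sketch below (our own illustration, not from the paper) draws random null space vectors and tests inequality (11); by Theorem 1.3, a violation also rules out the joint property (10) for every r:

```python
import numpy as np

def nsp_p_holds(A, S, p, n_samples=2000, rng=None):
    """Randomized test of the scalar null space property (11):
    sum_{j in S} |z_j|^p < sum_{j in S^c} |z_j|^p for sampled z in N(A).
    Sampling can only falsify the property, never prove it."""
    if rng is None:
        rng = np.random.default_rng(0)
    _, _, Vt = np.linalg.svd(A)                  # full SVD of A
    kernel = Vt[np.linalg.matrix_rank(A):]       # rows span N(A)
    Sc = np.setdiff1d(np.arange(A.shape[1]), S)
    for _ in range(n_samples):
        z = rng.standard_normal(kernel.shape[0]) @ kernel
        if np.sum(np.abs(z[S]) ** p) >= np.sum(np.abs(z[Sc]) ** p):
            return False
    return True

# Here N(A) is spanned by the first standard basis vector (index 0),
# so (11) fails for S = {0} and holds for S = {1}.
A = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
print(nsp_p_holds(A, S=np.array([0]), p=0.5))   # False
print(nsp_p_holds(A, S=np.array([1]), p=0.5))   # True
```

Because every sampled z concentrates on index 0, any support set containing 0 violates (11), while supports avoiding index 0 satisfy it for all sampled vectors.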
The second part is to show that (a) implies (c). Assume to the contrary that there exists some z ∈ N(A) with z ≠ 0 and
\[
\sum_{j \in S} |z_j|^p \ge \sum_{j \in S^c} |z_j|^p. \tag{14}
\]
We choose x^(1) ∈ R^N such that the entries of x^(1) restricted to S are equal to those of z and the remaining entries are zero, and set x^(k) = 0 for k = 2, ..., r. Then, for the multiple measurement vectors b^(k) := A x^(k), k = 1, ..., r,
\[
A x^{(1)} = A x^{(1)} - A z = A \big( x^{(1)} - z \big),
\]

and by (14),

\[
\sum_{j=1}^{N} \big\| (x_{1,j}, 0, \ldots, 0) \big\|_2^p = \sum_{j \in S} \big\| (x_{1,j}, 0, \ldots, 0) \big\|_2^p = \sum_{j \in S} \big\| (z_j, 0, \ldots, 0) \big\|_2^p
\ge \sum_{j \in S^c} \big\| (z_j, 0, \ldots, 0) \big\|_2^p = \sum_{j=1}^{N} \big\| (x_{1,j} - z_j, 0, \ldots, 0) \big\|_2^p,
\]
which contradicts the uniqueness of the recovery from the new measurement vectors b^(k), k = 1, ..., r. This finishes the proof of the part that (a) implies (c).

The third part is to show that (c) implies (b). Let us start with a useful lemma. Let S^{r−1} be the unit sphere in R^r. From the perspective of integral geometry, we know that ∫_{S^{r−1}} |⟨v, ξ⟩|^p dξ is a rotation invariant function of the vector v ∈ R^r (cf. [1,2] and [23]). In fact, we have

Lemma 2.1. Let r be an integer not less than 2. Then for any p > 0,
\[
\int_{S^{r-1}} \big| \langle v, \xi \rangle \big|^p \, d\xi = C
\]
for all v ∈ S^{r−1}, where C > 0 is a constant depending only on p and r.

Proof. Let U be an orthogonal transformation of R^r. Then for any v ∈ S^{r−1}, the unit sphere in R^r, we have
\[
\big\langle U(v), \xi \big\rangle = \big\langle v, U^{-1}(\xi) \big\rangle \tag{15}
\]
for all ξ ∈ S^{r−1}. Also, we know that
\[
S^{r-1} = \big\{ U(v) : U \in O(r) \big\}, \tag{16}
\]
where O(r) denotes the set of all r × r orthogonal matrices. By a change of variables and using the fact that |det(U^{−1})| = 1, we get
\[
\int_{S^{r-1}} \big| \langle U(v), \xi \rangle \big|^p \, d\xi
= \int_{S^{r-1}} \big| \langle v, U^{-1}(\xi) \rangle \big|^p \, d\xi
= \int_{S^{r-1}} \big| \langle v, U^{-1}(\xi) \rangle \big|^p \, dU^{-1}(\xi)
= \int_{S^{r-1}} \big| \langle v, \xi \rangle \big|^p \, d\xi \tag{17}
\]
for all U ∈ O(r). Thus we see that ∫_{S^{r−1}} |⟨v, ξ⟩|^p dξ ≡ C for some C > 0 and for all v ∈ S^{r−1}. □
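Lemma 2.1 can be sanity-checked by Monte Carlo integration. The sketch below (an illustration under our own choice of r, p and test vectors) estimates the normalized sphere average of |⟨v, ξ⟩|^p for two different unit vectors v and finds the same constant:

```python
import numpy as np

def sphere_average(v, p, n=200_000, seed=0):
    """Monte Carlo estimate of the average of |<v, xi>|^p over xi drawn
    uniformly from the unit sphere S^{r-1} (normalized surface measure)."""
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal((n, len(v)))
    xi /= np.linalg.norm(xi, axis=1, keepdims=True)   # uniform on the sphere
    return np.mean(np.abs(xi @ v) ** p)

# By Lemma 2.1 the average depends only on p and r, not on the unit vector v.
p = 0.5
v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([1.0, 2.0, -2.0]) / 3.0    # another unit vector in R^3
print(sphere_average(v1, p), sphere_average(v2, p))
```

For r = 3 the coordinate ⟨v, ξ⟩ of a uniform point on the sphere is uniform on [−1, 1] (Archimedes' hat-box theorem), so the constant here is ∫₀¹ t^{1/2} dt = 2/3; both estimates agree with it up to Monte Carlo error.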
Next we need a comparison result for an arbitrary matrix B ∈ R^{r×N} with integer r ≥ 2.

Lemma 2.2. Let S ⊂ {1, 2, ..., N} be an index set with |S| = s. Given 0 < p ≤ 1 and a matrix B = [b_{ij}]_{1≤i≤r, 1≤j≤N} ∈ R^{r×N}, if
\[
\big\| (x_1, x_2, \ldots, x_r) B_S \big\|_p < \big\| (x_1, x_2, \ldots, x_r) B_{S^c} \big\|_p \tag{18}
\]

for all (x_1, ..., x_r) ∈ R^r \ {(0, ..., 0)}, where B_S (resp. B_{S^c}) denotes the submatrix of B consisting of the columns indexed by S (resp. S^c), then

\[
\sum_{j \in S} \Big( \sqrt{b_{1,j}^2 + \cdots + b_{r,j}^2} \Big)^p < \sum_{j \in S^c} \Big( \sqrt{b_{1,j}^2 + \cdots + b_{r,j}^2} \Big)^p. \tag{19}
\]
Proof. Let us rewrite (18) as follows.
\[
\sum_{j \in S} |b_{1,j} x_1 + \cdots + b_{r,j} x_r|^p < \sum_{j \in S^c} |b_{1,j} x_1 + \cdots + b_{r,j} x_r|^p \tag{20}
\]
for all (x_1, x_2, ..., x_r) ∈ R^r \ {(0, 0, ..., 0)}. Normalizing (b_{1,j}, ..., b_{r,j}), we let

\[
v_j := \frac{1}{\sqrt{b_{1,j}^2 + \cdots + b_{r,j}^2}} \, (b_{1,j}, \ldots, b_{r,j}).
\]

Then we have

\[
\sum_{j \in S} \Big( \sqrt{b_{1,j}^2 + \cdots + b_{r,j}^2} \Big)^p \big| \langle v_j, \xi \rangle \big|^p < \sum_{j \in S^c} \Big( \sqrt{b_{1,j}^2 + \cdots + b_{r,j}^2} \Big)^p \big| \langle v_j, \xi \rangle \big|^p \tag{21}
\]

for all vectors ξ = (x_1, x_2, ..., x_r) ∈ R^r \ {(0, 0, ..., 0)}. Integrating (21) over the unit (r − 1)-sphere S^{r−1}, we have
\[
\sum_{j \in S} \Big( \sqrt{b_{1,j}^2 + \cdots + b_{r,j}^2} \Big)^p \int_{S^{r-1}} \big| \langle v_j, \xi \rangle \big|^p \, d\xi < \sum_{j \in S^c} \Big( \sqrt{b_{1,j}^2 + \cdots + b_{r,j}^2} \Big)^p \int_{S^{r-1}} \big| \langle v_j, \xi \rangle \big|^p \, d\xi. \tag{22}
\]
By Lemma 2.1, ∫_{S^{r−1}} |⟨v_j, ξ⟩|^p dξ is a positive constant independent of j. Therefore, (19) follows. □
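The key step in the proof of Lemma 2.2 is that averaging |⟨b_j, ξ⟩|^p over the sphere produces the same constant C times the column quantity (b²_{1,j} + ··· + b²_{r,j})^{p/2} for every column, which is what turns (20) into (19). A numerical sketch (with an arbitrary test matrix of our own choosing):

```python
import numpy as np

# Averaging |<b_j, xi>|^p over uniform xi on S^{r-1} gives
# C * ||b_j||_2^p with one constant C for all columns (Lemma 2.1),
# so integrating inequality (20) over the sphere yields (19).
rng = np.random.default_rng(0)
r, p = 3, 0.5
xi = rng.standard_normal((500_000, r))
xi /= np.linalg.norm(xi, axis=1, keepdims=True)   # uniform on S^{r-1}

B = rng.standard_normal((r, 6))                   # arbitrary r x N matrix
avg = np.mean(np.abs(xi @ B) ** p, axis=0)        # average of |<b_j, xi>|^p
ratio = avg / np.linalg.norm(B, axis=0) ** p      # estimate of C, per column
print(np.round(ratio, 3))                         # nearly identical entries
```

The entries of `ratio` all approximate the same constant C (≈ 2/3 for r = 3, p = 1/2, by Archimedes' hat-box theorem), independently of the columns of B.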
We are now ready to prove the third part of Theorem 1.3 for p ∈ (0, 1). Assume that we have (11) for r ≥ 2. For any r-tuple (u^(1), u^(2), ..., u^(r)) of vectors in (N(A))^r \ {(0, 0, ..., 0)}, we let B = [u^(1), ..., u^(r)]^T be a matrix in R^{r×N}, where 0 stands for the zero vector of size N × 1. For any (x_1, x_2, ..., x_r) ∈ R^r \ {(0, 0, ..., 0)}, the vector z = (x_1, x_2, ..., x_r) B is in N(A) \ {0}. The null space property of z, i.e. (11), implies (18) for all (x_1, x_2, ..., x_r) ∈ R^r \ {(0, 0, ..., 0)}. The conclusion (19) of Lemma 2.2 then implies the null space property (10) for r ≥ 2. This shows that (c) implies (b) in Theorem 1.3. The three parts above furnish a proof of Theorem 1.3.

Acknowledgments

We would like to thank the reviewers for the valuable comments that have helped significantly improve the presentation of this paper.

References

[1] S. Alesker, Continuous rotation invariant valuations on convex sets, Ann. of Math. (2) 149 (1999) 977–1005.
[2] S. Alesker, J. Bernstein, Range characterization of the cosine transform on higher Grassmannians, Adv. Math. 184 (2004) 367–379.
[3] A.M. Bruckstein, D.L. Donoho, M. Elad, From sparse solutions of systems of equations to sparse modeling of signals and images, SIAM Rev. 51 (2009) 34–81.
[4] E. van den Berg, M.P. Friedlander, Joint-sparse recovery from multiple measurements, arXiv:0904.2051v1, 2009.
[5] E.J. Candès, Compressive sampling, in: International Congress of Mathematicians, vol. III, Eur. Math. Soc., Zürich, 2006, pp. 1433–1452.
[6] E.J. Candès, The restricted isometry property and its implications for compressed sensing, C. R. Acad. Sci. Paris, Ser. I 346 (2008) 589–592.
[7] E.J. Candès, J.K. Romberg, T. Tao, Stable signal recovery from incomplete and inaccurate measurements, Comm. Pure Appl. Math. 59 (2006) 1207–1223.
[8] E.J. Candès, T. Tao, Near-optimal signal recovery from random projections: universal encoding strategies, IEEE Trans. Inform. Theory 52 (12) (2006) 5406–5425.
[9] R. Chartrand, Exact reconstruction of sparse signals via nonconvex minimization, IEEE Signal Process. Lett. 14 (2007) 707–710.
[10] R. Chartrand, V. Staneva, Restricted isometry properties and nonconvex compressive sensing, Inverse Problems 24 (3) (2008), doi:10.1088/0266-5611/24/3/035020.
[11] J. Chen, X. Huo, Theoretical results on sparse representations of multiple-measurement vectors, IEEE Trans. Signal Process. 54 (December 2006) 4634–4643.
[12] S.F. Cotter, B.D. Rao, Sparse channel estimation via matching pursuit with application to equalization, IEEE Trans. Commun. 50 (3) (March 2002) 374–377.
[13] S.F. Cotter, B.D. Rao, K. Engan, K. Kreutz-Delgado, Sparse solutions to linear inverse problems with multiple measurement vectors, IEEE Trans. Signal Process. 53 (July 2005) 2477–2488.
[14] D.L. Donoho, M. Elad, Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization, Proc. Natl. Acad. Sci. USA 100 (5) (March 2003) 2197–2202.
[15] I.J. Fevrier, S.B. Gelfand, M.P. Fitz, Reduced complexity decision feedback equalization for multipath channels with large delay spreads, IEEE Trans. Commun. 47 (6) (June 1999) 927–937.
[16] S. Foucart, A note on guaranteed sparse recovery via ℓ1-minimization, Appl. Comput. Harmon. Anal. 29 (2010) 97–103.
[17] S. Foucart, R. Gribonval, Real vs. complex null space properties for sparse vector recovery, C. R. Math. Acad. Sci. Paris 348 (15–16) (August 2010) 863–865.
[18] S. Foucart, M.-J. Lai, Sparsest solutions of underdetermined linear systems via ℓq minimization for 0 < q ≤ 1, Appl. Comput. Harmon. Anal. 26 (2009) 395–407.
[19] S. Foucart, M.-J. Lai, Sparse recovery with pre-Gaussian random matrices, Studia Math. 200 (2010) 91–102.
[20] R. Gribonval, M. Nielsen, Sparse representations in unions of bases, IEEE Trans. Inform. Theory 49 (12) (December 2003) 3320–3325.
[21] R. Gribonval, M. Nielsen, Highly sparse representations from dictionaries are unique and independent of the sparseness measure, Appl. Comput. Harmon. Anal. 22 (2007) 335–355.
[22] M.-J. Lai, J. Wang, An unconstrained ℓq minimization for sparse solution of underdetermined linear systems, 2010, submitted for publication.
[23] Y. Lonke, Derivatives of the L^p-cosine transform, Adv. Math. 176 (2003) 175–186.
[24] D. Malioutov, M. Cetin, A.S. Willsky, A sparse signal reconstruction perspective for source localization with sensor arrays, IEEE Trans. Signal Process. 53 (2005) 3010–3022.
[25] M. Rudelson, R. Vershynin, Non-asymptotic theory of random matrices: extreme singular values, in: Proceedings of the International Congress of Mathematicians, Hyderabad, India, 2010.
[26] D.A. Spielman, S.H. Teng, Smoothed analysis of algorithms: why the simplex algorithm usually takes polynomial time, J. ACM 51 (2004) 385–463.
[27] J.A. Tropp, Recovery of short, complex linear combinations via ℓ1 minimization, IEEE Trans. Inform. Theory 51 (4) (April 2005) 1568–1570.
[28] J.A. Tropp, Algorithms for simultaneous sparse approximation. Part II: Convex relaxation, Signal Process. 86 (2006) 589–602.
[29] J.A. Tropp, A.C. Gilbert, M.J. Strauss, Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit, Signal Process. 86 (2006) 572–588.