IEICE TRANS. FUNDAMENTALS, VOL.E96–A, NO.10 OCTOBER 2013


PAPER

Special Section on Sparsity-aware Signal Processing

Exploiting Group Sparsity in Nonlinear Acoustic Echo Cancellation by Adaptive Proximal Forward-Backward Splitting

Hiroki KURODA†a), Nonmember, Shunsuke ONO†, Student Member, Masao YAMAGISHI†, Member, and Isao YAMADA†, Senior Member

SUMMARY  In this paper, we propose a use of the group sparsity in adaptive learning of second-order Volterra filters for the nonlinear acoustic echo cancellation problem. The group sparsity indicates sparsity across groups, i.e., the entries of a vector are divided into groups, and most of the groups contain only approximately zero-valued entries. First, we provide theoretical evidence that second-order Volterra systems tend to have the group sparsity under natural assumptions. Next, we propose an algorithm by applying the adaptive proximal forward-backward splitting method to a carefully designed cost function that exploits the group sparsity effectively. The designed cost function is the sum of the weighted group ℓ1 norm, which promotes the group sparsity, and a weighted sum of squared distances to data-fidelity sets used in adaptive filtering algorithms. Finally, numerical examples show that the proposed method outperforms a sparsity-aware algorithm in both the system mismatch and the echo return loss enhancement.
key words: group sparsity, nonlinear acoustic echo cancellation (NLAEC), adaptive Volterra filter, weighted group ℓ1 norm, adaptive proximal forward-backward splitting (APFBS)

1. Introduction

The Volterra system has been used as a general model of a large variety of real-world nonlinear systems [1]–[4]. Adaptive Volterra filters have been applied to the nonlinear acoustic echo cancellation (NLAEC) problem, which has become increasingly important for dealing with nonlinearity, e.g., in loudspeaker systems [5]–[10]. A major goal of the NLAEC problem is to estimate the overall nonlinear echo path, which is the cascade of the loudspeaker system and the acoustic impulse response (AIR). Figure 1 illustrates a typical nonlinear echo path and its model, where the AIR is modeled by a finite impulse response (FIR) system, and the loudspeaker system is modeled by a second-order Volterra system (SOV1). Therefore, the overall nonlinear echo path can also be modeled by another second-order Volterra system (SOV2) with larger memory [3]–[6]. Certain sparsities in the AIR have been observed since the early 1990s, where we call a system sparse if most of the entries of the system parameters are approximately zero except for a small number of entries of relatively large magnitude (see, e.g., [11], [12]). Since then, the "sparsity" has been exploited for accelerating the convergence speed of adaptive identification of the sparse AIR

Manuscript received January 20, 2013. Manuscript revised May 16, 2013.
† The authors are with the Department of Communications and Computer Engineering, Tokyo Institute of Technology, Tokyo, 152-8552 Japan.
a) E-mail: [email protected]
DOI: 10.1587/transfun.E96.A.1918

[13], [14], and recently has been utilized for steady-state performance improvements with the use of the weighted ℓ1 norm [15]–[17]. At the same time, it has been reported experimentally that the overall nonlinear echo path can also be simulated effectively by sparse second-order Volterra systems [7], [8], where we call a Volterra system sparse if its coefficient vector and matrix are approximately sparse. Indeed, the sparsity in the Volterra system has also been exploited to improve the performance of adaptive NLAEC algorithms [7], [8], [18], which suggests that we could further improve the performance of adaptive NLAEC if we effectively utilized more precise features than the general "sparsity" of the overall nonlinear echo path.

In this paper, motivated by a natural observation that the "group sparsity" (or "structured sparsity") is a more precise feature than the general "sparsity" of a typical AIR (see, e.g., [19], [20]), we propose an effective use of the group sparsity of SOV2 in the adaptive learning of the overall nonlinear echo path. The group sparsity of SOV2 indicates that the components of the coefficient vector and matrix of SOV2 are divided into groups, and most of the groups contain only approximately zero-valued components, while the group sparsity of the AIR simply implies the sparsity across groups of the components in the impulse response vector. The authors of [19] have shown theoretical evidence of the group sparsity of the AIR and verified empirically that the AIR is well-captured by this knowledge. Despite the group sparsity of a typical AIR, the group sparsity of SOV2, which models the overall nonlinear echo path, is not obvious because of the existence of SOV1. We first show theoretically that SOV2 tends to have the group sparsity under natural assumptions. That is, if the memory of SOV1 is much smaller than the length of the AIR, which is group sparse, then SOV2 is guaranteed to have a certain group sparsity (see [3], [19], [21] for the validity of these assumptions). Fortunately, the group sparsity of a vector can be promoted by suppressing its weighted group ℓ1 norm (also called the weighted ℓ1,2 norm), defined as the weighted sum of the ℓ2 norms of the components in each group of the vector [22]–[27]. Based on this fact, to exploit the group sparsity of SOV2 effectively, we propose an adaptive algorithm by applying the adaptive proximal forward-backward splitting (APFBS) [15], [28]–[34] to a time-varying cost function Θk designed as the sum of the weighted group ℓ1 norm and the weighted sum of the squared distances to multiple data-fidelity sets used in existing adaptive filtering algorithms [15], [35], [36].



Fig. 1  A typical echo path and its model.

Suppressing the weighted group ℓ1 norm in Θk promotes the group sparsity of the adaptive Volterra filter. The acceleration of convergence of the adaptive filtering is achieved by efficient data-reusing through suppression of the weighted sum of the squared distances to multiple data-fidelity sets. We conduct numerical examples to show the efficacy of the proposed group sparsity promoting term by comparing the proposed method with the APFBS employing the weighted ℓ1 norm as a sparsity promoting term and the APFBS without any sparsity promoting term. The results show that the proposed method using the group sparsity promoting term achieves the best steady-state behavior in both the system mismatch and the echo return loss enhancement.

This paper is organized as follows. In Sect. 2, we present mathematical preliminaries and a formulation of the NLAEC problem. The main contribution is presented in Sect. 3, where theoretical evidence of the group sparsity in SOV2 is given, and the APFBS using the weighted group ℓ1 norm is proposed as a group-sparsity-aware adaptive identification method for second-order Volterra filters. Numerical examples are shown in Sect. 4.

2. Preliminaries

2.1 Mathematical Tools

Let R and N denote the sets of all real numbers and nonnegative integers, respectively. In addition, define N∗ := N \ {0}. For every vector x ∈ R^N (N ∈ N∗), we define the ℓ2 norm of x by ‖x‖₂ := √(xᵗx), which is the norm induced by the Euclidean inner product ⟨x, y⟩ := xᵗy (∀x, y ∈ R^N), where (·)ᵗ denotes the transpose operation. A set C ⊂ R^N is said to be convex if αx + (1−α)y ∈ C for every x, y ∈ C and 0 ≤ α ≤ 1. Let C ⊂ R^N be a nonempty closed convex set. For every x ∈ R^N, the distance between x and C is defined by d(x, C) := min_{y∈C} ‖x − y‖₂, and the metric projection of x onto C is defined by P_C(x) := arg min_{y∈C} ‖x − y‖₂. For any x ∈ R^N and y ∈ C, we have

  y = P_C(x) ⇔ (∀z ∈ C) ⟨x − y, z − y⟩ ≤ 0.

In particular, if M ⊂ R^N is a (linear) subspace, we have

  y = P_M(x) ⇔ (∀z ∈ M) ⟨z, x − y⟩ = 0,

and P_M : R^N → M is a linear operator. A function f : R^N → R is said to be convex if f(αx + (1−α)y) ≤ α f(x) + (1−α) f(y) for every x, y ∈ R^N and 0 ≤ α ≤ 1. For every convex function f : R^N → R, the proximity operator [37] of index γ ∈ (0, ∞) is defined by

  prox_{γf}(x) := arg min_{y∈R^N} { γ f(y) + (1/2)‖x − y‖₂² }.

For every x ∈ R^N and X ∈ R^{N×N}, we define the supports of x and X by supp(x) := { i ∈ {1, ..., N} | x_i ≠ 0 } and supp(X) := { (i, j) ∈ {1, ..., N} × {1, ..., N} | X_{i,j} ≠ 0 }, which are the sets of all indices corresponding to the nonzero entries of x and X, respectively. For every finite set A, we denote the number of elements of A by |A|.

2.2 Nonlinear Acoustic Echo Cancellation Problem

Consider an echo path modeled by the cascade of a second-order Volterra system (SOV1) and a finite impulse response (FIR) system, as in Fig. 1. The former system has been commonly used to model nonlinearities of loudspeakers, and the latter system models the AIR [3]–[6]. Our model is

  u_k = sᵗ x_k + x_kᵗ S x_k,   (1)
  z_k = rᵗ u_k,   (2)

where x_k = [x_k, x_{k−1}, ..., x_{k−L+1}]ᵗ, s ∈ R^L, S ∈ R^{L×L}, u_k = [u_k, u_{k−1}, ..., u_{k−M+1}]ᵗ, r ∈ R^M, k ∈ N∗ is the time index, L is the memory length of SOV1, and M is the length of the FIR system. Typically, x_k is the far-end speech signal, u_k is the output of the loudspeaker system, and z_k is the echo signal. Without loss of generality, since x_kᵗ S x_k is a quadratic form, we can limit S to a lower triangular matrix. The overall system, i.e., SOV2, through (1) and (2) can be expressed in a single equation:


  z_k = rᵗ [sᵗx_k, sᵗx_{k−1}, ..., sᵗx_{k−M+1}]ᵗ + rᵗ [x_kᵗSx_k, x_{k−1}ᵗSx_{k−1}, ..., x_{k−M+1}ᵗSx_{k−M+1}]ᵗ
      = x̄_kᵗ h⋆ + x̄_kᵗ H⋆ x̄_k,   (3)

where x̄_k = [x_k, x_{k−1}, ..., x_{k−N+1}]ᵗ ∈ R^N, h⋆ ∈ R^N, and H⋆ ∈ R^{N×N} (N := M + L − 1). h⋆ and H⋆ are respectively defined by

  h⋆ := Σ_{ℓ=1}^{M} r_ℓ s̄_ℓ,   (4)
  H⋆ := Σ_{ℓ=1}^{M} r_ℓ S̄_ℓ,   (5)

where

  s̄_ℓ := [0_{ℓ−1}ᵗ, sᵗ, 0_{N−L−ℓ+1}ᵗ]ᵗ ∈ R^N,   (6)

0_m := [0, 0, ..., 0]ᵗ ∈ R^m (m ∈ N) (NOTE: 0_0 is defined as ∅), and S̄_ℓ ∈ R^{N×N} is defined by

  (S̄_ℓ)_{i,j} := { S_{i−ℓ+1, j−ℓ+1}, if ℓ ≤ j ≤ i ≤ ℓ + L − 1; 0, otherwise },   (7)

for every ℓ ∈ {1, ..., M}. A major goal of the nonlinear acoustic echo cancellation problem is to estimate SOV2 in terms of h⋆ and H⋆. By a simple calculation, (3) can be reduced to a wide-sense linear model (see Remark 1):

  z_k = x̃_kᵗ h̃⋆,   (8)

where x̃_k = [x̄_kᵗ, vec(x̄_k x̄_kᵗ)ᵗ]ᵗ ∈ R^Ñ, h̃⋆ = [(h⋆)ᵗ, vec(H⋆)ᵗ]ᵗ, Ñ = N + N² = (M + L)(M + L − 1), and vec(·) : R^{N×N} → R^{N²} is defined by vec(A) := [a_1ᵗ, a_2ᵗ, ..., a_Nᵗ]ᵗ ∈ R^{N²} for every A = [a_1, a_2, ..., a_N] ∈ R^{N×N}.

Remark 1: Note that any linear system must be defined on a subspace. However, the domain of (8)

  V := { [x̄ᵗ, vec(x̄ x̄ᵗ)ᵗ]ᵗ | x̄ ∈ R^N } ⊂ R^Ñ

is not a subspace of R^Ñ. In this strict sense, we refer to (8) as the wide-sense linear model.
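To make the construction above concrete, the following NumPy sketch (our own illustration, not the authors' code; the sizes and all variable names are assumptions) builds h⋆ and H⋆ from randomly drawn r, s, S according to (4)–(7), and numerically confirms that the cascade (1)–(2), the SOV2 form (3), and the wide-sense linear model (8) give the same output. Row-major flattening is used consistently for both the extended regressor and the coefficients, so the inner product in (8) is unaffected by the stacking convention of vec(·).

```python
import numpy as np

rng = np.random.default_rng(0)
L, M = 4, 8                    # SOV1 memory and FIR length (illustrative)
N = M + L - 1                  # memory of the overall SOV2

s = rng.standard_normal(L)
S = np.tril(rng.standard_normal((L, L)))   # lower-triangular quadratic kernel
r = rng.standard_normal(M)

# (4)-(7): embed shifted copies of s and S into the length-N structures
h_star = np.zeros(N)
H_star = np.zeros((N, N))
for l in range(M):             # l = ell - 1 (0-based shift)
    h_star[l:l + L] += r[l] * s
    H_star[l:l + L, l:l + L] += r[l] * S

x = rng.standard_normal(2 * N)             # input signal
k = len(x) - 1
xbar = x[k - N + 1:k + 1][::-1]            # [x_k, x_{k-1}, ..., x_{k-N+1}]

# cascade (1)-(2): u_m = s^t x_m + x_m^t S x_m, then z_k = r^t u_k
u = np.array([s @ x[m - L + 1:m + 1][::-1]
              + x[m - L + 1:m + 1][::-1] @ S @ x[m - L + 1:m + 1][::-1]
              for m in range(k, k - M, -1)])
z_cascade = r @ u
z_sov2 = xbar @ h_star + xbar @ H_star @ xbar          # SOV2 form (3)

# (8): wide-sense linear model with the extended regressor
x_tilde = np.concatenate([xbar, np.outer(xbar, xbar).ravel()])
h_tilde = np.concatenate([h_star, H_star.ravel()])
assert np.isclose(z_cascade, z_sov2)
assert np.isclose(z_sov2, x_tilde @ h_tilde)
```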

2.3 Adaptive Proximal Forward-Backward Splitting

The adaptive proximal forward-backward splitting (APFBS) [15], [28]–[32] is an algorithm for suppressing a time-varying cost function Θk, which is designed based on a current estimate h̃_k and the available data (x̃_i, d_i)_{i=1}^{k} with a priori information, where d_i = x̃_iᵗ h̃⋆ + v_i, and v_i is additive noise. We define the cost function Θk : R^Ñ → R by

  Θk(h̃) := ϕk(h̃) + λ ψk(h̃),   (9)

where ϕk : R^Ñ → R is a smooth convex function with Lipschitz continuous† gradient ∇ϕk : R^Ñ → R^Ñ, ψk : R^Ñ → R is a possibly nonsmooth convex function, and λ ∈ (0, ∞). In general, ϕk is utilized as a data-fidelity term, ψk is used to exploit a priori knowledge, and λ controls their relative importance (such examples are given below in Sects. 3.2 and 3.3). For an arbitrarily chosen h̃_0 ∈ R^Ñ, the APFBS generates a sequence (h̃_k)_{k∈N} by

  h̃_{k+1} = prox_{(μλ/L_k) ψk} ( h̃_k − (μ/L_k) ∇ϕk(h̃_k) ),   (10)

where μ ∈ (0, 2) is the step-size, and L_k > 0 is a Lipschitz constant of ∇ϕk.

Remark 2 (Properties of the APFBS):

(a) (Monotone approximation) The APFBS (10) for any μ ∈ (0, 2) satisfies the (strictly) monotone approximation property

  ‖h̃_{k+1} − h̃⋆_{Θk}‖₂ < ‖h̃_k − h̃⋆_{Θk}‖₂   (11)

for any h̃⋆_{Θk} ∈ Ωk := arg min_{u∈R^Ñ} Θk(u), if h̃_k ∉ Ωk [15], [35], [36], [38].

(b) (Performance in experiments) Extensive experiments on the APFBS (10) have shown its excellent convergence behavior in numerical examples [15], [28]–[34].

(c) (Relation to an algorithm in convex optimization) The APFBS (10) is an adaptive generalization of the proximal forward-backward splitting method [39], [40]. This implies that, in the time-invariant case where ϕk = ϕ and ψk = ψ for any k ∈ N∗, the sequence generated by (10) is guaranteed to converge to a minimizer of the function Θk = Θ.

2.4 Weighted Group ℓ1 Norm

Promoting the group sparsity of a vector h̃ ∈ R^Ñ can be achieved by suppressing its weighted group ℓ1 norm [22]–[27], defined as the weighted sum of the ℓ2 norms of the subvectors h̃_{J_g} (g = 1, ..., G) of the vector h̃, i.e.,

  ‖h̃‖^{w(k)}_{1,2} := Σ_{g=1}^{G} w_g^(k) ‖h̃_{J_g}‖₂,   (12)

where (w_g^(k))_{g=1}^{G} are positive weights at time k, h̃_{J_g} := (h̃_i | i ∈ J_g) ∈ R^{|J_g|} is a subvector††, and (J_g)_{g=1}^{G} are group partitions of the whole index set J_all := {1, ..., Ñ} and satisfy

† We call a mapping T : R^M → R^M Lipschitz continuous with Lipschitz constant κ if there exists κ (> 0) which satisfies ‖T(x) − T(y)‖₂ ≤ κ‖x − y‖₂ for every x, y ∈ R^M.
†† For every vector x ∈ R^M and finite set A, (x_i | i ∈ A) ∈ R^{|A|} is a vector consisting of the elements of {x_i | i ∈ A} sorted in ascending order by index.


  ∪_{g=1}^{G} J_g = J_all,   (13)
  J_g ∩ J_{g'} = ∅ (if g ≠ g').   (14)

Note that the weighted group ℓ1 norm reduces to the weighted ℓ1 norm if |J_g| = 1 (g = 1, ..., G). Some theoretical evidence on the group sparsity promotion by ‖·‖^{w(k)}_{1,2} has been reported, e.g., in [26], [27]. As a natural extension of the widely used design† for the weighted ℓ1 norm [41] to the weighted group ℓ1 norm, we employ the following weights:

  w_g^(k) = 1 / ( ‖(h̃_k)_{J_g}‖₂ + ρ ),   (15)

where ρ is a sufficiently small positive real number (NOTE: the same weight is used in [42]). Fortunately, the proximity operator of index γ ∈ (0, ∞) of ‖·‖^{w(k)}_{1,2} can be calculated efficiently in closed form, i.e.,

  [prox_{γ‖·‖^{w(k)}_{1,2}}(h̃)]_{J_g} = { (1 − γ w_g^(k)/‖h̃_{J_g}‖₂) h̃_{J_g}, if ‖h̃_{J_g}‖₂ > γ w_g^(k); 0, otherwise }.   (16)

The derivation of (16) is shown in [43], [44]. The operator prox_{γ‖·‖^{w(k)}_{1,2}} using the weights (w_g^(k))_{g=1}^{G} defined in (15) promotes the group sparsity effectively because it shrinks h̃_{J_g} by a large γ w_g^(k) if ‖h̃_{J_g}‖₂ is small.

† The idea of this design is to approximate the ℓ0 pseudo-norm ‖x‖₀ := |supp(x)| by a weighted ℓ1 norm.
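As a concrete illustration, a minimal NumPy sketch of the group soft-thresholding (16) with the weight design (15) follows. It is our own example under illustrative assumptions: the group partition is passed as a list of index arrays, and all names are ours.

```python
import numpy as np

def prox_weighted_group_l1(h, groups, gamma, w):
    """Group soft-thresholding (16): shrink each subvector h[J_g]
    toward zero by gamma * w[g] in the l2 sense."""
    out = np.zeros_like(h)
    for g, J in enumerate(groups):
        norm = np.linalg.norm(h[J])
        if norm > gamma * w[g]:
            out[J] = (1.0 - gamma * w[g] / norm) * h[J]
    return out

def weights(h_k, groups, rho=1e-2):
    """Weight design (15): groups that are small in the current
    estimate get large weights and are shrunk more aggressively."""
    return np.array([1.0 / (np.linalg.norm(h_k[J]) + rho) for J in groups])

# toy usage: two groups, the second nearly zero
h = np.array([1.0, -2.0, 1e-3, -1e-3])
groups = [np.array([0, 1]), np.array([2, 3])]
w = weights(h, groups)
print(prox_weighted_group_l1(h, groups, gamma=1e-2, w=w))
# the near-zero group is thresholded exactly to zero,
# while the large group is only slightly shrunk
```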

3. Main Contributions

3.1 Theoretical Guarantee of Group Sparsity

The following proposition presents the support of the system parameters h̃⋆ of SOV2. In particular, Proposition 1(b) suggests that the group sparsity of the AIR guarantees a certain group sparsity of SOV2.

Proposition 1 (Group sparsity of (h⋆, H⋆)):

(a) (Support of (h⋆, H⋆) in the general case) h⋆ ∈ R^N in (4) and H⋆ ∈ R^{N×N} in (5) satisfy

  supp(h⋆) ⊂ D1 := {1, ..., N},
  supp(H⋆) ⊂ D2 := { (i, j) ∈ D1 × D1 | 0 ≤ i − j ≤ L − 1 }.   (17)

See Fig. 2 for an illustration of D2.

(b) (Support of (h⋆, H⋆) for a group sparse AIR) Assume that r ∈ R^M satisfies the following conditions.

(i) (Strict group sparsity of the AIR) supp(r) is included in P (≪ M) short intervals of size δ_p (p = 1, ..., P), i.e.,

  supp(r) ⊂ ∪_{p=1}^{P} { ξ_p, ξ_p + 1, ..., ξ_p + δ_p − 1 },   (18)

for some positive integers ξ_p and δ_p (1 ≤ p ≤ P) with ξ_1 ≤ ξ_1 + δ_1 − 1 < ξ_2 ≤ ξ_2 + δ_2 − 1 < ... < ξ_P ≤ ξ_P + δ_P − 1 ≤ M, P ∈ N∗.

(ii) (Short memory length of SOV1) The memory length L of s and S is much smaller than M, i.e., L ≪ M.

Then, supp(h⋆) is included in P intervals of size δ_p + L − 1 (p = 1, ..., P), and supp(H⋆) is included in the indices of P minor matrices of size (δ_p + L − 1) × L (p = 1, ..., P), i.e.,

  supp(h⋆) ⊂ ∪_{p=1}^{P} { j ∈ D1 | ξ_p ≤ j ≤ ξ_p + δ_p + L − 2 },   (19)

  supp(H⋆) ⊂ ∪_{p=1}^{P} { (i, j) ∈ D2 | ξ_p ≤ j ≤ i ≤ ξ_p + δ_p + L − 2 }.   (20)

The proof is given in Appendix A. Note that the validity of the assumptions (i) and (ii) is discussed in [3], [19], [21].

Fig. 2  An illustration of the group partitions.

From Proposition 1(a), we have knowledge of the support of h̃⋆ in the general case. This knowledge will be used to design the cost function Θk in Sects. 3.2 and 3.3.

Corollary 1 (Support of the system h̃⋆ in the general case): The components of the system h̃⋆ are zero outside the index set D := D1 ∪ vecind(D2), i.e.,

  h̃⋆ ∈ M := { h̃ ∈ R^Ñ | h̃_i = 0 (∀i ∈ Dᶜ) },   (21)

where Dᶜ = J_all \ D, and††

  vecind : D2 → {N + 1, ..., Ñ},  vecind(i, j) := N j + i,

which represents the relation between an index of H⋆ and its corresponding index of h̃⋆.

†† For every set A ⊂ D2, vecind(A) := { vecind(i, j) | (i, j) ∈ A } ⊂ {N + 1, ..., Ñ}.
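The index sets in Corollary 1 are easy to materialize. The short sketch below (our own illustration with small sizes; the paper's experiments use L = 15, M = 256) enumerates D = D1 ∪ vecind(D2) directly and confirms the dimension count dim M = (L + 1)(M + L/2) − 1 stated later in Remark 3(a).

```python
L, M = 4, 8
N = M + L - 1

D1 = set(range(1, N + 1))                         # (17): support of h*
D2 = {(i, j) for i in D1 for j in D1 if 0 <= i - j <= L - 1}
vecind = lambda i, j: N * j + i                   # maps (i,j) of H* into h~
D = D1 | {vecind(i, j) for (i, j) in D2}          # indices kept by M

assert len(D) == (L + 1) * (M + L / 2) - 1        # dim M in Remark 3(a)
```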


Note that, since M is a closed subspace, P_M : R^Ñ → M is a linear operator.

3.2 Design of ψk

To promote the group sparsity, we adopt the weighted group ℓ1 norm

  ψk(h̃) := ‖h̃‖^{w(k)}_{1,2}   (22)

with the weights defined in (15). Although many designs of groups can be considered, here we introduce a simple design which reflects the fact that the support of h⋆ and the column index j of the support of H⋆ are included in the same intervals (see (19) and (20)). As illustrated in Fig. 2, the groups {J_g}_{g=1}^{G−1} of h̃ are defined by

  J_g := J_g^h ∪ vecind(J_g^H)

for every g ∈ {1, ..., G − 1}, with J_G := Dᶜ, where J_g^h and J_g^H are the index sets of h⋆ and H⋆ defined by

  J_g^h := { j ∈ D1 | n_size(g − 1) + 1 ≤ j ≤ n_size g },
  J_g^H := { (i, j) ∈ D2 | n_size(g − 1) + 1 ≤ j ≤ n_size g },

for every g ∈ {1, ..., G − 1}, n_size ∈ N∗ defines the group size, and† G = ⌈N/n_size⌉ + 1. The designed {J_g}_{g=1}^{G} satisfy the conditions (13) and (14).

3.3 Design of ϕk

We define the data-fidelity term as the weighted sum of the squared distances to closed convex sets, i.e.,

  ϕk(h̃) := (1/2) Σ_{ι∈I_k} ω_ι^(k) d²(h̃, S_ι^(ε) ∩ M),   (23)

where I_k stands for the indices of the data-fidelity sets to be processed at time k, the weights ω_ι^(k) ∈ (0, 1] (ι ∈ I_k) are chosen to satisfy Σ_{ι∈I_k} ω_ι^(k) = 1, M is defined in (21), and the data-fidelity set S_k^(ε) is defined by

  S_k^(ε) := { h̃ ∈ R^Ñ | |h̃ᵗ x̃_k − d_k| ≤ ε },

with user-defined nonnegative ε. The desired system h̃⋆ belongs to S_k^(ε) with high probability, where ε determines the reliability of S_k^(ε). Moreover, by Corollary 1, the desired system h̃⋆ is contained in M, and thus h̃⋆ is expected to belong to S_k^(ε) ∩ M ≠ ∅ (NOTE: the nonemptiness is verified in Appendix B). The gradient of ϕk is given by

  ∇ϕk(h̃) = Σ_{ι∈I_k} ω_ι^(k) ( h̃ − P_{S_ι^(ε)∩M}(h̃) )   (∀h̃ ∈ R^Ñ),   (24)

of which a Lipschitz constant is 1 (> 0) for all k ∈ N∗.

3.4 Proposed Algorithm (Gℓ1-APFBS)

By applying the APFBS to the designed ψk in (22) and ϕk in (23), for an arbitrarily chosen h̃_0 ∈ M, we obtain the update

  h̃_{k+1} = prox_{μλ‖·‖^{w(k)}_{1,2}} ( (1 − μ) h̃_k + μ Σ_{ι∈I_k} ω_ι^(k) P_{S_ι^(ε)∩M}(h̃_k) ),   (25)

where μ ∈ (0, 2) is the step-size. We call this algorithm the Gℓ1-APFBS. The projection P_{S_k^(ε)∩M} : R^Ñ → R^Ñ can be computed by

  P_{S_k^(ε)∩M}(h̃) = P_{S̆_k^(ε)}( P_M(h̃) ),   (26)

where††

  S̆_k^(ε) := { h̃ ∈ R^Ñ | |h̃ᵗ P_M(x̃_k) − d_k| ≤ ε },

  P_{S̆_k^(ε)}(h̃) = { h̃, if h̃ ∈ S̆_k^(ε);
                      h̃ − [ (h̃ᵗP_M(x̃_k) − d_k − ε sgn(h̃ᵗP_M(x̃_k) − d_k)) / ‖P_M(x̃_k)‖₂² ] P_M(x̃_k), if h̃ ∉ S̆_k^(ε) },   (27)

  [P_M(h̃)]_i = { h̃_i, if i ∈ D; 0, otherwise }.

The derivation of (26) is given in Appendix B.

Remark 3:

(a) (Computational complexity) Although the size of h̃ is Ñ = N + N² = (M + L)(M + L − 1), the number of coefficients to be stored is reduced to dim M = (L + 1)(M + L/2) − 1 because the update (25) guarantees h̃_k ∈ M ⇒ h̃_{k+1} ∈ M by the operation (16) (see the definition of the subspace M in (21)). In addition, the computational complexity of prox_{γ‖·‖^{w(k)}_{1,2}} is fairly low, and it does not affect the overall complexity of the proposed method.

(b) (Use of group sparsity in linear system identification) Recently, an adaptive algorithm using the weighted group ℓ1 norm (12) has been proposed as an extension of the Reweighted Zero Attracting LMS (RZA-LMS) [17]. However, we already confirmed in [33], [34] that a version of the APFBS, which is the proposed algorithm (25) specialized for |J_g| = 1 (g = 1, ..., G), outperforms the RZA-LMS as a sparsity-aware adaptive filtering algorithm.

(c) (Different use of group sparsity in Volterra filters) A different application of the group sparsity to second-order Volterra filters has been proposed in [45] to reduce the complexity in a nonlinear system of multiple Volterra filters, where each subsystem is regarded as a group.

† For every x ≥ 0, the ceiling function of x is defined by ⌈x⌉ := min{y ∈ N | y ≥ x}.
†† sgn(·) : R → R denotes the sign of the argument.
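For concreteness, one iteration of (25)–(27) can be sketched as follows. This is our own illustration rather than the authors' reference implementation: it reuses prox_weighted_group_l1 and weights from the earlier sketch, represents M by a boolean mask over the index set D, takes the default parameters from Table 1 as mere examples, and assumes x̄_k ≠ 0 so the projection (27) is well defined.

```python
import numpy as np

def project_M(h, mask):
    """P_M: zero the coefficients outside D (Corollary 1)."""
    return np.where(mask, h, 0.0)

def project_S_breve(h, x_proj, d, eps):
    """(27): projection onto the slab |h^t P_M(x~_k) - d_k| <= eps."""
    res = h @ x_proj - d
    if abs(res) <= eps:
        return h
    return h - (res - np.sign(res) * eps) / (x_proj @ x_proj) * x_proj

def gl1_apfbs_step(h, data, mask, groups, mu=1.0, lam=2e-6, eps=1e-1, rho=1e-2):
    """One Gl1-APFBS update (25); `data` is a list of (x_tilde, d) pairs
    indexed by I_k, weighted uniformly as in Table 1."""
    omega = 1.0 / len(data)
    # (26): P_{S cap M} = P_S_breve composed with P_M
    avg = sum(omega * project_S_breve(project_M(h, mask),
                                      project_M(x, mask), d, eps)
              for x, d in data)
    y = (1 - mu) * h + mu * avg                   # relaxed averaged projections
    w = weights(h, groups, rho)                   # weights (15) at time k
    return prox_weighted_group_l1(y, groups, mu * lam, w)
```

Since every projection in the averaged step maps into M and the group prox (16) only scales or zeroes subvectors, the iterate stays in M, which is exactly the storage saving noted in Remark 3(a).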


Table 1  Parameter settings for the algorithms in the experiments. In the Gℓ1-APFBS and the ℓ1-APFBS, λ is chosen in such a way that their steady-state behaviors are best in our experiments.

Input signal            | Algorithm   | λ       | ρ     | nsize | Ik                 | ω_ι^(k)  | ε     | μ
AR(1) signal            | Gℓ1-APFBS   | 5×10⁻⁸  | 10⁻²  | 30    | {k, k−1, ..., k−9} | 1/|Ik|   | 10⁻¹  | 1
AR(1) signal            | ℓ1-APFBS    | 5×10⁻¹⁰ | 10⁻²  | none  | {k, k−1, ..., k−9} | 1/|Ik|   | 10⁻¹  | 1
AR(1) signal            | Null-APFBS  | 0       | none  | none  | {k, k−1, ..., k−9} | 1/|Ik|   | 10⁻¹  | 1
Sum of speech and noise | Gℓ1-APFBS   | 2×10⁻⁶  | 10⁻²  | 30    | {k, k−1, ..., k−9} | 1/|Ik|   | 10⁻⁵  | 1
Sum of speech and noise | ℓ1-APFBS    | 9×10⁻⁹  | 10⁻²  | none  | {k, k−1, ..., k−9} | 1/|Ik|   | 10⁻⁵  | 1
Sum of speech and noise | Null-APFBS  | 0       | none  | none  | {k, k−1, ..., k−9} | 1/|Ik|   | 10⁻⁵  | 1

4. Numerical Examples

We compare the performance of the proposed method (Gℓ1-APFBS) with special cases of the APFBS employing the same fidelity term defined in (23), to demonstrate the efficacy of the proposed group sparsity promoting term in the NLAEC problem. We refer to the APFBS with λ = 0 as the Null-APFBS, and to the APFBS using the weighted ℓ1 norm as the ℓ1-APFBS. More precisely, ψk in (9) for the ℓ1-APFBS is defined by

  ψk(h̃) := Σ_{i=1}^{Ñ} |h̃_i| / ( |(h̃_k)_i| + ρ ),   (28)

where the weight is designed according to [41] (NOTE: the weight used in (28) can also be interpreted as the simplest example of (15) for |J_g| = 1 (g = 1, ..., G)). See Table 1 for the parameter settings of these algorithms in our experiments. The algorithms are compared in the system mismatch defined by

  sysmis(k) := 10 log₁₀ ( ‖h̃⋆ − h̃_k‖₂² / ‖h̃⋆‖₂² )

and in an approximation of the echo return loss enhancement (ERLE) defined by

  ERLE(k) := 10 log₁₀ ( Σ_{j=k−T+1}^{k} z_j² / Σ_{j=k−T+1}^{k} (z_j − y_j)² ),   (29)

where the system output z_j is defined in (8), y_j is the filter output, T is the averaging window length, and the results are averaged over 100 runs. The system h⋆ and H⋆ are generated by (4) and (5), where s and S are generated from the uniform distribution on [−1, 1] with L = 15, and r is generated according to ITU-T G.168 [46] with M = 256 (see also Fig. 3). The data signal is generated according to d_k = z_k + v_k, where v_k is white Gaussian noise, and the SNR is set to 30 dB. We conduct experiments for two kinds of input signals, i.e., an autoregressive signal (AR(1)) and a more realistic signal consisting of speech and background noise.

Experiment for the AR(1) input: The input x_k is generated by x_{k+1} = 0.1 x_k + n_k and then normalized to variance one, where n_k follows the i.i.d. Gaussian distribution N(0, 1). In (29), T is set to 10³. As shown in Fig. 4(a) and Fig. 4(b), the proposed method (Gℓ1-APFBS) achieves the best steady-state behavior. In the system mismatch, the Gℓ1-APFBS improves by 2.65 dB and 4.25 dB over the last 10⁴ iterations, compared with the ℓ1-APFBS and the Null-APFBS, respectively. Similarly, in the ERLE, the Gℓ1-APFBS improves by 2.54 dB and 4.07 dB, respectively.

Experiment for the speech and noise input: The input x_k is the sum of a speech signal taken from [47] and white Gaussian noise, where the SNR between the speech and the noise is set to 10 dB. In (29), T is set to 10⁴. As shown in Fig. 4(c) and Fig. 4(d), the proposed Gℓ1-APFBS achieves the best steady-state behavior. In the system mismatch, the Gℓ1-APFBS improves by 1.64 dB and 3.90 dB over the last 10⁴ iterations, compared with the ℓ1-APFBS and the Null-APFBS, respectively. Similarly, in the ERLE, the Gℓ1-APFBS improves by 1.48 dB and 3.94 dB, respectively. These gains demonstrate the efficacy of the group sparsity promoting term in the Gℓ1-APFBS.
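Both performance measures are straightforward to compute from logged quantities. A minimal sketch in our own notation, with T the averaging window of (29):

```python
import numpy as np

def sysmis_db(h_true, h_est):
    """System mismatch: 10 log10(||h* - h_k||^2 / ||h*||^2)."""
    return 10 * np.log10(np.sum((h_true - h_est) ** 2) / np.sum(h_true ** 2))

def erle_db(z, y, k, T):
    """(29): echo return loss enhancement over the last T samples,
    where z is the echo and y is the adaptive filter output."""
    zw, yw = z[k - T + 1:k + 1], y[k - T + 1:k + 1]
    return 10 * np.log10(np.sum(zw ** 2) / np.sum((zw - yw) ** 2))
```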

5. Conclusion

We have proposed an effective use of the group sparsity in adaptive learning of second-order Volterra filters for the nonlinear acoustic echo cancellation problem. First, we provided theoretical evidence of the group sparsity of the overall nonlinear echo path under two natural assumptions: that the coefficient vector of the acoustic impulse response (AIR) is group sparse, and that the memory length of the model of the loudspeaker system is much smaller than the length of the AIR. Next, to utilize the group sparsity effectively, we proposed an adaptive algorithm by applying the adaptive proximal forward-backward splitting (APFBS) to the carefully designed Θk, which is the sum of the weighted group ℓ1 norm and the weighted sum of the squared distances to the data-fidelity sets. Numerical examples have demonstrated that our proposed method (Gℓ1-APFBS) achieves better steady-state behavior than the APFBS applied to different sparsity promoting terms, in both the system mismatch and the echo return loss enhancement.


Fig. 3  An outcome of H⋆ generated by (5).

Fig. 4  Comparison of the Gℓ1-APFBS with the Null-APFBS and the ℓ1-APFBS.


Finally, we remark that the proposed method can be extended straightforwardly to include various conventional algorithms, e.g., the proportionate NLMS for Volterra filters [7], [8], by introducing a variable metric [48]. Such extensions will be discussed on other occasions.

Acknowledgement

The authors would like to thank the anonymous reviewers for their invaluable comments on the original version of the manuscript. This work was supported in part by JSPS Grants-in-Aid (24-2522, 24800022, B-21300091).

References

[1] V.J. Mathews and G.L. Sicuranza, Polynomial Signal Processing, John Wiley and Sons, New York, 2000.
[2] M. Zeller and W. Kellermann, "Fast and robust adaptation of DFT-domain Volterra filters in diagonal coordinates using iterated coefficient updates," IEEE Trans. Signal Process., vol.58, no.3, pp.1589–1604, March 2010.
[3] F. Kuech, "Approaches to nonlinear acoustic echo cancellation," 2008 ITG Conference on Voice Communication, pp.1–4, Oct. 2008.
[4] A.J.M. Kaizer, "Modeling of the nonlinear response of an electrodynamic loudspeaker by a Volterra series expansion," J. Audio Eng. Soc., vol.35, no.6, pp.421–433, 1987.
[5] A. Stenger, L. Trautmann, and R. Rabenstein, "Nonlinear acoustic echo cancellation with 2nd order adaptive Volterra filters," Proc. IEEE ICASSP, pp.877–880, March 1999.
[6] F. Kuech and W. Kellermann, "A novel multidelay adaptive algorithm for Volterra filters in diagonal coordinate representation," Proc. IEEE ICASSP, pp.869–872, May 2004.
[7] F. Kuech and W. Kellermann, "Proportionate NLMS algorithm for second-order Volterra filters and its application to nonlinear echo cancellation," Proc. Int. Workshop on Acoustic Echo and Noise Control (IWAENC), Kyoto, Sept. 2003.
[8] T.G. Burton and R.A. Goubran, "A generalized proportionate subband adaptive second-order Volterra filter for acoustic echo cancellation in changing environments," IEEE Trans. Audio, Speech & Language Process., vol.19, no.8, pp.2364–2373, Nov. 2011.
[9] A. Guerin, G. Faucon, and R. Le Bouquin-Jeannes, "Nonlinear acoustic echo cancellation based on Volterra filters," IEEE Trans. Speech & Audio Process., vol.11, no.6, pp.672–683, Nov. 2003.
[10] A. Fermo, A. Carini, and G.L. Sicuranza, "Low-complexity nonlinear adaptive filters for acoustic echo cancellation in GSM handset receivers," European Trans. Telecommunications, vol.14, no.2, pp.161–169, 2003.
[11] S. Makino and Y. Kaneda, "Exponentially weighted stepsize projection algorithm for acoustic echo cancellers," IEICE Trans. Fundamentals, vol.E75-A, no.11, pp.1500–1508, 1992.
[12] S. Makino, Y. Kaneda, and N. Koizumi, "Exponentially weighted stepsize NLMS adaptive filter based on the statistics of a room impulse response," IEEE Trans. Speech & Audio Process., vol.1, no.1, pp.101–108, Jan. 1993.
[13] D.L. Duttweiler, "Proportionate normalized least-mean-squares adaptation in echo cancelers," IEEE Trans. Speech & Audio Process., vol.8, no.5, pp.508–518, Sept. 2000.
[14] M. Yukawa, "Krylov-proportionate adaptive filtering techniques not limited to sparse systems," IEEE Trans. Signal Process., vol.57, no.3, pp.927–943, March 2009.
[15] Y. Murakami, M. Yamagishi, M. Yukawa, and I. Yamada, "A sparse adaptive filtering using time-varying soft-thresholding techniques," Proc. IEEE ICASSP, pp.3734–3737, March 2010.

[16] Y. Kopsinis, K. Slavakis, and S. Theodoridis, "Online sparse system identification and signal reconstruction using projections onto weighted ℓ1 balls," IEEE Trans. Signal Process., vol.59, no.3, pp.936–952, 2011.
[17] Y. Chen, Y. Gu, and A. Hero, "Sparse LMS for system identification," Proc. IEEE ICASSP, pp.3125–3128, April 2009.
[18] V. Kekatos and G.B. Giannakis, "Sparse Volterra and polynomial regression models: Recoverability and estimation," IEEE Trans. Signal Process., vol.59, no.12, pp.5907–5920, Dec. 2011.
[19] A. Asaei, M.J. Taghizadeh, H. Bourlard, and V. Cevher, "Multi-party speech recovery exploiting structured sparsity models," Proc. INTERSPEECH 2011, pp.185–188, Aug. 2011.
[20] P. Loganathan, A. Khong, and P. Naylor, "A class of sparseness-controlled algorithms for echo cancellation," IEEE Trans. Audio, Speech & Language Process., vol.17, no.8, pp.1591–1601, Nov. 2009.
[21] M. Zeller and W. Kellermann, "Coefficient pruning for higher-order diagonals of Volterra filters representing Wiener-Hammerstein models," Proc. Int. Workshop Acoustic Echo and Noise Control (IWAENC), Sept. 2008.
[22] S. Bakin, Adaptive Regression and Model Selection in Data Mining Problems, Ph.D. thesis, Australian National University, Canberra, 1999.
[23] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," J. Royal Statistical Society: Series B (Statistical Methodology), vol.68, no.1, pp.49–67, 2006.
[24] F.R. Bach, "Consistency of the group lasso and multiple kernel learning," J. Mach. Learn. Res., vol.9, pp.1179–1225, June 2008.
[25] L. Meier, S. Van De Geer, and P. Bühlmann, "The group lasso for logistic regression," J. Royal Statistical Society: Series B (Statistical Methodology), vol.70, no.1, pp.53–71, 2008.
[26] Y. Eldar and M. Mishali, "Robust recovery of signals from a structured union of subspaces," IEEE Trans. Inf. Theory, vol.55, no.11, pp.5302–5316, Nov. 2009.
[27] M. Stojnic, F. Parvaresh, and B. Hassibi, "On the reconstruction of block-sparse signals with an optimal number of measurements," IEEE Trans. Signal Process., vol.57, no.8, pp.3075–3085, Aug. 2009.
[28] M. Yamagishi, M. Yukawa, and I. Yamada, "Sparse system identification by exponentially weighted adaptive parallel projection and generalized soft-thresholding," Proc. APSIPA ASC, pp.367–370, Biopolis, Singapore, Dec. 2010.
[29] M. Yukawa, "Multikernel adaptive filtering," IEEE Trans. Signal Process., vol.60, no.9, pp.4672–4682, Sept. 2012.
[30] M. Yamagishi, M. Yukawa, and I. Yamada, "Acceleration of adaptive proximal forward-backward splitting method and its application to sparse system identification," Proc. IEEE ICASSP, pp.4296–4299, May 2011.
[31] T. Yamamoto, M. Yamagishi, and I. Yamada, "Adaptive proximal forward-backward splitting for sparse system identification under impulsive noise," Proc. EUSIPCO, pp.2620–2624, Aug. 2012.
[32] M. Yukawa, Y. Tawara, M. Yamagishi, and I. Yamada, "Sparsity-aware adaptive filters based on ℓp-norm inspired soft-thresholding technique," Proc. IEEE ISCAS, pp.2749–2752, May 2012.
[33] I. Yamada, S. Gandy, and M. Yamagishi, "Sparsity-aware adaptive filtering based on a Douglas-Rachford splitting," Proc. EUSIPCO, pp.1929–1933, Sept. 2011.
[34] S. Ono, M. Yamagishi, and I. Yamada, "A sparse system identification by using adaptively-weighted total variation via a primal-dual splitting approach," Proc. IEEE ICASSP, May 2013.
[35] I. Yamada and N. Ogura, "Adaptive projected subgradient method for asymptotic minimization of sequence of nonnegative convex functions," Numer. Funct. Anal. and Optim., vol.25, no.7&8, pp.593–617, 2005.
[36] I. Yamada, "Adaptive projected subgradient method: A unified view for projection based adaptive algorithms," J. IEICE (in Japanese), vol.86, no.8, pp.654–658, Aug. 2003.


[37] J.J. Moreau, "Fonctions convexes duales et points proximaux dans un espace hilbertien," C.R. Acad. Sci. Paris Sér. A, vol.255, pp.2897–2899, 1962.
[38] I. Yamada, M. Yukawa, and M. Yamagishi, "Minimizing the Moreau envelope of nonsmooth convex functions over the fixed point set of certain quasi-nonexpansive mappings," in Fixed-Point Algorithms for Inverse Problems in Science and Engineering, ed. H.H. Bauschke, R.S. Burachik, P.L. Combettes, V. Elser, D.R. Luke, and H. Wolkowicz, Springer Optimization and Its Applications, pp.345–390, Springer New York, 2011.
[39] P.L. Combettes and V. Wajs, "Signal recovery by proximal forward-backward splitting," Multiscale Modeling & Simulation, vol.4, no.4, pp.1168–1200, 2005.
[40] P. Lions and B. Mercier, "Splitting algorithms for the sum of two nonlinear operators," SIAM J. Numerical Analysis, vol.16, no.6, pp.964–979, 1979.
[41] E. Candès, M. Wakin, and S. Boyd, "Enhancing sparsity by reweighted ℓ1 minimization," J. Fourier Analysis and Applications, vol.14, pp.877–905, 2008.
[42] Y. Chen, Y. Gu, and A.O. Hero, "Regularized least-mean-square algorithms," ArXiv e-prints, Dec. 2010.
[43] J. Duchi and Y. Singer, "Efficient online and batch learning using forward backward splitting," J. Mach. Learn. Res., vol.10, pp.2899–2934, Dec. 2009.
[44] J. Liu, S. Ji, and J. Ye, "Multi-task feature learning via efficient ℓ2,1-norm minimization," Proc. 25th Conference on Uncertainty in Artificial Intelligence, pp.339–348, AUAI Press, Arlington, Virginia, United States, 2009.
[45] D. Song, H. Wang, and T.W. Berger, "Estimating sparse Volterra models using group L1-regularization," Proc. 32nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp.4128–4131, Sept. 2010.
[46] Digital Network Echo Cancellers, ITU-T Rec. G.168, 2007.
[47] P. Kabal, "TSP speech database," Tech. Rep., Department of Electrical & Computer Engineering, McGill University, Montreal, Quebec, Canada, 2002.
[48] M. Yukawa, K. Slavakis, and I. Yamada, "Adaptive parallel quadratic-metric projection algorithms," IEEE Trans. Audio, Speech & Language Process., vol.15, no.5, pp.1665–1680, July 2007.

Appendix A: Proof of Proposition 1

(a) From (5), we have

  supp(H⋆) ⊂ ∪_{ℓ=1}^{M} supp(S̄_ℓ).   (A·1)

Since (7) implies supp(S̄_ℓ) ⊂ D2, (17) is guaranteed by (A·1).

(b) Proof of (19): From (18), h⋆ in (4) can be written as

  h⋆ = Σ_{p=1}^{P} Σ_{ℓ=ξ_p}^{ξ_p+δ_p−1} r_ℓ s̄_ℓ,

and then

  supp(h⋆) ⊂ ∪_{p=1}^{P} ∪_{ℓ=ξ_p}^{ξ_p+δ_p−1} supp(s̄_ℓ).   (A·2)

Equation (6) yields supp(s̄_ℓ) ⊂ { i ∈ D1 | ℓ ≤ i ≤ ℓ + L − 1 } for any ℓ ∈ {1, ..., M}, and hence

  ∪_{ℓ=ξ_p}^{ξ_p+δ_p−1} supp(s̄_ℓ) ⊂ { i ∈ D1 | ξ_p ≤ i ≤ ξ_p + δ_p + L − 2 }.

Applying this to (A·2), we have (19).

Proof of (20): From (18), H⋆ in (5) can be written as

  H⋆ = Σ_{p=1}^{P} Σ_{ℓ=ξ_p}^{ξ_p+δ_p−1} r_ℓ S̄_ℓ,

and then

  supp(H⋆) ⊂ ∪_{p=1}^{P} ∪_{ℓ=ξ_p}^{ξ_p+δ_p−1} supp(S̄_ℓ).   (A·3)

Equation (7) yields supp(S̄_ℓ) ⊂ { (i, j) ∈ D2 | ℓ ≤ j ≤ i ≤ ℓ + L − 1 } for any ℓ ∈ {1, ..., M}, and hence

  ∪_{ℓ=ξ_p}^{ξ_p+δ_p−1} supp(S̄_ℓ) ⊂ { (i, j) ∈ D2 | ξ_p ≤ j ≤ i ≤ ξ_p + δ_p + L − 2 }.

Applying this to (A·3), we have (20). □

Appendix B: The Derivation of (26)

(i) We show x̄_k ≠ 0 ⇔ P_M(x̃_k) ≠ 0. Assume x̄_k = 0. Then x̃_k = [x̄_kᵗ, vec(x̄_k x̄_kᵗ)ᵗ]ᵗ = 0, and the linearity of P_M implies P_M(x̃_k) = 0. Assume x̄_k ≠ 0. Then, by D1 ⊂ D, we have P_M(x̃_k) ≠ 0.

(ii) We show S_k^(ε) ∩ M ≠ ∅ if x̄_k ≠ 0. By (i), we have

  θ := d_k P_M(x̃_k) / ‖P_M(x̃_k)‖₂² ∈ M.   (A·4)

Moreover, since θ ∈ M is orthogonal to x̃_k − P_M(x̃_k),

  θᵗ x̃_k = θᵗ ( P_M(x̃_k) + x̃_k − P_M(x̃_k) ) = θᵗ P_M(x̃_k) = d_k,

and hence θ ∈ S_k^(ε) ∩ M ≠ ∅.

(iii) We show S̆_k^(ε) ∩ M = S_k^(ε) ∩ M as follows:

  h̃ ∈ S_k^(ε) ∩ M ⇔ |h̃ᵗ x̃_k − d_k| ≤ ε and h̃ ∈ M
                  ⇔ |h̃ᵗ P_M(x̃_k) − d_k| ≤ ε and h̃ ∈ M
                  ⇔ h̃ ∈ S̆_k^(ε) ∩ M,

where the second equivalence follows from (∀z ∈ M) zᵗ x̃_k = zᵗ P_M(x̃_k).

(iv) We show P_{S_k^(ε)∩M} = P_{S̆_k^(ε)} P_M. By the expression (27) of P_{S̆_k^(ε)}, we have, for any h̃ ∈ R^Ñ,

  P_{S̆_k^(ε)}( P_M(h̃) ) ∈ S̆_k^(ε) ∩ M.   (A·5)

By (iii) and (A·5), we have, for any z ∈ S̆_k^(ε) ∩ M = S_k^(ε) ∩ M and h̃ ∈ R^Ñ,

  ⟨ h̃ − P_{S̆_k^(ε)}(P_M(h̃)), z − P_{S̆_k^(ε)}(P_M(h̃)) ⟩
    = ⟨ P_M(h̃) − P_{S̆_k^(ε)}(P_M(h̃)), z − P_{S̆_k^(ε)}(P_M(h̃)) ⟩ ≤ 0,

where the equality holds because z − P_{S̆_k^(ε)}(P_M(h̃)) ∈ M while h̃ − P_M(h̃) is orthogonal to M, and the inequality follows from the characterization of the metric projection P_{S̆_k^(ε)} since z ∈ S̆_k^(ε). By the characterization of the metric projection onto S_k^(ε) ∩ M, this implies P_{S_k^(ε)∩M} = P_{S̆_k^(ε)} P_M. □
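As a quick numerical sanity check of (26)–(27) (ours, not from the paper), one can draw a random point, project it via P_{S̆} ∘ P_M, and verify the variational inequality that characterizes the projection onto S_k^(ε) ∩ M against randomly sampled feasible points:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
mask = rng.random(n) < 0.5                        # subspace M: zeros off the mask
x = rng.standard_normal(n)
d, eps = 0.7, 0.1

PM = lambda v: np.where(mask, v, 0.0)
xp = PM(x)
assert xp @ xp > 0                                # assume x_bar != 0, see (i)

def proj(h):                                      # (26)-(27): P_S_breve . P_M
    hm = PM(h)
    res = hm @ xp - d
    if abs(res) <= eps:
        return hm
    return hm - (res - np.sign(res) * eps) / (xp @ xp) * xp

h = rng.standard_normal(n)
p = proj(h)
for _ in range(1000):                             # random z in S^(eps) cap M
    z = PM(rng.standard_normal(n))
    z += (np.clip(z @ xp, d - eps, d + eps) - z @ xp) / (xp @ xp) * xp
    assert (h - p) @ (z - p) <= 1e-9              # variational inequality holds
```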

Hiroki Kuroda was born in Suginami, Tokyo, Japan in 1990. He received the B.E. degree in computer science in 2013 from the Tokyo Institute of Technology, Tokyo, Japan. Since 2013, he has been a master's course student in the Department of Communications and Computer Engineering at the Tokyo Institute of Technology. His current research interests are in acoustic signal processing and mathematical signal processing.

Shunsuke Ono received the B.E. and M.E. degrees in engineering from the Tokyo Institute of Technology in 2010 and 2012, respectively. He is currently pursuing a Ph.D. degree in the Department of Communications and Computer Engineering in the Tokyo Institute of Technology, and he is a research fellow (DC1) of the Japan Society for the Promotion of Science (JSPS). He received the Young Researchers’ Award from the IEICE in 2013. His research interests include signal and image processing, convex optimization, and inverse problems.

Masao Yamagishi received the B.E., M.E., and Ph.D. degrees from the Tokyo Institute of Technology in 2007, 2008, and 2012, respectively. He is currently an assistant professor in the Department of Communications and Computer Engineering, the Tokyo Institute of Technology. He received the IEICE Young Researchers’ Award in 2010. His research interests include mathematical signal processing and convex optimization theory.

Isao Yamada received the B.E. degree in computer science from the University of Tsukuba in 1985 and the M.E. and Ph.D. degrees in electrical and electronic engineering from the Tokyo Institute of Technology, Tokyo, Japan, in 1987 and 1990, respectively. He is a professor in the Department of Communications and Computer Engineering, the Tokyo Institute of Technology. His current research interests are in mathematical signal processing, inverse problems and optimization theory. He has been serving as an associate editor for several journals, including the IEEE Transactions on Signal Processing (2008–2011) and the IEEE Transactions on Circuits and Systems — PART I: Fundamental Theory and Applications (2006–2007). Currently, he is serving as the Editor-in-Chief of the IEICE Transactions on Fundamentals, and as an editorial board member of the Numerical Functional Analysis and Optimization (Taylor and Francis) and the Multidimensional Systems and Signal Processing (Springer). He received the IEICE Excellent Paper Awards (1991, 1995, 2006, 2009), the IEICE Achievement Award (2009), the ICF Research Award (2004), the Docomo Mobile Science Award — Fundamental Science Division (2005), and the Fujino Prize (2008).
