Implementation of Greedy Algorithms for LTE Sparse Channel Estimation

Patrick Maechler, Pierre Greisen, Benjamin Sporrer, Sebastian Steiner, Norbert Felber, and Andreas Burg
Integrated Systems Laboratory, ETH Zurich, Switzerland
{maechler,apburg}@iis.ee.ethz.ch

Abstract—Broadband wireless systems often operate under channel conditions that are characterized by a sparse channel impulse response. When the amount of training is given by the standard, compressed sensing channel estimation can exploit this sparsity to improve the quality of the channel estimate. In this paper, we analyze and compare the hardware complexity and denoising performance of three greedy algorithms for the 3GPP LTE system. The complexity/performance trade-off is analyzed using parameterized designs with varying configurations. One configuration of each algorithm is fabricated in a 180 nm process and measured.

I. INTRODUCTION
The significant attention compressed sensing (CS) [1], [2] has received in the last few years has led to the development of a framework consisting of theories for performance guarantees, many reconstruction algorithms, and a number of possible applications. One of these promising applications is channel estimation in broadband wireless systems based on orthogonal frequency division multiplexing (OFDM). OFDM is the modulation technology of choice for broadband wireless systems such as 3GPP long term evolution (LTE) [3], the target application of this work. The error-rate performance of coherent OFDM communication systems relies heavily on the quality of the channel estimation. Unfortunately, broadband channels require the estimation of many parameters. However, channel measurements have shown that wireless channels can often be described by only a small number of propagation paths; hence, the degrees of freedom of the channel are limited. This inherent sparsity of the channel impulse response (CIR) can be exploited to improve the quality of the channel estimation. Corresponding algorithms have recently received significant attention in the context of CS, but had also been considered for channel estimation prior to the CS era. The benefits of using the matching pursuit (MP) algorithm [4] for the estimation of communication channels with a sparse CIR were shown in [5]. Later, sparsity in the delay-Doppler function of shallow-water communication channels was exploited in [6], where MP was used to track sparse reflections in a fast time-varying environment. In [7], CS is applied to the estimation of doubly selective channels in multi-carrier systems such as OFDM. The authors show that an approximately sparse representation can be found in the delay-Doppler domain and that, with randomly distributed pilot tones and CS-based estimation, the basis pursuit (BP) algorithm achieves better channel estimates with only half

the training tones compared to least squares (LS) estimation. In [8], it was shown that CS-based algorithms such as BP and orthogonal matching pursuit (OMP) outperform subspace-based sparse reconstruction algorithms for the estimation of underwater acoustic channels. Initial implementations of sparse channel estimators in [9] were designed for DS-CDMA; the authors proposed a highly parallel architecture of the MP algorithm on an FPGA to achieve high throughput. In [10], it is stated that for a shallow-water acoustic communication system, a highly parallel FPGA implementation of MP channel estimation outperforms corresponding DSP and microprocessor (Xilinx MicroBlaze) implementations in terms of both power consumption and processing time. However, no comparison of the hardware complexity of different algorithms has been done so far. In our previous work [11], an MP implementation for LTE was presented, but no other CS algorithm has been implemented yet for a high-data-rate communication standard such as LTE.

Outline & Contributions: In this paper, we compare the hardware complexity of three algorithms: matching pursuit, gradient pursuit, and orthogonal matching pursuit. In Section II we show how CS is applied to channel estimation for the 3GPP LTE standard. We then introduce the three algorithms in Section III and describe important algorithm optimizations that reduce the computational complexity. Section IV presents the hardware architectures for all three implementations. Their costs and merits are then compared in terms of chip area and fixed-point performance in Section V.

Notation: Throughout this paper, the following notation is used. An upper-case bold letter denotes a matrix, a lower-case bold letter a column vector. (.)^H denotes the Hermitian transpose, c_i is the ith component of vector c, and Φ_g denotes the gth column of matrix Φ. Φ_Γ (c_Γ) indicates a matrix (vector) in which all columns (elements) are zero except those selected by the elements of the set Γ.

A. 3GPP LTE System Overview
3GPP LTE is an upcoming standard for mobile communication. LTE supports bandwidths between 1.4 MHz and 20 MHz. OFDM with up to 2048 sub-channels is employed in the downlink, and single-carrier frequency division multiple access is used for the uplink. LTE also supports multiple-input multiple-output (MIMO) transmission with up to two receive and four transmit antennas. In this paper we shall focus on the single-input

©2011 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. DOI: 10.1109/ACSSC.2010.5757587

Figure 1. LTE time slot (0.5 ms) with trained OFDM sub-channels marked; axes: time slots vs. subcarriers

single-output downlink, but the same algorithms can also be applied to the MIMO case, and the complexity results can serve as a basis for estimating hardware costs for the MIMO case with orthogonal training. LTE employs pilot-assisted channel estimation. The training in a time slot of 0.5 ms duration is distributed across frequency and time according to the pattern shown in Fig. 1. Usually, the channel estimates are averaged and interpolated over time and frequency, for example by 2D Wiener filters.

B. Sparse Signal Recovery
CS provides a framework that allows the reconstruction of sparse signals from far fewer measurements than the dimension of the unknown signal suggests [1], [2]. Thus, for a sparse signal vector x ∈ C^M, a measurement vector y ∈ C^N with N ≪ M, and the measurement matrix (or dictionary) Φ ∈ C^(N×M), one can reconstruct x from y = Φx if Φ fulfills certain conditions on the restricted isometry property [2]. With noise-free measurements, the signal x can be reconstructed by finding the sparsest possible solution that fulfills y = Φx:

x̂ = argmin_x ||x||_0, subject to y = Φx.   (1)
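To make the combinatorial nature of (1) concrete, the following toy sketch (not part of the implementations discussed in this paper) solves it by exhaustive search over supports. The dimensions, tolerance, and random dictionary are arbitrary illustrative choices; the exponential cost is exactly why practical algorithms only approximate (1).

```python
import numpy as np
from itertools import combinations

def l0_brute_force(Phi, y, max_k=3, tol=1e-8):
    """Toy solver for (1): try all supports of growing size and return the
    sparsest x with y = Phi @ x (within tol). Exponential in M."""
    N, M = Phi.shape
    if np.linalg.norm(y) < tol:          # the all-zero vector already fits
        return np.zeros(M)
    for k in range(1, max_k + 1):
        for S in combinations(range(M), k):
            cols = Phi[:, list(S)]
            coef, *_ = np.linalg.lstsq(cols, y, rcond=None)
            if np.linalg.norm(cols @ coef - y) < tol:
                x = np.zeros(M)
                x[list(S)] = coef
                return x
    return None                          # no solution up to sparsity max_k
```

With a generic random 6 x 10 dictionary and a 2-sparse signal, this search recovers the signal exactly, since such a sparse representation is generically unique.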

A large number of algorithms have been proposed to solve the CS reconstruction problem (1). Most algorithms also work for noisy measurements, where the reconstruction problem is not solved with equality but within an error bound ||Φx − y||_2 ≤ ε.

II. SPARSE CHANNEL ESTIMATION
In this section, the application of CS to OFDM channel estimation is explained and the employed channel model is introduced.

A. Channel Model
Let P be the number of dominant paths with complex-valued gains a_i and delays τ_i. The corresponding sparse CIR can be written as

h(τ) = Σ_{i=1}^{P} a_i δ(τ − τ_i).

We assume a system with small Doppler spread and neglect the time-dependency of the impulse response. The ideal peaks of the CIR h(τ) are broadened by the transmit and receive filters (cf. Fig. 2), and the effective CIR is given by

h̃(τ) = g_r(τ) ∗ h(τ) ∗ g_t(τ) = Σ_{i=1}^{P} a_i g(τ − τ_i),

where g_t(τ) and g_r(τ) denote the transmit and receive filters and g(τ) = (g_r ∗ g_t)(τ), with ∗ denoting convolution. When transmitting a signal s(t) and adding complex white Gaussian noise n(t) with variance σ_n^2, the received signal is given by

q(t) = (h̃ ∗ s)(t) + n(t).

Figure 2. Example of a sparse CIR. Top: physical channel h(t), bottom: filtered channel h̃(t) as seen by the receiver

B. Receiver
The OFDM receiver depicted in Fig. 3 samples the signal with period T. After the cyclic prefix is removed, a Fourier transform converts the samples into the frequency domain: r_z = (1/√D) Σ_{k=0}^{D−1} q(kT) e^{−j2π zk/D}, where z ∈ [0, D − 1] is the subcarrier index and D is the number of tones. For the transmitted symbol vector c, one receives r_z = FFT(h̃)(z) c_z + ñ_z. On the pilot tones with indices P = [p_1 p_2 ... p_N], defined in Fig. 1, known symbols c_P are transmitted. They are fed to the channel estimator, and the measurements on these pilot tones are calculated as y_i = r_{p_i} · c_{p_i}^* / |c_{p_i}|^2. The data tones are forwarded to the detector, which uses the output of the channel estimator to recover the transmitted signal.

C. Dictionary
To apply CS theory to channel estimation, one first has to establish the linear relationship between the measurements taken in the frequency domain and the time-domain basis which allows a sparse signal representation. The measurement matrix Φ defines which atomic elements can be selected by CS reconstruction algorithms and how they are related to the measurements. Each column of Φ corresponds to a dictionary element. An obvious choice for the dictionary elements is the discrete Fourier transform (DFT) of a single sample of the CIR δ[t − kT], which implies x_k = h̃(kT). Note that with this choice, x appears less sparse than the P physical paths after transmit and receive filtering, since each reflection is broadened across a few samples. The columns of the measurement matrix are assumed to be normalized to ||Φ_g||_2 = 1. This leads to a measurement matrix

Φ_{n,m} = (1/√N) exp(−j2π p_n (m − 1) / D),
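As an illustration of the dictionary just defined, the following sketch constructs Φ with NumPy. The parameter values follow the 10 MHz example used in the text (M = 256, N = 200, D = 1024), but the pilot grid `pilots` is a hypothetical placeholder, not the actual LTE pilot pattern.

```python
import numpy as np

D = 1024          # FFT size / number of subcarriers
M = 256           # maximum CIR length (extended cyclic prefix)
N = 200           # number of pilot tones
pilots = np.linspace(0, D - 1, N).astype(int)  # placeholder pilot positions

# Phi[n, m] = exp(-j*2*pi * p_n * (m-1) / D) / sqrt(N); the 0-based column
# index here plays the role of (m - 1) in the equation above.
n, m = np.meshgrid(pilots, np.arange(M), indexing="ij")
Phi = np.exp(-2j * np.pi * n * m / D) / np.sqrt(N)

# Each column has unit l2-norm, matching the normalization ||Phi_g||_2 = 1.
print(np.allclose(np.linalg.norm(Phi, axis=0), 1.0))  # True
```

The 1/√N factor makes the normalization hold automatically, since every column consists of N entries of magnitude 1/√N.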

q(t)

Receiver

Figure 3.

CP removal q(kT)

FFT

r

data tones

pilot tones

Decoder

Training seq. y CS removal estimation

Block diagram of an OFDM receiver with CS channel estimation

with n ∈ [1, N], m ∈ [1, M] for N measured pilot tones and a maximum channel length M. Note that Φ is constructed from a D × D DFT matrix by selecting only the first M columns and the N rows corresponding to the pilot tones. LTE can use an extended cyclic prefix of 512 samples for 20 MHz bandwidth or 256 samples for 10 MHz bandwidth. Assuming that the maximum CIR length is limited by the length of the cyclic prefix (thus no inter-symbol interference occurs), we end up with at most M = 512 dictionary elements. N = 400 pilot tones are measured in a 2048-subchannel 20 MHz OFDM system and N = 200 in a 1024-subchannel 10 MHz system.

III. GREEDY ALGORITHMS
Due to their low computational complexity and their regular structure, greedy algorithms seem to be most suitable for hardware implementation. In this section, three greedy algorithms will be reviewed. All three algorithms follow the generic structure in Alg. 1.

Algorithm 1 Greedy algorithm
Input y: measured pilot tones
1: r^0 = y; x^0 = 0; n = 1; Γ^0 = ∅
2: while stopping criterion not met do
3:   g^n = Φ^H r^{n−1}
4:   i^n = argmax_i |g_i^n|
5:   Γ^n = Γ^{n−1} ∪ {i^n}
6:   x^n = updateX(S^n)
7:   r^n = y − Φ x^n
8:   n = n + 1
9: end while
10: Output x^n

Every greedy algorithm adding no more than one dictionary element per iteration can be computed by this iterative program when the appropriate estimation function updateX(S^n) is used. S^n denotes the current state and includes all variables of the algorithm: S^n = {x^{n−1}, g^n, i^n, Φ, Γ^n, r^{n−1}, y}. In each iteration, the dictionary element with the strongest correlation to the current residual r is added. All chosen elements span a subspace in which the estimation of x is performed. After finishing the reconstruction, the CS channel estimator has to transform the estimated sparse CIR x into the frequency domain to be used for data detection.

A. Matching Pursuit
MP is the simplest greedy algorithm. It was first introduced in [4].
After choosing the most suitable dictionary element, the contribution of the selected element is directly added to the sparse estimate. Since the elements are not orthogonal in general, the same element can be updated multiple times. The MP update function changes only one element of the current estimate x^n:

updateX(S^n) = x^{n−1} + e_{i^n} g_{i^n}^n,

where e_{i^n} is a unit vector with all elements zero except a '1' at position i^n.
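A minimal software sketch of Alg. 1 with the MP update is given below. It assumes unit-norm dictionary columns, as in this paper; the stopping rule compares only the newest correlation magnitude to a threshold, mirroring the criterion used later in the paper, but the threshold value itself is an arbitrary placeholder.

```python
import numpy as np

def matching_pursuit(Phi, y, threshold=1e-3, max_iter=100):
    """Sketch of MP: x^n = x^(n-1) + e_{i^n} g_{i^n}^n, unit-norm columns."""
    M = Phi.shape[1]
    x = np.zeros(M, dtype=complex)
    r = y.copy()                          # r^0 = y
    for _ in range(max_iter):
        g = Phi.conj().T @ r              # line 3: correlate with the residual
        i = int(np.argmax(np.abs(g)))     # line 4: strongest dictionary element
        if np.abs(g[i]) < threshold:      # stopping criterion (placeholder)
            break
        x[i] += g[i]                      # MP update: one coefficient changes
        r -= Phi[:, i] * g[i]             # keeps r = y - Phi @ x
    return x
```

For an orthonormal dictionary and a noise-free sparse signal, this loop recovers the signal exactly after one pass over its support.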

The computationally most expensive operation of the MP algorithm is the matrix-vector product on line 3 of Alg. 1. However, it is well known [4] that after the first iteration this operation can be replaced by the less complex update g^n = Φ^H r^{n−1} = g^{n−1} − g_{i^{n−1}}^{n−1} Φ^H Φ_{i^{n−1}}, which combines lines 3 and 7 of Alg. 1 using pre-computed correlation coefficients. This update strategy requires storing Φ^H Φ_g for g = 1 ... M.

B. Orthogonal Matching Pursuit
OMP is an extension of the MP algorithm, as described in [12]. The extension consists of a more elaborate update function that includes all previously chosen dictionary elements. The update is calculated by finding the least-squares optimum in the subspace spanned by the already chosen dictionary elements Φ_{Γ^n}:

updateX(S^n) = argmin_x ||Φ_{Γ^n} x_{Γ^n} − y||_2.   (2)
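A software sketch of OMP under the same assumptions is shown next; `np.linalg.lstsq` stands in for the incremental QRD update used in the hardware, so this is an illustration of (2), not the hardware procedure.

```python
import numpy as np

def omp(Phi, y, threshold=1e-3, max_iter=50):
    """Sketch of OMP: re-solve the LS problem (2) on the selected support."""
    M = Phi.shape[1]
    support = []
    x = np.zeros(M, dtype=complex)
    r = y.copy()
    for _ in range(max_iter):
        g = Phi.conj().T @ r
        i = int(np.argmax(np.abs(g)))
        if np.abs(g[i]) < threshold:      # stopping criterion (placeholder)
            break
        if i not in support:
            support.append(i)
        # LS optimum on the current support; lstsq replaces the QRD update
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        x[:] = 0
        x[support] = coef
        r = y - Phi @ x                   # residual orthogonal to span(Phi_support)
    return x
```

Because the residual is orthogonal to all selected columns after the LS solve, their correlations vanish and the same element is never re-selected, matching the property stated below.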

As a result, the new estimate is orthogonal to the residual after each iteration, which implies that the same element will not be selected again. The effort required to compute (2) can be reduced by avoiding the full LS optimization in each iteration: when the LS step is computed by a matrix decomposition such as the QR decomposition (QRD), the Q and R matrices can be stored and provide a starting point for the decomposition in the next iteration.

C. Gradient Pursuit
The gradient pursuit (GP) algorithm was introduced in [13] as an approximation of OMP that keeps the complexity near that of MP. The GP algorithm updates all coefficients of the preliminary estimate x^n by moving in a certain update direction within the subspace spanned by all selected coefficients x_{Γ^n}. The most obvious choice for the update direction is the gradient g_{Γ^n}^n already computed on line 3 of Alg. 1. By setting c^n = Φ_{Γ^n} g_{Γ^n}^n, the update function can be defined as

updateX(S^n) = x^{n−1} + (⟨r^{n−1}, c^n⟩ / ||c^n||_2^2) g_{Γ^n}^n.

Other possibilities for the update direction are the conjugate gradient or an approximation thereof. The same simplification of the correlation as in MP can be applied to GP. However, all coefficients of x_{Γ^n}^{n−1} might have changed in GP; therefore, multiple updates must be performed on g. Since the residual r is no longer explicitly computed, the update function must be adapted as follows:

⟨r, c⟩ = c^H r = (Φ_{Γ^n} g_{Γ^n}^n)^H r = (g_{Γ^n}^n)^H Φ_{Γ^n}^H r = (g_{Γ^n}^n)^H g_{Γ^n}^n = ||g_{Γ^n}^n||_2^2,

updateX(S^n) = x^{n−1} + (||g_{Γ^n}^n||_2^2 / ||c^n||_2^2) g_{Γ^n}^n.
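A corresponding sketch of GP is given below. It uses the step size ⟨r, c⟩/||c||_2^2 from the text directly (computing the residual explicitly for clarity, which a hardware implementation would avoid); again, unit-norm columns and the threshold value are illustrative assumptions.

```python
import numpy as np

def gradient_pursuit(Phi, y, threshold=1e-3, max_iter=50):
    """Sketch of GP: move x along the restricted gradient g_Gamma."""
    M = Phi.shape[1]
    support = []
    x = np.zeros(M, dtype=complex)
    r = y.copy()
    for _ in range(max_iter):
        g = Phi.conj().T @ r
        i = int(np.argmax(np.abs(g)))
        if np.abs(g[i]) < threshold:           # stopping criterion (placeholder)
            break
        if i not in support:
            support.append(i)
        d = np.zeros(M, dtype=complex)
        d[support] = g[support]                # direction: gradient on the support
        c = Phi @ d                            # c^n = Phi_Gamma g_Gamma
        step = np.vdot(c, r) / np.vdot(c, c)   # <r, c>/||c||^2 = ||g_Gamma||^2/||c||^2
        x += step * d                          # all supported coefficients move
        r = y - Phi @ x
    return x
```

For an orthonormal dictionary the step size evaluates to one and GP coincides with MP and OMP, which is a convenient sanity check.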

D. Stopping Criterion
It is essential for all greedy algorithms to terminate after the correct number of iterations. When too many dictionary elements are considered, the noise within a subspace of too high dimension influences the estimation, which leads to a higher noise variance. Too few elements, however, cannot represent the channel properly and lead to a large residual (i.e., a large channel estimation error). Since the sparsity of the channel is unknown to the receiver, one has to guess when all relevant channel taps have been included. One possibility is to set a limit on the l2-norm of the residual, ||r||_2^2, proportional to the noise variance σ_n^2. Another possibility is to look only at the largest element g_{i^n} selected on line 4 of Alg. 1 and set a lower limit on its magnitude. The performance of both approaches is similar.

E. Simulations
The channel model employed for the simulations is the extended typical urban (ETU) model with nine propagation paths, defined in the LTE standard [14]. All paths were assumed to be Rayleigh fading, and white Gaussian noise was added. The impulse response h̃(τ) includes root-raised-cosine filters g_t and g_r with roll-off factor ρ = 0.25. The performance of the three algorithms introduced above is compared in Fig. 4 by plotting the mean squared error (MSE) of the channel estimate against the signal-to-noise ratio (SNR) at the receiver. The lower plot shows the average number of iterations performed before the stopping criterion is met. Fig. 4 shows that CS channel estimation provides a gain of more than 9 dB in the low-SNR regime. At high SNR, the gain becomes smaller since more taps are above the noise level; the estimation problem effectively becomes less sparse. One can observe that GP achieves almost OMP performance, while the performance of MP degrades more in the high-SNR regime. Moreover, the number of iterations of MP increases rapidly for higher SNR. As a theoretical limit, we also plot a genie-aided LS estimator which operates only on the significant taps, which are assumed to be known to the receiver.

Figure 4. Comparison of greedy algorithms with floating-point precision in an LTE setting with 20 MHz bandwidth

IV. HARDWARE IMPLEMENTATIONS
There are two fundamentally different ways in which the correlation on line 3 of Alg. 1 can be computed. First, the straightforward approach is to perform the matrix-vector multiplications in complex-valued multiplication and accumulation (CMAC) units. A wide accumulation register allows for high precision, and this operation can easily be performed with any degree of parallelism (≤ M). Second, the structure of the measurement matrix allows the use of an FFT. On one hand, using an FFT results in a lower number of multiplications; on the other hand, memory requirements are increased since an FFT of size D has to be performed. To maintain higher flexibility and to keep the memory requirements low, the direct computation was chosen for this implementation, as in [11]. The stopping criterion of all algorithms is implemented by comparing the magnitude of only the newest dictionary element to a programmable threshold, which is much less complex than calculating the energy of the residual, especially when the residual is not even computed explicitly.

We start by describing the hardware architecture for MP, which serves as a basis for all further implementations. GP then builds on MP, and OMP builds on GP; thus, the complexity increases in each step.

A. MP Implementation
The implementation of the MP algorithm is described in detail in [11] and is summarized as follows. The correlation matrix Φ^H Φ needed for the simplified update is a Hermitian Toeplitz matrix, which allows storing only its first row, a single vector of length M instead of an M × M matrix. The structure of Φ also allows for significant reductions in the size of its look-up table (LUT) to D/4 values. This is achieved by computing for each entry its position on the unit circle, where all D possible DFT coefficients are located; inversion and complex conjugation in the complex plane were used to further reduce the needed storage to a fourth. The implemented architecture is depicted in Fig. 5. Most of the computations are performed in a configurable number of parallel multiply and storage units (MSUs). Three phases, controlled by a finite state machine (FSM), are needed to compute the MP channel estimation:
a) Correlation with Φ^H: In the first iteration, the measurements are correlated with all columns of Φ, which is done in the MSUs, and the results g^1 are stored in their Correlation RAM. The CMACs are also used to compute the squared absolute values, from which the maximum is chosen.
b) Updates: The vector g^n is updated using the pre-computed correlation values. Then, the largest component of g^n is determined as before. The selected elements are stored in the Sparse RAM. Updates are performed until the stopping criterion is met or the maximum supported sparsity is reached.
c) Transformation into frequency domain: After the sparse CIR has been determined in the Kth iteration, xK must be transformed back into frequency domain. To this end, the

Figure 5. Simplified hardware block diagram of MP and GP (dashed)

Figure 6. Simplified hardware block diagram of the least squares unit

stored sparse elements are multiplied with their corresponding rows of Φ.

B. GP Implementation
The MP architecture above can easily be extended to perform GP reconstruction. The only additional hardware blocks required are a divider and additional memory to store the value of x_update = x_{Γ^n} − x_{Γ^{n−1}}. All other additional operations compared to MP can be performed with the existing hardware; thus, mainly the FSM has to be adapted. Due to the higher complexity of GP, the number of parallel units of MP must be increased to complete a sufficient number of iterations. When implemented in fixed point, this algorithm showed numerical instability at the squaring and division operations in updateX(S^n). To solve this problem, g_{Γ^n}^n and c^n were prescaled by a power of two, determined by the norms of those two quantities in the last iteration. Using this pseudo-floating-point method, a stable implementation was possible without having to extend the word width.

C. OMP Implementation
The final and most complex implementation considered in this paper is OMP, which can be implemented by extending the architecture described above with a unit that performs a least-squares optimization (Fig. 6). The LS optimum is computed by a QRD followed by back-substitution. A modified Gram-Schmidt orthogonalization [15] is used to perform the QRD. Two arithmetic units and a CORDIC serve as basic building blocks of the QRD unit. An arithmetic unit consists of a complex-valued multiplier and an adder. For the calculation of the norm of a vector, as required by the modified Gram-Schmidt process, and for the division used during back-substitution, a pipelined CORDIC architecture is used. This implementation of the norm calculation avoids the numerical

problems (increased dynamic range) associated with squaring followed by the computation of a square root.

V. COMPARISON
In order to compare CS-based channel estimators and to obtain an estimate of the area and cost of the three greedy algorithms under consideration, VLSI implementations of these algorithms were realized. For a practical implementation within a communication system, runtime constraints must be applied. In this work, it is assumed that the reconstruction must finish before a new set of measurements is ready, i.e., within 0.5 ms, the duration of one resource block. This leads to a constraint on the maximum number of iterations an algorithm is allowed to perform. A parameterized VHDL design made it possible to synthesize all implementations for multiple degrees of parallelism. For each configuration, further area/performance trade-offs are achieved by synthesizing the design for different timing constraints. The achievable number of iterations for each implementation within our time limit determines the MSE performance. Fig. 7 compares the performance of all greedy algorithms with different limits on the maximal number of iterations. The performance of the synthesis results is evaluated by comparing the SNR at which the MSE of the implementation

0

10

−1

10

0

10

LS MP, K=20 MP, K=25 MP, K=30 MP, K=35 MP, K=40

10

LS GP, K=12 GP, K=14 GP, K=18 GP, K=20 GP, K=24

−1

10

−2

−1

10

−2

10

−2

10

−3

10

10

−3

0

10

SNR

20

30

LS OMP, K=10 OMP, K=12 OMP, K=14 OMP, K=16 OMP, K=20

MSE

Figure 5.

OMP Alg. MP GP OMP

find max

MSE

pipeline register

div

MSE

FSM

Operations (CMAC or division) M N + M + 2(Ku − 1)M + KL ” “P Ku M k + N k + k + N + 1 + k + KL MN + M + k=2 ” “P K MN + M + k=2 M k + 2kN + 1 + k + KL Memory (words) N +M +K N + M + 2K N + M + 2K + KN + K 2 + 2N

10

−3

0

10

SNR

20

30

10

0

10

SNR

20

30

Figure 7. Comparison of fixed-point greedy algorithms with a maximum of K iterations for 10 MHz bandwidth

Figure 8. Area of the implementations vs. performance (number of parallel MSUs in brackets) for 10 MHz bandwidth

Table II. Chip figures (silicon)

                        MP     GP     OMP (LS block)
Complexity [kGE]        58     99     93
Core area [mm^2]        0.73   1.21   1.21
Max. frequency [MHz]    140    128    166
Power [mW]              88     209    196
Energy [µJ]/comp.       44     104    98

crosses the MSE of an LS estimator. Again, the ETU channel model was employed. The area required for a given performance in a 180 nm technology is shown in Fig. 8. The scaling of the computational complexity and memory requirements of the implementations is analyzed in Tbl. I, where K and K_u denote the number of added dictionary elements and the number of iterations performed, respectively, and L indicates the number of sub-channels in the frequency domain that must eventually be estimated. Doubling the bandwidth means doubling both M and N, while the sparsity K remains approximately constant. For further comparison, a configuration of each algorithm is selected that yields the same performance. Fixing the SNR at which the MSE lines cross the LS estimate to 20 dB for a 10 MHz ETU channel, MP has to perform 19 iterations, GP 13, and OMP 12. Synthesis with the corresponding hardware configurations yields an area of 0.374 mm^2, 0.735 mm^2, and 1.580 mm^2 for MP, GP, and OMP, respectively.

A. ASIC Implementation Results
For each of the three algorithms, one configuration was chosen to be fabricated in silicon (Fig. 9). Due to chip-size constraints, the 10 MHz mode (half of the maximum LTE bandwidth) was chosen. The most important figures of these implementations are given in Tbl. II. The OMP chip contains only the least-squares part of the algorithm; combining this part with a chip of GP's size results in the full OMP algorithm. The achievable speed and implemented memory sizes allow up to 50, 18, and 10 iterations for MP, GP, and OMP, respectively. This results in LS intersections at 24, 25, and 18 dB; thus OMP is limited by the number of possible iterations.

Figure 9. Layouts of the fabricated chips: MP, GP, and OMP (LS part)

VI. CONCLUSION
Three greedy algorithms were implemented and synthesized with different degrees of parallelism and sparsity support. MP was able to exploit the sparsity assumption of the channel using the smallest area. The GP implementation requires about three times the area of MP, but also improves the estimation by a few dB. The OMP algorithm turned out to be overly complex for this real-time system: even with a much larger area than GP, the possible number of iterations was not sufficient to capture all relevant taps in the high-SNR regime. For all three implementations it could, however, be shown that, especially in the low-SNR regime, significant gains in the MSE of the channel estimates are obtained compared to LS estimation, with a silicon area that is small compared to a typical overall LTE baseband receiver.

ACKNOWLEDGMENT
The authors would like to thank Fabian Huber for his work on the implementation of the GP algorithm. Financial support for this work has been provided by the Swiss National Science Foundation and by the Hasler Foundation.

REFERENCES
[1] D. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, April 2006.
[2] E. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.
[3] E-UTRAN: Physical channels and modulation, 3GPP Std. TS 36.211, March 2009.
[4] S. Mallat and Z. Zhang, "Matching pursuits with time-frequency dictionaries," IEEE Trans. Signal Process., vol. 41, no. 12, pp. 3397–3415, Dec. 1993.
[5] S. Cotter and B. Rao, "Sparse channel estimation via matching pursuit with application to equalization," IEEE Trans. Commun., vol. 50, no. 3, pp. 374–377, March 2002.
[6] W. Li and J. Preisig, "Estimation of rapidly time-varying sparse channels," IEEE J. Oceanic Engineering, vol. 32, no. 4, pp. 927–939, 2007.
[7] G. Tauböck and F. Hlawatsch, "A compressed sensing technique for OFDM channel estimation in mobile environments: Exploiting channel sparsity for reducing pilots," in IEEE Int. Conf. on Acoustics, Speech and Signal Processing, April 2008, pp. 2885–2888.
[8] C. Berger, S. Zhou, J. Preisig, and P. Willett, "Sparse channel estimation for multicarrier underwater acoustic communication: From subspace methods to compressed sensing," IEEE Trans. Signal Process., vol. 58, no. 3, pp. 1708–1721, March 2010.
[9] Y. Meng, W. Gong, R. Kastner, and T. Sherwood, "Algorithm/architecture co-exploration for designing energy efficient wireless channel estimator," ASP J. Low Power Electronics, vol. 1, no. 3, pp. 1–11, 2005.
[10] B. Benson, A. Irturk, J. Cho, and R. Kastner, "Survey of hardware platforms for an energy efficient implementation of matching pursuits algorithm for shallow water networks," in Proc. 3rd ACM Int. Workshop on Underwater Networks, 2008, pp. 83–86.
[11] P. Maechler, P. Greisen, N. Felber, and A. Burg, "Matching pursuit: Evaluation and implementation for LTE channel estimation," in Proc. ISCAS, May 2010.
[12] Y. Pati, R. Rezaiifar, and P. Krishnaprasad, "Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition," in Proc. Asilomar Conference on Signals, Systems and Computers, vol. 1, Nov. 1993, pp. 40–44.
[13] T. Blumensath and M. Davies, "Gradient pursuits," IEEE Trans. Signal Process., vol. 56, no. 6, pp. 2370–2382, June 2008.
[14] E-UTRAN: User Equipment radio transmission and reception, 3GPP Std. TS 36.101, Sept. 2009.
[15] K. Nipp and D. Stoffer, Communication Systems, 5th ed. John Wiley & Sons, 2005.
