Design and Implementation of the Discrete Wavelet Transform on an FPGA Platform to Process Data Sets of up to Three Dimensions E. A. Rivera-Juarico, Student Member, IEEE, J. M. Ram´ırez-Cort´es, Senior Member, IEEE, V. Alarcon-Aquino and J. Escamilla-Ambrosio, Member, IEEE
Abstract—The aim of this work is the design and implementation of a DWT architecture FPGA-based platform for up to three dimensional signal processing. First, filter banks are designed using a distributed arithmetic technique. Then, we design controllers, interfaces and protocols that handle, transmit and sequence all the data during the computing process. Data is sent via USB to the FPGA and the user interface is programmed in MATLAB. A graphical user interface manages the system operation and displays the results on a PC. Designed filters are compared with a fully parallel architecture in relation to the number of gates used, speed, and algorithm performance. Keywords—Daubechies, Distributed Arithmetic, Filter Design.
ψ up to order n exist and are bounded. A set of orthogonal functions with compact support, widely used now in the wavelet field, and a method to obtain the corresponding filter coefficients was proposed by Ingrid Daubechies [3]. Several interesting properties are listed below. A. Multiresolution A multi-resolution or multi-scale scheme consists of a sequence of successive approximation spaces Vj with the following conditions:
I. I NTRODUCTION Wavelet signal processing is a technique that can be used to analyze the temporal and spectral properties of non-stationary signals. A prototype function called Mother Wavelet (ψ) is stretched and shifted to obtain an orthogonal function set that is used to analyze in a multiresolution approach the time-scale energy distribution of signals. Mother Wavelets can be designed to achieve different requirements. Essentially, mother wavelets are oscillating waves with finite energy and short duration (conditions 1 and 2) that are well localized in the time and frequency spaces [1]. The designers can use the well documented mother wavelets functions found in literature [2] or they can propose their own wavelets to use special characteristics to reach their purposes. So, wavelet theory can view as a flexible signal processing framework rather than a simple transformation. Z ψ dt = 0 (1) Z
+∞
cψ = 2π −∞
|Ψ(ω)|2 dω < ∞ |ω|
(2)
One type of mother wavelet can be distinguish by a property named compact support such that all derivatives of function This work was supported by CONACYT grant No. 290541 E. A. Rivera-Juarico is a MSc by National Institute of Astrophysics, Optics and Electronics, St. Ma. Tonantzintla, PUE, Mex., e-mail:
[email protected] J. M. Ram´ırez-Cort´es is a titular researcher at the Electronics Department, National Institute of Astrophysics, Optics, and Electronics, St. Ma. Tonantzintla, PUE, Mex., e-mail:
[email protected]. V. Alarcon-Aquino is titular professor at the Electronics Department, Universidad de las Americas, Puebla, Mex., e-mail:
[email protected] J. Escamilla-Ambrosio is a titular professor at the Electronics Department, Universidad Politcnica de Guanajuato, Mex., e-mail:
[email protected]
978-1-61284-1325-5/12/$26.00 ©2012 IEEE
...V2 ⊂ V1 ⊂ V0 ⊂ V−1 ⊂ V−2 ⊂ ...
(3)
∪j∈Z Vj = L2 (R)
(4)
∩j∈Z Vj = {0}
(5)
The condition 3 says that V0 is a central space which includes all subspaces Vj with j ∈ +Z and belongs in turn to a series of spaces Vj with j ∈ −Z. In addition the complex conjugate of the union of all spaces should be a function in L2 (R) as indicated by the condition 4. These spaces should be independent, and thus their intersection must be zero (condition 5). In addition, the multi-resolution appearance is a consequence of the following requirements: f ∈ Vj ⇔ f (2j · ) ∈ V0
(6)
f ∈ V0 ⇒ f (· −n) ∈ V0 , ∀n ∈ Z
(7)
That is, all spaces are scaled versions of the central space V0 (condition 6) and V0 is invariant under integer translations (condition 7). This implies that if f ∈ Vj , then f (· −2j n) ∈ Vj , for all n ∈ Z. Finally it requires that there be a function φ ∈ V0 such that φ0,n ; n ∈ Z (8) is a normalized orthogonal basis in V0 , where ∀{j, n} ∈ Z, φj,n (s) = 2−j/2 φ(2−j s − n)
(9)
This implies that {φj,n ; n ∈ Z} is a normalized orthogonal basis in Vj for all j ∈ Z. The function φ is called scaling
333
function multi-resolution analysis and equation 9 is known as dilation equation. Multi-resolution analysis can be described through a pyramidal structure that decomposes a signal into the approximations and details coefficients, creating a level of decomposition. It is required to apply the same scaling function φ and mother wavelet ψ in each level on scaled versions of the input signal obtained through the previous levels. The multi-resolution leads naturally to a hierarchical and fast computation of wavelet coefficients at different scales of a given function or sequence. B. Filters Banks for Wavelets Suppose, has been obtained the inner product of f with φ0,k , i.e. the scale j = 0. Then the calculation of hf, ψj,k i for j ≥ 1 is as follows: starting with hf, φ0,n i, estimate hf, ψ1,n i by, hf, ψj,k i =
X
gn−2k hf, φj−1,k i
(10)
n
Which can be obtained by the convolution of hf, φj−1,k i with g−n , for n ∈ Z and retaining only pairs of output samples (decimation by a factor of 2). Then hf, φ1,n i is calculated using a similar process of convolution and decimation given by, hf, φj,k i =
X
hn−2k hf, φj−1,k i
Fig. 1: Wavelet decomposition for three dimensional functions. low pass filter and high pass for each dimension respectively. The whole procedure is shown in fig. 1 where partial results are displayed in each dimension filtering: (a) original volume, (b) filtering output in x, (c) filtered output in y, and (d) filtering output in z which represents the final result of the DWT descomposition three dimensional process at one level. Fig. 2 shows the resulting filtering scheme using filter banks; (a) for x, (b) for y and (c) for z directions.
(11)
n
This can be found for all the family functions ψ and φ , which is useful for multiple levels of wavelet coefficients. The entire procedure can be viewed as the computation of approximations of f , together with the difference in information between two successive levels of approximation. The function ψ can be viewed as a high pass filter where we get the wavelet coefficients and the function φ as a low pass filter which gives us the coefficients of approximation to calculate the next level of decomposition by feeding this output to another pair of filters.This pair of filters are known as analysis filter banks. Then the next level decomposition gives us a representation of f with a version of ψ and φ scaled by a factor of 2j with j the level of decomposition. Moreover, the multiresolution analysis occurs when there are different levels or scaled versions of the input signal. In order to obtain j decomposition levels, j analysis filter banks are connected in such a way that the approximation coefficients of level j − 1 enters the level j with j > 1 , and j = 0 corresponds to the initial level of the original signal f .
2
LPF
2
LPF
2
HPF
f(n)
LPF
HPF
2 2
2
LPF
2
HPF
2
LPF
2
LPF
2
HPF
(a)
2
HPF
(b)
2
HPF
2
LPF HPF
(c)
2
LLL
LLH LHL
LHH
HLL HLH HHL HHH
Fig. 2: DWT filtering scheme for functions in three dimensions. In a general point of view, n-dimensional functions can be processed by a similar procedure of filtering in every dimension.
C. DWT processing on Three Dimensional data sets Processing functions in three dimensions can be considered as a combination of one-dimensional DWT applied to each of the dimensions of the function to be processed. So, what is considered a data volume of N1 N2 N3 , is divided into eight sub-volumes with dimensions N21 N22 N23 each. The following nomenclature is used for sub-volumes: LLL, LLH, LHL, LHH, HLL, HLH, HHL, HHH. Where L and H denote the outputs of 978-1-61284-1325-5/12/$26.00 ©2012 IEEE
D. Distributed Arithmetic Equation 12 (linear convolution) can be modified with distributed arithmetic (DA) to process signed numbers in 2’s complement [4].
334
y[n] = f [n] ∗ x[n] =
L−1 X k=0
f [k]x[n − k]
(12)
In 2’s complement, the most significant bit is used to distinguish between positive and negative numbers. If |x[n]| < 1, then it can be represented by, x[i] = −x0 [i] +
B−1 X
−b
xb [i]2
(13)
b=1
With b = 0 representing the most significant bit (and sign bit) and b = B − 1 least significant bit. The distribution of the architecture for the linear convolution is given by, y[i] =
N −1 X
c[n]x[i − n]
II. M ATERIALS AND M ETHODS A. Target Technology and Tools Field programmable gate array (FPGA) Spartan3E XC3S1200e-4fg320 of Xilinx Inc. over Nexys2 development board of Digilent Inc. was selected as primary target technology. Brief overview of this device is shown in table II. TABLE II: OVERVIEW XC3S1200e-4fg320.
(14)
Gates Equivalent Logic Cells Total CLBs Total Slices Distributed RAM bits Block RAM bits Dedicated Multipliers DCMs Inputs/Outputs Differential Inputs/Outputs pairs
n=0
y[i] =
N −1 X
" c[n] −x0 [i − n] +
n=0
y[i] = −
y[i] = −
N −1 X
N −1 X
c[n]x0 [i − n] +
c[n]
B−1 X
n=0
n=0
b=1
N −1 X
N −1 X
B−1 X
c[n]x0 [i − n] +
N −1 X
N −1 X
c[n]
n=0
c[n]x0 [i − n] +
n=0
y[i] = −
xb [i − n]2−b
(15)
b=1
n=0
y[i] = −
#
B−1 X
xb [i − n]2−b (16)
−b
xb [i − n]2
(17)
b=1
N −1 B−1 X X
c[n]xb [i − n]2−b (18)
n=0 b=1
c[n]x0 [i − n] +
B−1 X
n=0
b=1
2−b
N −1 X
c[n]xb [i − n] (19) {z
hc[n],xb [i−n]i
}
TABLE I: LU T CONTENTS IN GENERAL DA SCHEME
x1[i]
xB-1[i]
x0[i-1]
x1[i-1]
xB-1[i-1]
xb [i − 1] 0 0 0 0 .. . 1
LUT x0[i-N+1]
1200K 19512 2168 8672 136k 504K 28 8 304 124
Design tools used were MATLAB R2010b (Wavelet Toolbox 4.6, Filter Design Toolbox 4.7.1 and Filter Design HDL Coder 2.7) from The MathWorks Inc. for wavelet filter banks design, DA HDL code generation and user interface for running and debugging purposes. Other tools used were ISE WebPack 12.4 from Xilinx Inc. for implementation of designed filters, ADEPT 2.6.1 System and Adept 2.0.1 SDK from Digilent Inc. for programming and data transmission from the register bus implemented on the FPGA system.
y[i]
+/-
x1[i-N+1] xB-1[i-N+1]
2-1
{
+ 0≤b
-
b= B-1
Fig. 3: Structure based on DA. 978-1-61284-1325-5/12/$26.00 ©2012 IEEE
First, both Daubechies wavelets coefficients filters bank db1 to db8 are loaded into the MATLAB environment by the Wavelet Toolbox and passed as parameters to the Filter Design Toolbox to generate a FIR structures. Table III specifies the requirements for this system. TABLE III: PARAMETERS OF THE FILTERS
xb [i − N + 1] 0 1 0 1 .. . 1
... ... ... ... ... .. . ...
Register
x0[i]
Spartan3E
B. Design Method
LU T contents from hc[n], xb [i − n]i are shown in table I. When processing the vector x0 the accumulator performs a subtraction operation and the result is obtained after B cycles. This process is represented through fig. 3.
xb [i] 0 0 0 0 .. . 1
THE
n=0
|
c[n]xb [i − n] 0 c[N − 1] c[N − 2] c[N − 2] + c[N − 1] .. . c[0] + c[1] + ... + c[n − 1]
OF
Filter Structure Arithmetic Coefficients Input Output Rounding Mode Over-flow mode
Direct-Form FIR Fixed Point 16 bits in range [-1 1) 16 bits in range [-1 1) 16 bits in range [-2 2) Convergent Saturate
Then, LUT partition is calculated by considering the length of filters and LUTs of 4 inputs.The sum of the partitions must be equal to the length of the filter. Finally, filters are coded in VHDL language by the Filter Design HDL Coder for DA and parallel architectures. Testbench files are also generated with impulse, chirp and white noise signals for both architectures types. Once the filters designed, system is implemented with the ISE WebPack tool. Epp bus with 8 bits registers was implemented with USB-EPP and EPP-Wavelet interfaces along with a set of eight wavelet analysis filters banks, and decimation, data path control and synchronization modules. Each filter bank is similar to filter prototype Wdb1DA shown in fig. 4. 335
Fig. 4: Wavelet analysis filter bank module Wdb1DA with signals and routings.
Fig. 7: Implemented system resulting; USB interface, wavelet interface and wavelet module C. n-dimensional processing Developed management and administration routines provide the pyramidal structure of iterative computation for three dimensional signals. The routines are based on the structures proposed in [5]. Kernel of filtering convolution was named waveletFPGA1Dv2 function. The waveletFPGA1Dv2 function is responsible for sending the data to process into the FPGA and retrieve the results from it. The first is to reset the wavelet module, select the wavelet to use and prepares the data to be sent byte to byte by a USB channel using dynamic link library DPCUTIL.DLL from the Adept 2.0.1 SDK. System retrieve the data when the counter is even. Latency of the module and the length of the selected filter were considered to process an input vector and obtain all the coefficients that are generated. III. R ESULTS
Fig. 5: Module WdbNda01; Set of eight wavelet analysis filters banks with multiplexing data path modules for wavelet selection. The implementation include a set of eight wavelet analysis filter banks with the necessary control for multiplexing data path as shown in fig. 5. Total filters implemented was sixteen, i.e. low pass filter and high pass filter for each bank. Each filter ranges from first to eighth order. Wavelet Module implemented with synchronization and decimation modules are shown in fig. 6. EnablePulse submodule generate a pulse of 16 clock cycles to process every data when enable signal goes high in one clock cycle. Signal generated from this sub-module serves as clock to the decimation sub-modules decimationby2. Module also outputs a decimation state signal. Finally the entire system designed for the target FPGA is illustrated in fig. 7.
Both wavelet analysis filter bank architectures kernels were synthesized one by one in ISE WebPack. Results of device utilization for each filter bank is presented in tables IV and V for DA and parallel architectures respectively. Particular synthesis options were setup to ensure both architecture were implemented without special or dedicated FPGA resources like dedicated multipliers or embedded shift register in Spartan3E to ensure portability through different technologies. TABLE IV: SUMMARY ARCHITECTURES N Flip Flops Total LUTs Logic Routing Shifters Slices
1 232 329 287 40 2 220
2 231 363 322 35 6 230
RESULTS
WdbNDA filter banks 3 4 5 6 239 252 259 267 396 439 488 515 351 391 425 458 35 34 45 35 10 14 18 22 249 275 297 312
FOR
7 277 553 493 34 26 334
DA
8 291 583 519 34 30 355
TABLE V: SUMMARY RESULTS FOR PARALLEL ARCHITECTURES N Flip Flops Total LUTs - Logic - Route Slices
Fig. 6: Implemented wavelet module 978-1-61284-1325-5/12/$26.00 ©2012 IEEE
336
1 64 803 660 143 430
2 96 1604 1425 179 854
3 128 2082 1856 226 1114
WdbNP Filter Banks 4 5 6 164 192 224 2827 3292 3860 2520 2944 3486 307 348 374 1517 1771 2079
7 256 4240 3858 382 2290
8 288 5281 4836 445 2833
The architecture based on DA provides a better relationship in the use of FPGA resources and is even possible to implement the 8 filter banks occupying less than 26 % of the slices available in the FPGA while the parallel architecture exceeds available resources. Simulations of testbenches generated are presented in fig. 8, 9, 10 and 11 for high and low pass filters of db8 wavelet in DA and parallel architectures respectively. This test process a vector data with the characteristics of such signals within the input range and compare the outputs obtained by the filter in VHDL with the expected output generated by MATLAB.
Fig. 12 shows the simulation for the db1H filter in DA after processing 2056 words, the computing process is activated at 70 ns after the start of the simulation. The indicator recorded a time of 658.01 microseconds when getting the last word useful output. This is 3 samples after the last word given input latency of 3 samples of design. This is 657.94/2056 = 3.2001e − 07 seconds/sample equivalent to 3124905 samples/sec. The timing results are identical to the filter db1L in fig. 13. Tests with designed architectures provide similar results in the processing speed that can reach the module.
Fig. 12: Filter Simulation Speed db1H in DA Fig. 8: DAdb8 H module simulation.
Fig. 13: Filter Simulation Speed db1L in DA
Fig. 9: DAdb8 L module simulation.
Fig. 10: Pdb8 H module simulation.
Fig. 14 shows a one-dimensional test signal of 1024 elements with two Sinc functions with different properties being processed by the FPGA and a function encoded in MATLAB such that compute the same function. The results are displayed in figure 15. In the case of a two-dimensional signal, as shown in fig. 16a (a picture of 64x64 pixels), we have the results in fig. 16b. All figures selected from db1 wavelet analysis filter bank tests. For three-dimensional signals, we develop a small user interface vis_3Dby2Dplanes (Fig. 17) that allows to process volumes of a series of 3D images in Analyze 5.7 format developed by Mayo Clinic, for magnetic resonance imaging data storing. IV. C ONCLUSION
Fig. 11: Pdb8 L module simulation. 978-1-61284-1325-5/12/$26.00 ©2012 IEEE
In this work, design and implementation of a 3D DWT based on a FPGA platform have been described. Daubechies wavelet analysis bank filters dbN with N = 1, 2, ..., 8 in fully parallel and DA architectures are compared. As can be seen in the simulations and results, the architecture based on DA obtains a higher efficiency than the parallel architecture in terms of resource utilization. The results show that the Daubechies wavelet, one of the most widely used wavelet can be implemented efficiently in an FPGA using fewer resources. 337
Fig. 14: Signal test.
Fig. 17: Viewing sub-volumes of wavelet coefficients in developed graphical user interface.
ACKNOWLEDGMENT The authors acknowledge the financial support from the Mexican National Council for Science and Technology (CONACYT). R EFERENCES
Fig. 15: FPGA and MATLAB computing results for signal test of fig. 14.
This expand the framework commonly used in the literature ([6], [7], [8], [9], [10]), which mainly implement one or two wavelet filters. In addition, resource utilization measured in this work extends other studies like [11] by consider words of 16 bits, which usually had worked with up to 8 bits. We have seen that the FPGA technology allows parallel computing, thus, further improvements also include more than one module for process two or more data channels simultaneously and improve in architecture using techniques as dynamic partial reconfigurable systems as proposed in [12].
(a)
[1] I. Daubechies, Ten Lectures on Wavelets (CBMS-NSF Regional Conference Series in Applied Mathematics). SIAM: Society for Industrial and Applied Mathematics, 1992. [2] P. S. Addison, The Illustrated Wavelet Transform Handbook - Introductory Theory and Applications in Science, Engineering, Medicine and Finance. Institute of Physics Publishing, 2002. [3] R. M. Rao and A. S. Bopardikar, Wavelet Transforms: Introduction to Theory & Applications. Prentice Hall PTR, 1998. [4] U. Meyer-Baese, Digital Signal Processing with Field Programmable Gate Arrays (Signals and Communication Technology). Springer, 2007. [5] S. Cai, K. Li, and I. Selesnick, “Matlab implementation of wavelet transform,” Electrical Engineering at Polytechnic University, NY, 2003. [6] P. Longa and A. Miri, “Area-efficient fir filter design on fpgas using distributed arithmetic,” in Signal Processing and Information Technology, 2006 IEEE International Symposium on, pp. 248 –252, 2006. [7] P. Longa, A. Miri, and M. Bolic, “A flexible design of filterbank architectures for discrete wavelet transforms,” in Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on, vol. 3, pp. III–1441 –III–1444, 2007. [8] M. Sharif, A. Sazish, and A. Amira, “An efficient algorithm and architecture for medical image segmentation and tumour detection,” in Biomedical Circuits and Systems Conference, 2008. BioCAS 2008. IEEE, pp. 157 –160, 2008. [9] R. Jiang and D. Crookes, “Fpga implementation of 3d discrete wavelet transform for real-time medical imaging,” in Circuit Theory and Design, 2007. ECCTD 2007. 18th European Conference on, pp. 519 –522, 2007. [10] S. Ismail, A. Salama, and M. Abu-ElYazeed, “Fpga implementation of an efficient 3d-wt temporal decomposition algorithm for video compression,” in Signal Processing and Information Technology, 2007 IEEE International Symposium on, pp. 154 –159, 2007. [11] Y. Li, C. Peng, D. Yu, and X. Zhang, “The implementation methods of high speed fir filter on fpga,” in Solid-State and Integrated-Circuit Technology, 2008. ICSICT 2008. 9th International Conference on, pp. 2216 –2219, oct. 2008. [12] X.-W. Wang, W.-N. Chen, C.-L. Peng, and H. jun You, “Hardwaresoftware monitoring techniques for dynamic partial reconfigurable embedded systems,” in Embedded Software and Systems Symposia, 2008. ICESS Symposia ’08. International Conference on, pp. 113 –119, 2008.
(b)
Fig. 16: Image test and result coefficients from FPGA. 978-1-61284-1325-5/12/$26.00 ©2012 IEEE
338