Design and Implementation of the Discrete Wavelet Transform on an FPGA Platform to Process Data Sets of up to Three Dimensions E. A. Rivera-Juarico, Student Member, IEEE, J. M. Ram´ırez-Cort´es, Senior Member, IEEE, V. Alarcon-Aquino and J. Escamilla-Ambrosio, Member, IEEE

Abstract—The aim of this work is the design and implementation of a DWT architecture FPGA-based platform for up to three dimensional signal processing. First, filter banks are designed using a distributed arithmetic technique. Then, we design controllers, interfaces and protocols that handle, transmit and sequence all the data during the computing process. Data is sent via USB to the FPGA and the user interface is programmed in MATLAB. A graphical user interface manages the system operation and displays the results on a PC. Designed filters are compared with a fully parallel architecture in relation to the number of gates used, speed, and algorithm performance. Keywords—Daubechies, Distributed Arithmetic, Filter Design.

ψ up to order n exist and are bounded. A set of orthogonal functions with compact support, widely used now in the wavelet field, and a method to obtain the corresponding filter coefficients was proposed by Ingrid Daubechies [3]. Several interesting properties are listed below. A. Multiresolution A multi-resolution or multi-scale scheme consists of a sequence of successive approximation spaces Vj with the following conditions:

I. I NTRODUCTION Wavelet signal processing is a technique that can be used to analyze the temporal and spectral properties of non-stationary signals. A prototype function called Mother Wavelet (ψ) is stretched and shifted to obtain an orthogonal function set that is used to analyze in a multiresolution approach the time-scale energy distribution of signals. Mother Wavelets can be designed to achieve different requirements. Essentially, mother wavelets are oscillating waves with finite energy and short duration (conditions 1 and 2) that are well localized in the time and frequency spaces [1]. The designers can use the well documented mother wavelets functions found in literature [2] or they can propose their own wavelets to use special characteristics to reach their purposes. So, wavelet theory can view as a flexible signal processing framework rather than a simple transformation. Z ψ dt = 0 (1) Z

+∞

cψ = 2π −∞

|Ψ(ω)|2 dω < ∞ |ω|

(2)

One type of mother wavelet can be distinguish by a property named compact support such that all derivatives of function This work was supported by CONACYT grant No. 290541 E. A. Rivera-Juarico is a MSc by National Institute of Astrophysics, Optics and Electronics, St. Ma. Tonantzintla, PUE, Mex., e-mail: [email protected] J. M. Ram´ırez-Cort´es is a titular researcher at the Electronics Department, National Institute of Astrophysics, Optics, and Electronics, St. Ma. Tonantzintla, PUE, Mex., e-mail: [email protected]. V. Alarcon-Aquino is titular professor at the Electronics Department, Universidad de las Americas, Puebla, Mex., e-mail: [email protected] J. Escamilla-Ambrosio is a titular professor at the Electronics Department, Universidad Politcnica de Guanajuato, Mex., e-mail: [email protected]

978-1-61284-1325-5/12/$26.00 ©2012 IEEE

...V2 ⊂ V1 ⊂ V0 ⊂ V−1 ⊂ V−2 ⊂ ...

(3)

∪j∈Z Vj = L2 (R)

(4)

∩j∈Z Vj = {0}

(5)

The condition 3 says that V0 is a central space which includes all subspaces Vj with j ∈ +Z and belongs in turn to a series of spaces Vj with j ∈ −Z. In addition the complex conjugate of the union of all spaces should be a function in L2 (R) as indicated by the condition 4. These spaces should be independent, and thus their intersection must be zero (condition 5). In addition, the multi-resolution appearance is a consequence of the following requirements: f ∈ Vj ⇔ f (2j · ) ∈ V0

(6)

f ∈ V0 ⇒ f (· −n) ∈ V0 , ∀n ∈ Z

(7)

That is, all spaces are scaled versions of the central space V0 (condition 6) and V0 is invariant under integer translations (condition 7). This implies that if f ∈ Vj , then f (· −2j n) ∈ Vj , for all n ∈ Z. Finally it requires that there be a function φ ∈ V0 such that φ0,n ; n ∈ Z (8) is a normalized orthogonal basis in V0 , where ∀{j, n} ∈ Z, φj,n (s) = 2−j/2 φ(2−j s − n)

(9)

This implies that {φj,n ; n ∈ Z} is a normalized orthogonal basis in Vj for all j ∈ Z. The function φ is called scaling

333

function multi-resolution analysis and equation 9 is known as dilation equation. Multi-resolution analysis can be described through a pyramidal structure that decomposes a signal into the approximations and details coefficients, creating a level of decomposition. It is required to apply the same scaling function φ and mother wavelet ψ in each level on scaled versions of the input signal obtained through the previous levels. The multi-resolution leads naturally to a hierarchical and fast computation of wavelet coefficients at different scales of a given function or sequence. B. Filters Banks for Wavelets Suppose, has been obtained the inner product of f with φ0,k , i.e. the scale j = 0. Then the calculation of hf, ψj,k i for j ≥ 1 is as follows: starting with hf, φ0,n i, estimate hf, ψ1,n i by, hf, ψj,k i =

X

gn−2k hf, φj−1,k i

(10)

n

Which can be obtained by the convolution of hf, φj−1,k i with g−n , for n ∈ Z and retaining only pairs of output samples (decimation by a factor of 2). Then hf, φ1,n i is calculated using a similar process of convolution and decimation given by, hf, φj,k i =

X

hn−2k hf, φj−1,k i

Fig. 1: Wavelet decomposition for three dimensional functions. low pass filter and high pass for each dimension respectively. The whole procedure is shown in fig. 1 where partial results are displayed in each dimension filtering: (a) original volume, (b) filtering output in x, (c) filtered output in y, and (d) filtering output in z which represents the final result of the DWT descomposition three dimensional process at one level. Fig. 2 shows the resulting filtering scheme using filter banks; (a) for x, (b) for y and (c) for z directions.

(11)

n

This can be found for all the family functions ψ and φ , which is useful for multiple levels of wavelet coefficients. The entire procedure can be viewed as the computation of approximations of f , together with the difference in information between two successive levels of approximation. The function ψ can be viewed as a high pass filter where we get the wavelet coefficients and the function φ as a low pass filter which gives us the coefficients of approximation to calculate the next level of decomposition by feeding this output to another pair of filters.This pair of filters are known as analysis filter banks. Then the next level decomposition gives us a representation of f with a version of ψ and φ scaled by a factor of 2j with j the level of decomposition. Moreover, the multiresolution analysis occurs when there are different levels or scaled versions of the input signal. In order to obtain j decomposition levels, j analysis filter banks are connected in such a way that the approximation coefficients of level j − 1 enters the level j with j > 1 , and j = 0 corresponds to the initial level of the original signal f .

2

LPF

2

LPF

2

HPF

f(n)

LPF

HPF

2 2

2

LPF

2

HPF

2

LPF

2

LPF

2

HPF

(a)

2

HPF

(b)

2

HPF

2

LPF HPF

(c)

2

LLL

LLH LHL

LHH

HLL HLH HHL HHH

Fig. 2: DWT filtering scheme for functions in three dimensions. In a general point of view, n-dimensional functions can be processed by a similar procedure of filtering in every dimension.

C. DWT processing on Three Dimensional data sets Processing functions in three dimensions can be considered as a combination of one-dimensional DWT applied to each of the dimensions of the function to be processed. So, what is considered a data volume of N1 N2 N3 , is divided into eight sub-volumes with dimensions N21 N22 N23 each. The following nomenclature is used for sub-volumes: LLL, LLH, LHL, LHH, HLL, HLH, HHL, HHH. Where L and H denote the outputs of 978-1-61284-1325-5/12/$26.00 ©2012 IEEE

D. Distributed Arithmetic Equation 12 (linear convolution) can be modified with distributed arithmetic (DA) to process signed numbers in 2’s complement [4].

334

y[n] = f [n] ∗ x[n] =

L−1 X k=0

f [k]x[n − k]

(12)

In 2’s complement, the most significant bit is used to distinguish between positive and negative numbers. If |x[n]| < 1, then it can be represented by, x[i] = −x0 [i] +

B−1 X

−b

xb [i]2

(13)

b=1

With b = 0 representing the most significant bit (and sign bit) and b = B − 1 least significant bit. The distribution of the architecture for the linear convolution is given by, y[i] =

N −1 X

c[n]x[i − n]

II. M ATERIALS AND M ETHODS A. Target Technology and Tools Field programmable gate array (FPGA) Spartan3E XC3S1200e-4fg320 of Xilinx Inc. over Nexys2 development board of Digilent Inc. was selected as primary target technology. Brief overview of this device is shown in table II. TABLE II: OVERVIEW XC3S1200e-4fg320.

(14)

Gates Equivalent Logic Cells Total CLBs Total Slices Distributed RAM bits Block RAM bits Dedicated Multipliers DCMs Inputs/Outputs Differential Inputs/Outputs pairs

n=0

y[i] =

N −1 X

" c[n] −x0 [i − n] +

n=0

y[i] = −

y[i] = −

N −1 X

N −1 X

c[n]x0 [i − n] +

c[n]

B−1 X

n=0

n=0

b=1

N −1 X

N −1 X

B−1 X

c[n]x0 [i − n] +

N −1 X

N −1 X

c[n]

n=0

c[n]x0 [i − n] +

n=0

y[i] = −

xb [i − n]2−b

(15)

b=1

n=0

y[i] = −

#

B−1 X

xb [i − n]2−b (16)

−b

xb [i − n]2

(17)

b=1

N −1 B−1 X X

c[n]xb [i − n]2−b (18)

n=0 b=1

c[n]x0 [i − n] +

B−1 X

n=0

b=1

2−b

N −1 X

c[n]xb [i − n] (19) {z

hc[n],xb [i−n]i

}

TABLE I: LU T CONTENTS IN GENERAL DA SCHEME

x1[i]

xB-1[i]

x0[i-1]

x1[i-1]

xB-1[i-1]

xb [i − 1] 0 0 0 0 .. . 1

LUT x0[i-N+1]

1200K 19512 2168 8672 136k 504K 28 8 304 124

Design tools used were MATLAB R2010b (Wavelet Toolbox 4.6, Filter Design Toolbox 4.7.1 and Filter Design HDL Coder 2.7) from The MathWorks Inc. for wavelet filter banks design, DA HDL code generation and user interface for running and debugging purposes. Other tools used were ISE WebPack 12.4 from Xilinx Inc. for implementation of designed filters, ADEPT 2.6.1 System and Adept 2.0.1 SDK from Digilent Inc. for programming and data transmission from the register bus implemented on the FPGA system.

y[i]

+/-

x1[i-N+1] xB-1[i-N+1]

2-1

{

+ 0≤b
-

b= B-1

Fig. 3: Structure based on DA. 978-1-61284-1325-5/12/$26.00 ©2012 IEEE

First, both Daubechies wavelets coefficients filters bank db1 to db8 are loaded into the MATLAB environment by the Wavelet Toolbox and passed as parameters to the Filter Design Toolbox to generate a FIR structures. Table III specifies the requirements for this system. TABLE III: PARAMETERS OF THE FILTERS

xb [i − N + 1] 0 1 0 1 .. . 1

... ... ... ... ... .. . ...

Register

x0[i]

Spartan3E

B. Design Method

LU T contents from hc[n], xb [i − n]i are shown in table I. When processing the vector x0 the accumulator performs a subtraction operation and the result is obtained after B cycles. This process is represented through fig. 3.

xb [i] 0 0 0 0 .. . 1

THE

n=0

|

c[n]xb [i − n] 0 c[N − 1] c[N − 2] c[N − 2] + c[N − 1] .. . c[0] + c[1] + ... + c[n − 1]

OF

Filter Structure Arithmetic Coefficients Input Output Rounding Mode Over-flow mode

Direct-Form FIR Fixed Point 16 bits in range [-1 1) 16 bits in range [-1 1) 16 bits in range [-2 2) Convergent Saturate

Then, LUT partition is calculated by considering the length of filters and LUTs of 4 inputs.The sum of the partitions must be equal to the length of the filter. Finally, filters are coded in VHDL language by the Filter Design HDL Coder for DA and parallel architectures. Testbench files are also generated with impulse, chirp and white noise signals for both architectures types. Once the filters designed, system is implemented with the ISE WebPack tool. Epp bus with 8 bits registers was implemented with USB-EPP and EPP-Wavelet interfaces along with a set of eight wavelet analysis filters banks, and decimation, data path control and synchronization modules. Each filter bank is similar to filter prototype Wdb1DA shown in fig. 4. 335

Fig. 4: Wavelet analysis filter bank module Wdb1DA with signals and routings.

Fig. 7: Implemented system resulting; USB interface, wavelet interface and wavelet module C. n-dimensional processing Developed management and administration routines provide the pyramidal structure of iterative computation for three dimensional signals. The routines are based on the structures proposed in [5]. Kernel of filtering convolution was named waveletFPGA1Dv2 function. The waveletFPGA1Dv2 function is responsible for sending the data to process into the FPGA and retrieve the results from it. The first is to reset the wavelet module, select the wavelet to use and prepares the data to be sent byte to byte by a USB channel using dynamic link library DPCUTIL.DLL from the Adept 2.0.1 SDK. System retrieve the data when the counter is even. Latency of the module and the length of the selected filter were considered to process an input vector and obtain all the coefficients that are generated. III. R ESULTS

Fig. 5: Module WdbNda01; Set of eight wavelet analysis filters banks with multiplexing data path modules for wavelet selection. The implementation include a set of eight wavelet analysis filter banks with the necessary control for multiplexing data path as shown in fig. 5. Total filters implemented was sixteen, i.e. low pass filter and high pass filter for each bank. Each filter ranges from first to eighth order. Wavelet Module implemented with synchronization and decimation modules are shown in fig. 6. EnablePulse submodule generate a pulse of 16 clock cycles to process every data when enable signal goes high in one clock cycle. Signal generated from this sub-module serves as clock to the decimation sub-modules decimationby2. Module also outputs a decimation state signal. Finally the entire system designed for the target FPGA is illustrated in fig. 7.

Both wavelet analysis filter bank architectures kernels were synthesized one by one in ISE WebPack. Results of device utilization for each filter bank is presented in tables IV and V for DA and parallel architectures respectively. Particular synthesis options were setup to ensure both architecture were implemented without special or dedicated FPGA resources like dedicated multipliers or embedded shift register in Spartan3E to ensure portability through different technologies. TABLE IV: SUMMARY ARCHITECTURES N Flip Flops Total LUTs Logic Routing Shifters Slices

1 232 329 287 40 2 220

2 231 363 322 35 6 230

RESULTS

WdbNDA filter banks 3 4 5 6 239 252 259 267 396 439 488 515 351 391 425 458 35 34 45 35 10 14 18 22 249 275 297 312

FOR

7 277 553 493 34 26 334

DA

8 291 583 519 34 30 355

TABLE V: SUMMARY RESULTS FOR PARALLEL ARCHITECTURES N Flip Flops Total LUTs - Logic - Route Slices

Fig. 6: Implemented wavelet module 978-1-61284-1325-5/12/$26.00 ©2012 IEEE

336

1 64 803 660 143 430

2 96 1604 1425 179 854

3 128 2082 1856 226 1114

WdbNP Filter Banks 4 5 6 164 192 224 2827 3292 3860 2520 2944 3486 307 348 374 1517 1771 2079

7 256 4240 3858 382 2290

8 288 5281 4836 445 2833

The architecture based on DA provides a better relationship in the use of FPGA resources and is even possible to implement the 8 filter banks occupying less than 26 % of the slices available in the FPGA while the parallel architecture exceeds available resources. Simulations of testbenches generated are presented in fig. 8, 9, 10 and 11 for high and low pass filters of db8 wavelet in DA and parallel architectures respectively. This test process a vector data with the characteristics of such signals within the input range and compare the outputs obtained by the filter in VHDL with the expected output generated by MATLAB.

Fig. 12 shows the simulation for the db1H filter in DA after processing 2056 words, the computing process is activated at 70 ns after the start of the simulation. The indicator recorded a time of 658.01 microseconds when getting the last word useful output. This is 3 samples after the last word given input latency of 3 samples of design. This is 657.94/2056 = 3.2001e − 07 seconds/sample equivalent to 3124905 samples/sec. The timing results are identical to the filter db1L in fig. 13. Tests with designed architectures provide similar results in the processing speed that can reach the module.

Fig. 12: Filter Simulation Speed db1H in DA Fig. 8: DAdb8 H module simulation.

Fig. 13: Filter Simulation Speed db1L in DA

Fig. 9: DAdb8 L module simulation.

Fig. 10: Pdb8 H module simulation.

Fig. 14 shows a one-dimensional test signal of 1024 elements with two Sinc functions with different properties being processed by the FPGA and a function encoded in MATLAB such that compute the same function. The results are displayed in figure 15. In the case of a two-dimensional signal, as shown in fig. 16a (a picture of 64x64 pixels), we have the results in fig. 16b. All figures selected from db1 wavelet analysis filter bank tests. For three-dimensional signals, we develop a small user interface vis_3Dby2Dplanes (Fig. 17) that allows to process volumes of a series of 3D images in Analyze 5.7 format developed by Mayo Clinic, for magnetic resonance imaging data storing. IV. C ONCLUSION

Fig. 11: Pdb8 L module simulation. 978-1-61284-1325-5/12/$26.00 ©2012 IEEE

In this work, design and implementation of a 3D DWT based on a FPGA platform have been described. Daubechies wavelet analysis bank filters dbN with N = 1, 2, ..., 8 in fully parallel and DA architectures are compared. As can be seen in the simulations and results, the architecture based on DA obtains a higher efficiency than the parallel architecture in terms of resource utilization. The results show that the Daubechies wavelet, one of the most widely used wavelet can be implemented efficiently in an FPGA using fewer resources. 337

Fig. 14: Signal test.

Fig. 17: Viewing sub-volumes of wavelet coefficients in developed graphical user interface.

ACKNOWLEDGMENT The authors acknowledge the financial support from the Mexican National Council for Science and Technology (CONACYT). R EFERENCES

Fig. 15: FPGA and MATLAB computing results for signal test of fig. 14.

This expand the framework commonly used in the literature ([6], [7], [8], [9], [10]), which mainly implement one or two wavelet filters. In addition, resource utilization measured in this work extends other studies like [11] by consider words of 16 bits, which usually had worked with up to 8 bits. We have seen that the FPGA technology allows parallel computing, thus, further improvements also include more than one module for process two or more data channels simultaneously and improve in architecture using techniques as dynamic partial reconfigurable systems as proposed in [12].

(a)

[1] I. Daubechies, Ten Lectures on Wavelets (CBMS-NSF Regional Conference Series in Applied Mathematics). SIAM: Society for Industrial and Applied Mathematics, 1992. [2] P. S. Addison, The Illustrated Wavelet Transform Handbook - Introductory Theory and Applications in Science, Engineering, Medicine and Finance. Institute of Physics Publishing, 2002. [3] R. M. Rao and A. S. Bopardikar, Wavelet Transforms: Introduction to Theory & Applications. Prentice Hall PTR, 1998. [4] U. Meyer-Baese, Digital Signal Processing with Field Programmable Gate Arrays (Signals and Communication Technology). Springer, 2007. [5] S. Cai, K. Li, and I. Selesnick, “Matlab implementation of wavelet transform,” Electrical Engineering at Polytechnic University, NY, 2003. [6] P. Longa and A. Miri, “Area-efficient fir filter design on fpgas using distributed arithmetic,” in Signal Processing and Information Technology, 2006 IEEE International Symposium on, pp. 248 –252, 2006. [7] P. Longa, A. Miri, and M. Bolic, “A flexible design of filterbank architectures for discrete wavelet transforms,” in Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on, vol. 3, pp. III–1441 –III–1444, 2007. [8] M. Sharif, A. Sazish, and A. Amira, “An efficient algorithm and architecture for medical image segmentation and tumour detection,” in Biomedical Circuits and Systems Conference, 2008. BioCAS 2008. IEEE, pp. 157 –160, 2008. [9] R. Jiang and D. Crookes, “Fpga implementation of 3d discrete wavelet transform for real-time medical imaging,” in Circuit Theory and Design, 2007. ECCTD 2007. 18th European Conference on, pp. 519 –522, 2007. [10] S. Ismail, A. Salama, and M. Abu-ElYazeed, “Fpga implementation of an efficient 3d-wt temporal decomposition algorithm for video compression,” in Signal Processing and Information Technology, 2007 IEEE International Symposium on, pp. 154 –159, 2007. [11] Y. Li, C. Peng, D. Yu, and X. Zhang, “The implementation methods of high speed fir filter on fpga,” in Solid-State and Integrated-Circuit Technology, 2008. ICSICT 2008. 9th International Conference on, pp. 2216 –2219, oct. 2008. [12] X.-W. Wang, W.-N. Chen, C.-L. Peng, and H. jun You, “Hardwaresoftware monitoring techniques for dynamic partial reconfigurable embedded systems,” in Embedded Software and Systems Symposia, 2008. ICESS Symposia ’08. International Conference on, pp. 113 –119, 2008.

(b)

Fig. 16: Image test and result coefficients from FPGA. 978-1-61284-1325-5/12/$26.00 ©2012 IEEE

338

Design and Implementation of the Discrete Wavelet ...

Tonantzintla, PUE, Mex., e-mail: [email protected]. J. M. Ramırez-Cortés is a titular researcher at the Electronics Department,. National Institute of Astrophysics, Optics, and Electronics, St. Ma. Tonantzintla, PUE, Mex., e-mail: jmram@inaoep.mx. V. Alarcon-Aquino is titular professor at the Electronics Department,. Universidad ...

1MB Sizes 2 Downloads 234 Views

Recommend Documents

Design and implementation of Interactive ...
Processing can be deployed within the development environment, in Java projects, as well as in HTML/JSP pages in tags. For web pages integration, processing.js extension is needed. Its main purpose is to translate the whole sketch into the Javascrip

The design and implementation of calmRISC32 floating ...
Department of Computer Science, Yonsei University, Seoul 120-749 Korea. **MCU Team ... and low power micro-controller, not high performance micm- processor. ... shows the architecture of CalmRISC43 FPU and section 3 de- scribes the ...

Design and Implementation of the Brazilian Information System on ...
archive, and analyze data on spatial and temporal distribution of species along with information about their surrounding environment [1–2]. It is also important to ...

The Design and Implementation of an AFP/AFS Protocol ... - CiteSeerX
The translator is designed to export AFS and UNIX local file system ... using the AppleTalk Filing Protocol (AFP), is the native Macintosh file-sharing mech- .... (NBP), a file service (AFP), and additional print services to the Macintosh (PAP).

The Design and Implementation of a Large-Scale ...
a quadratic analytical initial step serving to create a quick coarse placement ..... GORDIAN [35] is another tool that uses a classical quadratic objective function ...... porting environment of software components with higher levels of functionality

The Design and Implementation of a Large-Scale ...
Figure 2.3: GORDIAN: Center of Gravity Constraints: The average location of ..... as via a distributable solution framework for both global and detailed placement phases. ...... BonnPlace calls this step repartitioning, we call it re-warping.

Design and Implementation of High Performance and Availability Java ...
compute engine object that is exported by one JAVA RMI (JRMI) server, it will take ... addition, to build such a system, reliable multicast communication, grid ...

Design and Implementation of High Performance and Availability Java ...
compute engine object that is exported by one JAVA RMI (JRMI) server, it will take .... Test an application, such as a huge computation on this system. 2. Test the ...

Design and implementation of Interactive visualization of GHSOM ...
presented the GHSOM (Growing Hierarchical Self Organizing Map) algorithm, which is an extension of the standard ... frequently labeled as text mining) is in comparison with classic methods of knowledge discovery in .... Portal provides a coherent sys

Page 1 Programming Languages Design and Implementation ...
Include. C. ) software simulation. : (. ) .... software simulation. (. ). 24 я я я ...... SiP j i. C. SB. SA. S end i output y output xx j begin integer j char y integer. xP ... Global param begin param integer param. SuB procedure. List array. In

Design and Implementation Statewide Pandemic Triage Line.pdf ...
Page 3 of 9. Design and Implementation Statewide Pandemic Triage Line.pdf. Design and Implementation Statewide Pandemic Triage Line.pdf. Open. Extract.

Relational Database Design and Implementation ...
[Read PDF] Relational Database Design and ... administration, this book provides practical information necessary to develop a design and management scheme ...

database systems design implementation and ...
database systems design implementation and management 10th edition contains important information and a detailed explanation about database systems ...

design and implementation of a high spatial resolution remote sensing ...
Therefore, the object-oriented image analysis for extraction of information from remote sensing ... Data Science Journal, Volume 6, Supplement, 4 August 2007.