FIR Filter
FPGA
Implementation
Results
Conclusion and Future Work
Design Space Exploration of Time-Multiplexed FIR Filters on FPGAs
Syed Asad Alam
Linköping University Department of Electrical Engineering Division of Electronic Systems
Master’s Thesis, Spring 2010
Outline
1 FIR Filter
  Definition and Properties
  Structures
  Time-Multiplexed FIR Filters
2 FPGA
  Definition
  Architecture
  Xilinx Virtex-5
3 Implementation
  Parallel Architecture
  Semi-Parallel, Pipelined Architecture
  Design Flow
4 Results
5 Conclusion and Future Work
Definition and Properties
Definition
  Impulse response is finite → settles to zero in finite time
  Impulse response lasts for N + 1 samples
  Difference equation:
\[ y(n) = \sum_{k=0}^{N} h(k)\, x(n-k) \tag{1} \]
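As an illustration of the difference equation, here is a minimal Python sketch. The function name `fir_direct_form` and the coefficient values are assumptions chosen for the example, and samples before n = 0 are taken to be zero.

```python
def fir_direct_form(h, x):
    """Evaluate y(n) = sum_{k=0}^{N} h(k) * x(n - k), assuming x(n) = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(len(h)):
            if n - k >= 0:          # samples before the start of the input count as zero
                acc += h[k] * x[n - k]
        y.append(acc)
    return y


# Illustrative coefficients: order N = 2, so N + 1 = 3 taps.
h = [1, 2, 3]
impulse = [1, 0, 0, 0, 0]
print(fir_direct_form(h, impulse))  # [1, 2, 3, 0, 0]
```

Feeding an impulse returns the coefficients followed by zeros, which shows the N + 1 sample impulse response settling to zero.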
Properties
  Poles → at the origin of the z-plane
  Zeros → on the unit circle or as pairs, mirrored in the unit circle
  Inherently stable
  No feedback
  Easily designed to have linear phase → symmetric coefficients
  Higher filter order required for a sharp transition band
Structures
FIR Structures
Different types of structures
Two main structures
  Direct Form
  Linear-Phase Direct Form
Direct Form
  A very simple structure
  N + 1 multiplications, N additions and N delay elements
Linear-Phase Direct Form
  Utilizes symmetry/antisymmetry of the coefficients
  Multiplications reduced by half
  Same number of additions and delay elements
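The saving from symmetry can be sketched in Python as follows. This is only a behavioural illustration for symmetric coefficients (the antisymmetric case would subtract instead of add). The names `fir_direct`, `fir_linear_phase` and `delayed` are hypothetical.

```python
def delayed(x, n, k):
    """Return x(n - k), assuming zero input before n = 0."""
    return x[n - k] if n - k >= 0 else 0


def fir_direct(h, x):
    """Direct form: one multiplication per tap."""
    return [sum(h[k] * delayed(x, n, k) for k in range(len(h)))
            for n in range(len(x))]


def fir_linear_phase(h, x):
    """Linear-phase direct form for symmetric h: add the two samples that
    share a coefficient first, then multiply once per pair."""
    taps = len(h)
    half = taps // 2
    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(half):
            acc += h[k] * (delayed(x, n, k) + delayed(x, n, taps - 1 - k))
        if taps % 2:                      # odd number of taps: unpaired centre tap
            acc += h[half] * delayed(x, n, half)
        y.append(acc)
    return y


x = [3, -1, 4, 1, -5, 9, 2]
print(fir_linear_phase([1, 2, 2, 1], x) == fir_direct([1, 2, 2, 1], x))  # True
```

With four symmetric taps the folded version performs two multiplications per output instead of four, and the number of additions is unchanged.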
Time-Multiplexed FIR Filters
Time-multiplexing → input data rate slower than clock rate
Time-multiplexing → not enough to use one multiplier/multiply-accumulate
Benefit → reduced cost of multipliers/multiply-accumulates
Benefit → further reduced with symmetry
Drawback → increased cost of memory, but not enough
Time-Multiplexing → Effect on Number of MACs
Number of MACs when symmetry is utilized:
\[ \text{Number of MACs} = \begin{cases} \dfrac{N}{2M}, & N \text{ even} \\[1ex] \dfrac{N+1}{2M}, & N \text{ odd} \end{cases} \tag{2} \]
If the filter is non-symmetric, the number of MACs is given by:
\[ \text{Number of MACs} = \begin{cases} \dfrac{N}{M}, & N \text{ even} \\[1ex] \dfrac{N+1}{M}, & N \text{ odd} \end{cases} \tag{3} \]
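Equations (2) and (3) can be evaluated with a small helper such as the one below. The function name `num_macs` is hypothetical, and rounding up when the division is not exact is an assumption of this sketch. N is the filter order and M the time-multiplexing factor.

```python
def num_macs(order, folding, symmetric):
    """Number of MACs per equations (2) and (3).

    Rounding up for inexact divisions is an assumption of this sketch.
    """
    numerator = order if order % 2 == 0 else order + 1
    divisor = 2 * folding if symmetric else folding
    return -(-numerator // divisor)       # integer ceiling division


for order, folding in [(15, 4), (59, 10), (127, 4)]:
    print(order, folding,
          num_macs(order, folding, symmetric=True),
          num_macs(order, folding, symmetric=False))
```

For an order-15 filter with M = 4, this gives 2 MACs with symmetry and 4 without.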
Definition
What is it?
  An integrated circuit
  Programmed after manufacturing
  Field-Programmable Gate Array
Types of FPGAs
  Anti-fuse
  Flash
  SRAM
Architecture
Main Components
  Configurable Logic Block
  Interconnection Network
  I/O Blocks
  Dedicated Blocks
Configurable Logic Block
Basic building block
Composed of Look-Up Tables (LUTs)
Also contains a multiplexer and a flip-flop
Provides both registered and unregistered outputs
LUTs implement truth tables
Advanced CLBs also contain carry chains for adders
Can also be used to implement memories → Distributed Memory
Dedicated Blocks
Multipliers
Memory Blocks
Transceivers
Clock Managers (DCM), PLLs
PCI Express Blocks
Ethernet MACs (Media Access Control)
Soft and Hard CPUs
Xilinx Virtex-5
Introduction
Xilinx → first FPGA company and one of the largest
A number of platforms → Spartan, Virtex I, II, 4, 5 & 6
Virtex-5 → focus of this thesis
  Introduced in 2006
  A number of sub-families
  Selected sub-family: "LXT"
  Adequate dedicated multipliers and memory blocks
  Filters implemented on → LX20T and LX50T
Virtex-5 Features
550 MHz clocking technology
6-input Look-Up Table
Two slices in one CLB
36 Kb Block RAM
DSP48E Slice
Virtex-5 Block RAM
Arranged as columns
Each Block RAM → 36 Kb
Utilized as either one 36 Kb RAM or two independent 18 Kb RAMs
Memories can be cascaded → deeper and wider memories
Different types → single-port and dual-port RAMs, ROMs, FIFOs
Synchronous read/write operation
Virtex-5 DSP48E Slice
  Implements special functions → no use of general FPGA fabric
  Especially suited to signal processing applications/algorithms
  Arranged in columns, as tiles
  Two DSP48E slices in each tile
  Special functions supported
    Multiply
    Multiply-Add
    Multiply-Accumulate
    Three-input Adder
    Barrel Shifter
Required FPGA Resources?
General logic resources for control logic → CLBs
Multipliers and MACs → DSP48E Slices
Memories for storing data → Block RAMs and CLBs (Distributed Memory)
Design Specification
Two architectures explored
  Linear-Phase Direct Form → Parallel Architecture
  Non-Linear-Phase, Pipelined Direct Form → Semi-Parallel, Pipelined Architecture
Data word length → 18 and 24 bits
Coefficient word length → 18 bits
Unscaled and scaled coefficients
Time-multiplexing factor → memory folding factor
MATLAB-generated design files → flexibility
Infer dedicated blocks
Parallel Architecture
Basic Idea
  Linear-Phase Direct Form, utilizes symmetry
Four major components
  Memory
    Data Memory → stores data and implements the delay line
    Coefficient Memory → stores coefficients, only half of them
  Pre-Adders
  Multiply-Accumulates
  Adder Tree
Structure
  Order odd → number of taps even
  Order even → number of taps odd
Semi-Parallel, Pipelined Architecture
Basic Idea
  Direct Form, does not utilize symmetry
  Even-order filters increased to odd-order filters
  Increment → M − 1
  M is the memory-folding factor
Three major components
  Memory
    Data Memory → stores data and implements the delay line
    Coefficient Memory → stores coefficients, all of them
  Multiply-Adds
  Accumulator
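The time-multiplexing idea behind this architecture can be sketched behaviourally in Python. This is not the thesis RTL. The function name `fir_semi_parallel`, the zero-padding of the coefficients to a multiple of M, and the assignment of coefficients to multiply-add units are all assumptions made for illustration.

```python
def fir_semi_parallel(h, x, folding):
    """Behavioural model: each multiply-add unit handles `folding` coefficients,
    one per clock cycle, and an accumulator sums the per-cycle results."""
    taps = len(h)
    units = -(-taps // folding)                       # ceil(taps / folding)
    padded = list(h) + [0] * (units * folding - taps) # assumed zero-padding
    y = []
    for n in range(len(x)):
        acc = 0                                       # accumulator cleared per sample
        for cycle in range(folding):                  # `folding` clock cycles per sample
            partial = 0
            for unit in range(units):                 # chain of multiply-adds
                k = unit * folding + cycle
                sample = x[n - k] if n - k >= 0 else 0
                partial += padded[k] * sample
            acc += partial
        y.append(acc)
    return y


h = [1, 2, 3, 4, 5, 6]
x = [1, -1, 2, 0, 3, 1, 4]
print(fir_semi_parallel(h, x, folding=4))
```

With six taps and a folding factor of 4, the model uses two multiply-add units instead of six, at the cost of four clock cycles per input sample.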
Xilinx FIR Core Generator - FIR Compiler
A complete, time-multiplexed FIR generator
Generates two files
  .vhd file for simulation
  .ngc file, a synthesized netlist
Requires a number of inputs; the important ones are
  A coefficient file (.coe)
  Input sampling frequency and clock frequency
  Coefficient structure (symmetric/non-symmetric)
  Data and coefficient word length
  Data and coefficient memory type
Inputs selected to match the implemented design
Design Flow
Step 1 → MATLAB Script
  The interface to the user
  Generates all the VHDL code
  Takes the following inputs from the user
    Filter order
    Time-multiplexing factor
    Data and coefficient word length
    Testing methodology → random/exhaustive
    Number of test vectors (for random testing)
  All scripts check the validity of inputs
  One main script combines all scripts to generate the whole design
Main Components
Facts about Xilinx resources
Comparisons
  Difference between the two architectures
  Effect of scaling, data word length (18 and 24 bits), memory-folding factor
  Difference between the implemented architectures and the Xilinx FIR Core
Dimensions of comparison
  Resource utilization of the FPGA
  Clock frequency
  Power dissipation
Xilinx Facts
DSP Elements
  Supports the 'Add' operation in the DSP block, but inference has to be forced
  Supports 'MAC' and infers it automatically
  Does not support asynchronous reset, so all registers with asynchronous reset were implemented in logic
  Supports synchronous reset, in which case all registers were absorbed into the DSP block
Block RAMs
  Reading and writing the same address from different ports produces uncertain behaviour. This is in contrast to what Xilinx claims, where old data is always read out.
Comparison-Parallel vs Semi-Parallel
Logic Slice Count
[Figure: logic slice count of the parallel and semi-parallel architectures versus (filter order, memory folding factor), for the configurations (15,4), (16,4), (19,10), (20,10), (23,4), (24,4), (59,10), (60,10), (99,10), (100,10), (108,6), (127,4), (128,4)]
BRAM Count
  Odd filter order → same for both architectures
  Even filter order → semi-parallel requires more
    One more 18 Kb BRAM for the 18-bit data path
    Two more 18 Kb BRAMs for the 24-bit data path
DSP48E Slice Count
  Semi-parallel occupies twice as many
[Figure: DSP48E slice count of the parallel and semi-parallel architectures versus (filter order, memory folding factor)]
Clock Frequency
[Figure: clock frequency of the parallel and semi-parallel architectures versus (filter order, memory folding factor)]
Power Dissipation
[Figure: power dissipation of the parallel and semi-parallel architectures versus (filter order, memory folding factor)]
Effect of Scaling
[Figures: slice count, clock frequency and power dissipation with non-scaled and scaled coefficients versus (filter order, memory folding factor), shown separately for the parallel and semi-parallel architectures]
Effect of Data Word-Length
Two word-lengths → 18 and 24 bits
BRAM count doubled
[Figures: slice count, BRAM count, clock frequency and power dissipation for the 18-bit and 24-bit datapaths versus (filter order, memory folding factor), shown separately for the parallel and semi-parallel architectures]
Effect of Different Time-Multiplexing/Memory Folding Factor
[Figures: slice count, BRAM count, DSP48E count, clock frequency and power dissipation versus memory folding factor (10 to 100), shown separately for the parallel and semi-parallel architectures]
Comparison-Parallel Design vs Symmetric FIR Core
[Figures: slice count, BRAM count, DSP48E count, clock frequency and power dissipation of the implemented design and the FIR Core versus (filter order, memory folding factor), shown separately for the 18-bit and 24-bit data paths]
Comparison-Semi-Parallel Design vs Non-Symmetric FIR Core
[Figures: slice count, BRAM count, DSP48E count, clock frequency and power dissipation of the implemented design and the FIR Core versus (filter order, memory folding factor), shown separately for the 18-bit and 24-bit data paths]
Conclusion
FPGAs are excellent devices for implementing signal processing algorithms
They provide fast, dedicated multipliers and dedicated memory blocks
Architectures exploiting coefficient symmetry reduce the number of multiplications, but suffer from lower clock frequency and higher logic slice count
Based on the available FPGAs, architectures not exploiting coefficient symmetry can be a better choice
Scaling the coefficients does not produce much benefit
Conclusion (continued)
Time-multiplexing factor has a major influence on memory count, slice count, DSP count and clock frequency. Power dissipation, however, remains more or less the same
Xilinx provides its own FIR Core, which can be generated easily using Coregen, a software tool provided by Xilinx
These cores provide high clock frequency but suffer from very high power dissipation, especially when the time-multiplexing factor is low
They also suffer from high logic slice count when coefficient symmetry is used. When it is not used, logic slice count is lower
DSP48E slice count is only one more in the FIR Core, which is not very significant
Future Work
  Single-Port RAMs
    Write port used only for one cycle
    If memory depth is one less than the time-multiplexing factor → only one port needed
    Benefit → BRAM count reduced or replaced by Distributed Memory
  Bit & Digit Serial Arithmetic
    Eliminate BRAM and DSP48E
  Other Number Representations
    2's complement → high switching activity
    Redundant number systems (SDC, CSDC) & residue number systems
Salutations
Thank You!!! Questions and Opposition!!!
Appendix
Additional material
Design Flow, further details
Step 2 & 3 → Design/Test Files & Simulation
  Simulation using ModelSim
  Design files → RTL design
  Testbench and behavioral model → MATLAB generated
  Model → a simple, direct form design
  Testbench → compares output of DUT and model
  Testbench → stops on mismatch
  Testbench → stops when all outputs have matched
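The testbench behaviour listed above can be mimicked in Python as a rough sketch. The thesis testbench itself is generated by MATLAB and run in ModelSim. The names `reference_model`, `random_vectors` and `run_testbench`, and the two's complement range used for the random vectors, are assumptions made for illustration.

```python
import random


def reference_model(h, x):
    """Simple direct-form FIR used as the golden model (zero initial state)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def random_vectors(count, word_length, seed=0):
    """Random test vectors within an assumed two's complement range."""
    rng = random.Random(seed)
    lo = -(1 << (word_length - 1))
    hi = (1 << (word_length - 1)) - 1
    return [rng.randint(lo, hi) for _ in range(count)]


def run_testbench(dut, h, x):
    """Compare DUT and model outputs; return the index of the first
    mismatch (stop on mismatch) or None when all outputs match."""
    expected = reference_model(h, x)
    actual = dut(h, x)
    for n, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return n
    return None


x = random_vectors(count=100, word_length=18)
print(run_testbench(reference_model, [3, -1, 2], x))  # None: all outputs matched
```

Here `dut` stands for any callable with the same interface as the model. In the real flow it corresponds to the simulated RTL design.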
Step 4 → Synthesis & Implementation
  Xilinx tool → ISE
  Synthesis → RTL to Xilinx-specific netlist (NGC)
  Translate → NGC to logic elements (NGD)
  Map → logic elements mapped to FPGA resources
  PAR → FPGA resources placed and routed
Step 5 → Post-PAR Simulation
  Verifies functionality of the placed and routed design
  Post-PAR simulation model generated by ISE
  Post-PAR testbench generated by MATLAB
  Requires the SIMPRIM library
  Generates VCD file → switching activity
Step 6 → Power Estimation
  XPower → Xilinx application
  Requires 3 or 4 files
    VCD file → generated during post-PAR simulation by ModelSim
    PCF file → physical constraints file generated by PAR
    NCD file → the final circuit description file by PAR
    XML file → optional settings file
Scaling
To prevent overflow at critical nodes
Critical nodes are normally the inputs to multipliers
Scaling multipliers are introduced to reduce the signal level
Two main types of scaling
  Lp-Norms
    Utilizes the signal range effectively
    However, there is always a probability of error
  Safe Scaling
    Does not utilize the signal range effectively
    However, ensures there is no overflow
    Sufficient for short-length FIR filters
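The two scaling types can be compared numerically with the sketch below. It assumes the usual textbook definitions: the safe-scaling factor is the reciprocal of the sum of absolute values of the impulse response to the critical node, and the Lp-norm factor is the reciprocal of its Lp norm. The function names and the coefficient values are hypothetical.

```python
import math


def safe_scale(h):
    """Safe scaling factor: 1 / sum(|h(k)|), which rules out overflow
    for any bounded input (assumed textbook definition)."""
    return 1.0 / sum(abs(c) for c in h)


def lp_scale(h, p=2):
    """Lp-norm scaling factor: 1 / (sum(|h(k)|^p))^(1/p) (assumed definition)."""
    return 1.0 / sum(abs(c) ** p for c in h) ** (1.0 / p)


h = [0.5, -1.0, 0.5]                       # illustrative impulse response
print(safe_scale(h))                       # 0.5
print(round(lp_scale(h, p=2), 4))          # 0.8165

# With safe scaling the worst-case gain is exactly 1, so no overflow is possible.
worst_case_gain = sum(abs(safe_scale(h) * c) for c in h)
print(math.isclose(worst_case_gain, 1.0))  # True
```

The L2 factor is larger than the safe factor, so more of the signal range is used, but a worst-case input can then exceed the range. This is the trade-off the slide describes.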