FIR Filter

FPGA

Implementation

Results

Conclusion and Future Work

Design Space Exploration of Time-Multiplexed FIR Filters on FPGAs

Syed Asad Alam

Linköping University, Department of Electrical Engineering, Division of Electronic Systems

Master’s Thesis, Spring 2010

Outline

1. FIR Filter
   - Definition and Properties
   - Structures
   - Time-Multiplexed FIR Filters
2. FPGA
   - Definition
   - Architecture
   - Xilinx Virtex-5
3. Implementation
   - Parallel Architecture
   - Semi-Parallel, Pipelined Architecture
   - Design Flow
4. Results
5. Conclusion and Future Work


FIR Filter


Definition and Properties

Definition
- Impulse response is finite → settles to zero in finite time
- Impulse response lasts for N + 1 samples
- Difference equation for a filter of order N:

$$y(n) = \sum_{k=0}^{N} h(k)\,x(n-k) \tag{1}$$
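The difference equation (1) can be sketched directly in code. This is a minimal illustration, not taken from the thesis: the function name `fir_direct` is hypothetical, and the input is assumed to be zero before the first sample.

```python
def fir_direct(h, x):
    """Evaluate y(n) = sum_{k=0}^{N} h(k) * x(n - k), assuming x(n) = 0 for n < 0."""
    order = len(h) - 1                      # N, so there are N + 1 coefficients
    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(order + 1):
            if n - k >= 0:                  # samples before the start count as zero
                acc += h[k] * x[n - k]
        y.append(acc)
    return y


# Feeding in a unit impulse returns the coefficients themselves,
# after which the output stays at zero: the response lasts N + 1 samples.
print(fir_direct([1, 2, 3], [1, 0, 0, 0, 0]))  # [1, 2, 3, 0, 0]
```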


Properties
- Poles → all at the origin of the z-plane
- Zeros → on the unit circle, or as pairs mirrored in the unit circle
- Inherently stable
- No feedback
- Easily designed to have linear phase → symmetric coefficients
- Higher filter order required for a sharp transition band

Structures

FIR Structures
- Different types of structures exist; the two main ones are:
  - Direct Form
    - A very simple structure
    - N + 1 multiplications, N additions and N delay elements
  - Linear-Phase Direct Form
    - Utilizes symmetry/antisymmetry
    - Multiplications reduced by half
    - Same number of additions and delay elements
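The saving offered by the linear-phase direct form can be illustrated with a short sketch. It covers only the symmetric case, h(k) = h(N − k), and all names (`fir_linear_phase`, `multiplications_per_output`) are hypothetical. Samples that share a coefficient are added first and multiplied once, so the multiplication count drops to roughly half while the result matches the plain direct form.

```python
def fir_direct(h, x):
    """Reference direct form: y(n) = sum_k h(k) x(n - k), zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def fir_linear_phase(h, x):
    """Linear-phase direct form for symmetric coefficients, h[k] == h[N - k]."""
    order = len(h) - 1
    assert list(h) == list(h)[::-1], "sketch covers the symmetric case only"
    pairs = (order + 1) // 2                # coefficient pairs sharing one multiplier

    def sample(n):
        return x[n] if 0 <= n < len(x) else 0

    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(pairs):
            # add the two mirrored samples first, then multiply once
            acc += h[k] * (sample(n - k) + sample(n - (order - k)))
        if order % 2 == 0:
            # odd number of taps: the centre tap has no partner
            acc += h[order // 2] * sample(n - order // 2)
        y.append(acc)
    return y


def multiplications_per_output(order, symmetric):
    """N + 1 for the direct form; about half when symmetry is used."""
    return order // 2 + 1 if symmetric else order + 1


h = [1, 2, 2, 1]
x = [3, 1, 4, 1, 5, 9]
assert fir_linear_phase(h, x) == fir_direct(h, x)
print(multiplications_per_output(3, False), multiplications_per_output(3, True))  # 4 2
```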

Time-Multiplexed FIR Filters

- Time-multiplexing → input data rate is slower than the clock rate
- Time-multiplexing → one multiplier/multiply-accumulate (MAC) unit is not always enough
- Benefit → reduced cost of multipliers/multiply-accumulates
- Benefit → cost further reduced with symmetry
- Drawback → increased cost of memory, but not enough to offset the benefit


Time-Multiplexing → Effect on Number of MACs

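With filter order N and time-multiplexing (memory folding) factor M, the number of MACs when symmetry is utilized is given by

$$N_{\mathrm{MAC}} = \begin{cases} \dfrac{N}{2M}, & N \text{ even} \\[4pt] \dfrac{N+1}{2M}, & N \text{ odd} \end{cases} \tag{2}$$

If the filter is non-symmetric, the number of MACs is given by

$$N_{\mathrm{MAC}} = \begin{cases} \dfrac{N}{M}, & N \text{ even} \\[4pt] \dfrac{N+1}{M}, & N \text{ odd} \end{cases} \tag{3}$$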
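Equations (2) and (3) can be evaluated with a small helper. The function name `num_macs` is hypothetical, and rounding up when the division is not exact is an assumption of this sketch; the slides do not say how that case is handled.

```python
def num_macs(order, m, symmetric):
    """MAC count per equations (2) and (3); rounds up when the division is not exact (assumption)."""
    numerator = order if order % 2 == 0 else order + 1
    divisor = 2 * m if symmetric else m
    return -(-numerator // divisor)         # integer ceiling division


# (filter order, memory folding factor) pairs taken from the results plots
for order, m in [(15, 4), (59, 10), (108, 6), (127, 4)]:
    print(order, m, num_macs(order, m, True), num_macs(order, m, False))
```

For these pairs the non-symmetric count comes out at twice the symmetric one, which matches the DSP48E comparison reported in the results.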


FPGA


Definition

What is it?
- An integrated circuit
- Programmed after manufacturing
- Field-Programmable Gate Array

Types of FPGAs
- Anti-fuse
- Flash
- SRAM


Architecture

Main Components
- Configurable Logic Block
- Interconnection Network
- I/O Blocks
- Dedicated Blocks


Configurable Logic Block

- Basic building block
- Composed of Look-Up Tables (LUTs)
- Also contains a multiplexer and a flip-flop
- Provides both registered and unregistered outputs
- LUTs implement truth tables
- Advanced CLBs also contain carry chains for adders
- Can also be used to implement memories → Distributed Memory


Dedicated Blocks

- Multipliers
- Memory Blocks
- Transceivers
- Clock Managers (DCMs) and PLLs
- PCI Express Blocks
- Ethernet MACs (Media Access Control)
- Soft and Hard CPUs


Xilinx Virtex-5

Introduction

- Xilinx → first FPGA company and one of the largest
- A number of platforms → Spartan, Virtex I, II, 4, 5 & 6
- Virtex-5 → focus of this thesis
  - Introduced in 2006
  - A number of sub-families
- Selected sub-family: "LXT"
  - Adequate dedicated multipliers and memory blocks
- Filters implemented on → LX20T and LX50T


Virtex-5 Features

- 550 MHz clocking technology
- 6-input Look-Up Tables
- Two slices in one CLB
- 36 Kb Block RAM
- DSP48E slice


Virtex-5 Block RAM

- Arranged in columns
- Each Block RAM → 36 Kb
- Used as either one 36 Kb RAM or two independent 18 Kb RAMs
- Memories can be cascaded → deeper and wider memories
- Different types → single-port and dual-port RAMs, ROMs, FIFOs
- Synchronous read/write operation


Virtex-5 DSP48E Slice

- Implements special functions → no use of general FPGA fabric
- Especially suited to signal processing applications/algorithms
- Arranged in columns, as tiles
- Two DSP48E slices in each tile
- Special functions supported
  - Multiply
  - Multiply-Add
  - Multiply-Accumulate
  - Three-input Adder
  - Barrel Shifter


Implementation


Required FPGA Resources?

- General logic resources for control logic → CLBs
- Multipliers and MACs → DSP48E slices
- Memories for storing data → Block RAMs and CLBs (Distributed Memory)


Design Specification

- Two architectures explored
  - Linear-phase, direct form → Parallel Architecture
  - Non-linear-phase, pipelined direct form → Semi-Parallel, Pipelined Architecture
- Data word length → 18 and 24 bits
- Coefficient word length → 18 bits
- Un-scaled and scaled coefficients
- Time-multiplexing factor → memory folding factor
- MATLAB-generated design files → flexibility
- Dedicated blocks are inferred


Parallel Architecture

Basic Idea
- Linear-phase direct form, utilizes symmetry

Four major components
- Memory
  - Data Memory → stores data and implements the delay line
  - Coefficient Memory → stores coefficients, only half of them
- Pre-Adders
- Multiply-Accumulates
- Adder Tree
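The way the four components listed above might cooperate can be sketched as a behavioural model. This is an illustration only, not the thesis RTL: the function name, the mapping of coefficients to MACs and the restriction to odd filter orders are all assumptions. Each MAC processes up to M pre-added sample pairs per input sample, one per clock cycle, and an adder tree combines the MAC results.

```python
def fir_direct(h, x):
    """Reference direct form with zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def folded_symmetric_fir(h, x, m):
    """Behavioural model of a time-multiplexed, symmetric FIR (odd order only)."""
    order = len(h) - 1
    assert order % 2 == 1, "sketch handles odd orders (even tap count) only"
    assert list(h) == list(h)[::-1], "coefficients must be symmetric"
    half = (order + 1) // 2                 # coefficient memory holds half the taps
    n_macs = -(-half // m)                  # ceil(half / m), cf. equation (2)
    delay = [0] * (order + 1)               # data memory acting as the delay line
    out = []
    for sample in x:
        delay = [sample] + delay[:-1]       # shift the new sample in
        mac_acc = [0] * n_macs
        for cycle in range(m):              # m clock cycles per input sample
            for j in range(n_macs):
                k = j * m + cycle           # coefficient served by MAC j this cycle
                if k < half:
                    pre_add = delay[k] + delay[order - k]   # pre-adder
                    mac_acc[j] += h[k] * pre_add            # multiply-accumulate
        out.append(sum(mac_acc))            # adder tree
    return out, n_macs


h = [1, 2, 3, 3, 2, 1]
x = [1, 2, 3, 4, 5, 6, 7]
y, macs = folded_symmetric_fir(h, x, m=2)
assert y == fir_direct(h, x)
print(macs)  # 2
```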


Structure
- Order odd → number of taps even
- Order even → number of taps odd


Semi-Parallel, Pipelined Architecture

Basic Idea
- Direct form, does not utilize symmetry
- Even-order filters are increased to odd-order filters
- Increment → M − 1, where M is the memory-folding factor

Three major components
- Memory
  - Data Memory → stores data and implements the delay line
  - Coefficient Memory → stores coefficients, all of them
- Multiply-Adds
- Accumulator
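A behavioural sketch of the semi-parallel scheme follows. It is not the thesis design. In particular, padding the coefficient list with zeros up to a multiple of M is only one plausible reading of the "increment" bullet, and the function name and the coefficient-to-unit mapping are hypothetical. In each of the M clock cycles every multiply-add unit contributes one product to a chain, and the accumulator sums the M partial results.

```python
def fir_direct(h, x):
    """Reference direct form with zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def semi_parallel_fir(h, x, m):
    """Behavioural model: taps split over multiply-add units, m cycles per sample."""
    taps = len(h)
    padded = -(-taps // m) * m              # assumption: zero-pad taps to a multiple of m
    coeffs = list(h) + [0] * (padded - taps)
    n_units = padded // m                   # number of multiply-add units
    delay = [0] * padded                    # data memory acting as the delay line
    out = []
    for sample in x:
        delay = [sample] + delay[:-1]
        acc = 0                             # accumulator, cleared for every new sample
        for cycle in range(m):
            partial = 0
            for j in range(n_units):        # chain of multiply-adds
                k = j * m + cycle
                partial += coeffs[k] * delay[k]
            acc += partial
        out.append(acc)
    return out, n_units


h = list(range(1, 18))                      # 17 taps, i.e. an even order of 16
x = [2, -1, 0, 5, 3, -4, 7, 1, 1, 0, 6, -2, 9, 4, -3, 8, 2, 5, 1, -7]
y, units = semi_parallel_fir(h, x, m=4)
assert y == fir_direct(h, x)
print(units)  # 5
```

With 17 taps and M = 4 the padded length is 20, which corresponds to raising the order from 16 to 19, an increment of M − 1.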


Xilinx FIR Core Generator - FIR Compiler

- A complete, time-multiplexed FIR generator
- Generates two files
  - .vhd file for simulation
  - .ngc file, a synthesized netlist
- Requires a number of inputs; the important ones are:
  - A coefficient file (.coe)
  - Input sampling frequency and clock frequency
  - Coefficient structure (symmetric/non-symmetric)
  - Data and coefficient word length
  - Data and coefficient memory type
- Inputs selected to match the implemented design


Design Flow

Step 1 → MATLAB Script
- The interface to the user
- Generates all the VHDL code
- Takes the following inputs from the user
  - Filter order
  - Time-multiplexing factor
  - Data and coefficient word length
  - Testing methodology → random/exhaustive
  - Number of test vectors (for random testing)
- All scripts check the validity of inputs
- One main script combines all scripts to generate the whole design


Results


Main Components

- Facts about Xilinx resources
- Comparisons
  - Difference between the two architectures
  - Effect of scaling, data word length (18 & 24 bits) and memory-folding factor
  - Difference between the implemented architectures and the Xilinx FIR core
- Dimensions of comparison
  - Resource utilization of the FPGA
  - Clock frequency
  - Power dissipation


Xilinx Facts

DSP Elements
- The 'Add' operation is supported in the DSP block, but its inference has to be forced
- 'MAC' is supported and inferred automatically
- Asynchronous reset is not supported, so all registers with asynchronous reset were implemented in logic
- Synchronous reset is supported, and all such registers were absorbed into the DSP block

Block RAMs
- Reading and writing the same address from different ports produces uncertain behaviour. This contrasts with Xilinx's claim that the old data is always read out.

Comparison-Parallel vs Semi-Parallel

The two architectures are compared on slice count, BRAM count, DSP48E slice count, clock frequency and power dissipation. Every plot uses the same x-axis, (filter order, memory folding factor): 15,4; 16,4; 19,10; 20,10; 23,4; 24,4; 59,10; 60,10; 99,10; 100,10; 108,6; 127,4; 128,4.

Slice Count
[Figure: logic slice count, parallel vs semi-parallel.]

BRAM Count
- Odd filter order → same for both architectures
- Even filter order → semi-parallel requires more
  - One more 18 Kb BRAM for the 18-bit data path
  - Two more 18 Kb BRAMs for the 24-bit data path

DSP48E Slice Count
- Semi-parallel occupies twice as many
[Figure: DSP48E slice count, parallel vs semi-parallel.]

Clock Frequency
[Figure: clock frequency, parallel vs semi-parallel.]

Power Dissipation
[Figure: power dissipation, parallel vs semi-parallel.]

Effect of Scaling

[Figures, two slides covering the parallel and the semi-parallel architecture: slice count, clock frequency and power dissipation with non-scaled vs scaled coefficients, plotted against (filter order, memory folding factor).]

Effect of Data Word-Length

- Two word lengths → 18 and 24 bits
- Examined for the parallel and the semi-parallel architecture
- BRAM count doubled

[Figures, for each architecture: slice count, BRAM count, clock frequency and power dissipation for the 18-bit vs 24-bit data path, plotted against (filter order, memory folding factor).]

Effect of Different Time-Multiplexing/Memory Folding Factor

[Figures, for the parallel and the semi-parallel architecture: slice count, BRAM count, DSP48E count, clock frequency and power dissipation, plotted against memory folding factor (up to 100).]

Comparison-Parallel Design vs Symmetric FIR Core

[Figures, for the 18-bit and 24-bit data paths: slice count, BRAM count, DSP48E count, clock frequency and power dissipation of the implemented design vs the FIR core, plotted against (filter order, memory folding factor).]

Comparison-Semi-Parallel Design vs Non-Symmetric FIR Core

[Figures, for the 18-bit and 24-bit data paths: slice count, BRAM count, DSP48E count, clock frequency and power dissipation of the implemented design vs the FIR core, plotted against (filter order, memory folding factor).]


Conclusion and Future Work

Conclusion

- FPGAs are excellent devices for implementing signal processing algorithms
- They provide fast, dedicated multipliers and dedicated memory blocks
- Architectures exploiting coefficient symmetry reduce the number of multiplications, but suffer from lower clock frequency and higher logic slice count
- Based on the available FPGAs, architectures not exploiting coefficient symmetry can be a better choice
- Scaling the coefficients does not produce much benefit


Conclusion (continued)

- The time-multiplexing factor has a major influence on memory count, slice count, DSP count and clock frequency; power dissipation, however, remains more or less the same
- Xilinx provides its own FIR core, which can be generated easily using Coregen, a software tool provided by Xilinx
- These cores provide high clock frequency but suffer from very high power dissipation, especially when the time-multiplexing factor is low
- They also suffer from high logic slice count when coefficient symmetry is used; when it is not used, logic slice count is lower
- DSP48E slice count is only one more in the FIR core, which is not very significant


Future Work

- Single-port RAMs
  - Write port used only for one cycle
  - If memory depth is one less than the time-multiplexing factor → only one port needed
  - Benefit → BRAM count reduced, or BRAM replaced by distributed memory
- Bit- and digit-serial arithmetic
  - Eliminate BRAM and DSP48E
- Other number representations
  - 2's complement → high switching activity
  - Redundant number systems (SDC, CSDC) and residue number systems
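As a small illustration of a signed-digit representation, the sketch below converts an integer to a canonic signed-digit form. It assumes that "CSDC" on the slide refers to canonic signed-digit code; the function name `to_csd` is hypothetical and nothing here comes from the thesis. Digits are drawn from {−1, 0, +1} and no two adjacent digits are non-zero, which typically leaves fewer non-zero digits than plain binary.

```python
def to_csd(value):
    """Return canonic signed-digit digits of an integer, least significant first."""
    digits = []
    n = value
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)                 # n mod 4 == 1 gives +1, n mod 4 == 3 gives -1
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


digits = to_csd(7)
print(digits)                               # [-1, 0, 0, 1], i.e. 8 - 1
assert sum(d << i for i, d in enumerate(digits)) == 7
```

Here 7 needs two non-zero digits instead of the three ones of its binary form 111.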


Salutations

Thank You!!! Questions and Opposition!!!


Appendix: Additional Material

Design Flow, further details

Steps 2 & 3 → Design/Test Files & Simulation
- Simulation using ModelSim
- Design files → RTL design
- Testbench and behavioural model → MATLAB generated
- Model → a simple, direct-form design
- Testbench → compares the outputs of the DUT and the model
- Testbench → stops on a mismatch
- Testbench → stops when all outputs have matched
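The testbench behaviour listed above can be mimicked in a few lines. This is a hedged sketch rather than the MATLAB-generated testbench: all function names are hypothetical, the DUT is stood in for by any Python callable, and the test vectors are assumed to be signed two's-complement integers of the given word length.

```python
import random


def model_fir(h, x):
    """Behavioural model: a simple direct-form FIR with zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def random_vectors(count, word_length, seed=0):
    """Random signed test vectors that fit in word_length bits (assumed two's complement)."""
    rng = random.Random(seed)
    lo, hi = -(1 << (word_length - 1)), (1 << (word_length - 1)) - 1
    return [rng.randint(lo, hi) for _ in range(count)]


def run_testbench(dut, h, vectors):
    """Compare DUT and model; stop at the first mismatch or once all outputs match."""
    expected = model_fir(h, vectors)
    actual = dut(h, vectors)
    for n, (a, e) in enumerate(zip(actual, expected)):
        if a != e:
            return False, n                 # stop on mismatch, report the sample index
    return len(actual) == len(expected), None


vectors = random_vectors(100, word_length=18)
print(run_testbench(model_fir, [1, 2, 3, 2, 1], vectors))  # (True, None)
```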


Step 4 → Synthesis & Implementation
- Xilinx tool → ISE
- Synthesis → RTL to Xilinx-specific netlist (NGC)
- Translate → NGC to logic elements (NGD)
- Map → logic elements mapped to FPGA resources
- PAR → FPGA resources placed and routed


Step 5 → Post-PAR Simulation
- Verifies functionality of the placed and routed design
- Post-PAR simulation model generated by ISE
- Post-PAR testbench generated by MATLAB
- Requires the SIMPRIM library
- Generates a VCD file → switching activity


Step 6 → Power Estimation
- XPower → Xilinx application
- Requires 3 or 4 files
  - VCD file → generated during post-PAR simulation by ModelSim
  - PCF file → physical constraints file generated by PAR
  - NCD file → the final circuit description file produced by PAR
  - XML file → optional settings file


Scaling


- To prevent overflow at critical nodes
- Critical nodes are normally the inputs to multipliers
- Scaling multipliers are introduced to reduce the signal level
- Two main types of scaling
  - Lp-norms
    - Utilizes the signal range effectively
    - However, there is always a probability of error
  - Safe scaling
    - Does not utilize the signal range effectively
    - However, ensures there is no overflow
    - Sufficient for short-length FIR filters
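The difference between the two scaling types can be illustrated numerically. This is a sketch under stated assumptions, not the thesis procedure: scaling is applied to the coefficients of the whole filter, the input is assumed to satisfy |x(n)| ≤ 1, safe scaling is taken to mean division by the sum of absolute coefficient values, and the L2 case uses the norm of the impulse response. All function names are hypothetical.

```python
def l1_norm(h):
    """Sum of absolute values: the basis of safe scaling."""
    return sum(abs(c) for c in h)


def l2_norm(h):
    """Euclidean norm of the impulse response: one example of an Lp-norm."""
    return sum(c * c for c in h) ** 0.5


def scale(h, norm):
    """Divide all coefficients by the chosen norm."""
    s = norm(h)
    return [c / s for c in h]


def worst_case_output(h, x_max=1.0):
    """Largest possible |y(n)| for inputs bounded by x_max."""
    return x_max * sum(abs(c) for c in h)


h = [0.5, 1.0, 1.5, 1.0, 0.5]
safe = scale(h, l1_norm)
l2 = scale(h, l2_norm)
print(worst_case_output(safe))   # bounded by 1: overflow cannot occur
print(worst_case_output(l2))     # larger than 1: overflow is possible for some inputs
```

Safe scaling shrinks the coefficients more, so less of the available signal range is used, while L2 scaling keeps larger values at the price of possible overflow.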
