Numerical mathematics on FPGAs using CλaSH
From Haskell to a hardware accelerator

Martijn Bakker
Computer Architecture for Embedded Systems (CAES), University of Twente
July 1, 2015

Overview

1. Functional programming
2. FPGA
3. CλaSH
4. Problem definition and breakdown
5. CλaSH project overview
6. Results
7. Performance
8. Demo
9. Discussion
10. Conclusion

Properties

• Main building block: functions
• No assignments, only unchangeable definitions
• No statements, only expressions
• ‘Variables’ that cannot vary
• The execution of a program is a function evaluation

Resulting features

• No global, mutable state
• No side effects
• Pure functions
• Lazy evaluation
• Higher-order functions
• Strong type system
• Partial function application
• Function composition
• Clear structure of the program, similar to mathematics

Example of functional programming

Listing 1: A very short introduction to functional programming

    fib :: Integral a => a -> a
    fib n | n == 0    = 1
          | n == 1    = 1
          | otherwise = fib (n-2) + fib (n-1)

    fib_list :: Integral a => a -> [a]
    fib_list n = map fib [0..n]

    choose_list :: Integral a => (a -> a) -> [a]
    choose_list function = map function [0..]

    sum_list :: Integral a => [a] -> a
    sum_list list = foldl (+) 0 list

The Field-Programmable Gate Array

• ‘Programmable hardware’
• Create your own circuit instead of a list of instructions

Figure: FPGA structure

Why would you want to use an FPGA?

Advantages over CPUs
• High throughput
• Low guaranteed latency
• Low power use
• Support for new ‘instructions’

Advantages over ASICs
• Reconfigurability

Drawbacks

Listing 2: A single AND-gate specified in VHDL

    LIBRARY IEEE;
    USE IEEE.std_logic_1164.all;

    ENTITY andgate IS
      PORT (
        input_0  : in  std_logic;
        input_1  : in  std_logic;
        output_0 : out std_logic
      );
    END ENTITY andgate;

    ARCHITECTURE a OF andgate IS
    BEGIN
      output_0 <= input_0 and input_1;
    END ARCHITECTURE a;

Drawbacks

1. FPGA development is hard
   • Vendor-specific, closed-source tools
   • Small communities
   • Ancient HDLs
   • Synthesis, debugging and verification are slow processes
2. FPGAs cannot be reconfigured as quickly as CPUs
3. FPGAs run at lower clock speeds than ASICs

CλaSH - What?

• CAES Language for Synchronous Hardware
• A library for the specification of hardware in Haskell
• Includes a compiler: generation of VHDL and Verilog
• Written by Christiaan Baaij at CAES
• www.clash-lang.org

CλaSH - Why?

Similarities between hardware and functional programming
• Hardware consists of largely combinatorial parts: pure functions
• Combinatorial circuits contain a dependency tree: function dependencies
• Higher-order functions

CλaSH - Apparent issues?

• Hardware has a mutable state
• Functional programming has no mutable states

Solution: the Mealy machine.


The Mealy machine

Figure: Block diagram describing a Mealy machine. The loop gets executed once every clock cycle.

Example

Listing 3: A basic multiply-accumulate circuit specification in CλaSH

    module MAC where
    import CLaSH.Prelude

    topEntity :: Signal (Signed 9, Signed 9) -> Signal (Signed 9)
    topEntity = mealy mac 0

    mac :: Num t => t -> (t, t) -> (t, t)
    mac state input = (state', output)
      where
        (x, y) = input        -- unpack the two inputs
        state' = state + x*y  -- the new state
        output = state'       -- output the new state
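The Mealy-machine idea can be tried without any hardware. The sketch below is plain Haskell, not CλaSH: `simulateMealy` and `macTrace` are names invented here, and a finite list stands in for the `Signal` type. It reuses the multiply-accumulate transition function from Listing 3 and feeds it one input pair per simulated clock cycle.

```haskell
import Data.List (mapAccumL)

-- | Simulate a Mealy machine on a finite list of inputs (hypothetical helper).
-- The transition function takes the current state and one input and
-- returns the next state together with the output for this clock cycle.
simulateMealy :: (s -> i -> (s, o)) -> s -> [i] -> [o]
simulateMealy step s0 inputs = snd (mapAccumL step s0 inputs)

-- | The multiply-accumulate transition function, as in Listing 3.
mac :: Num t => t -> (t, t) -> (t, t)
mac state (x, y) = (state', output)
  where
    state' = state + x * y  -- the new state
    output = state'         -- output the new state

-- | Accumulated sums for three example input pairs, one pair per cycle.
macTrace :: [Int]
macTrace = simulateMealy mac 0 [(1, 2), (3, 4), (5, 6)]
```

With these inputs `macTrace` evaluates to `[2, 14, 44]`: the state carries the running sum from one cycle to the next, although `mac` itself stays a pure function.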

Wrap-up of introduction

Imperative languages: C / C++, Matlab
• Consist of: assignments and statements, e.g. i = i + i
• Flow: compile -> sequence of instructions -> execute (CPU) -> result

Functional languages: Haskell (CλaSH)
• Consist of: function definitions, e.g.
      fac 0 = 1
      fac n = n * fac (n-1)
• Flow: compile (CλaSH) -> Hardware Description Language -> synthesis -> deploy (FPGA) -> run -> result

Definition

The feasibility and performance of performing numerical approximations to ODEs on an FPGA with hardware specified in CλaSH.

1. Feasibility: does the simplest version work?
2. Performance: comparison with a desktop CPU
3. Numerical approximations: simple integration schemes (Euler, RK2, RK4)
4. ODEs: 4 coupled first-order differential equations with constant coefficients (4x4 matrix)
5. FPGA: a Terasic SoCKit development board, containing an Altera Cyclone V SoC FPGA

Breakdown - requirements for the system

• Number representation
• Number storage
• Control protocol
• Supplying a suitable clock frequency

1. Programming
2. Getting matrix constants and initial values in
3. Controlling and monitoring the state
4. Performing the actual computation
5. Getting results out of the FPGA (go to 3)

Number representation

• Floating point or fixed point?
• CλaSH only supports fixed point
• Signed fixed point with 8 integer bits and 24 fractional bits
• Total width: 32 bits
• Integer range: [-128 .. 127]
• ULP (unit of least precision): $2^{-24} \approx 6 \times 10^{-8}$, roughly 7 decimal digits
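To make the 8/24 format concrete, here is a minimal model of it in plain Haskell. It is only a sketch: `Q8_24`, `fromDouble`, `toDouble`, `addQ` and `mulQ` are names invented for this illustration and are not the CλaSH fixed-point API, and overflow outside [-128, 128) is not handled.

```haskell
import Data.Bits (shiftR)
import Data.Int (Int32, Int64)

-- | Signed fixed point with 8 integer and 24 fractional bits, stored as a
-- raw 32-bit two's-complement integer (a model, not CLaSH's own type).
newtype Q8_24 = Q8_24 Int32
  deriving (Eq, Ord, Show)

fracBits :: Int
fracBits = 24

-- | One unit of least precision: 2^-24.
ulp :: Double
ulp = 2 ^^ negate fracBits

-- | Convert from Double, rounding to the nearest representable value.
fromDouble :: Double -> Q8_24
fromDouble x = Q8_24 (round (x * 2 ^ fracBits))

toDouble :: Q8_24 -> Double
toDouble (Q8_24 raw) = fromIntegral raw * ulp

-- | Addition is plain integer addition of the raw values.
addQ :: Q8_24 -> Q8_24 -> Q8_24
addQ (Q8_24 a) (Q8_24 b) = Q8_24 (a + b)

-- | Multiplication uses a 64-bit intermediate: the product carries 48
-- fractional bits, so it is shifted right by 24 to return to 8/24
-- (the shift truncates toward negative infinity).
mulQ :: Q8_24 -> Q8_24 -> Q8_24
mulQ (Q8_24 a) (Q8_24 b) = Q8_24 (fromIntegral (wide `shiftR` fracBits))
  where
    wide = fromIntegral a * fromIntegral b :: Int64
```

For example, `toDouble (mulQ (fromDouble 1.5) (fromDouble 2))` gives `3.0`, while a value such as 0.1 can only be stored to within half an ULP.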


Number storage

• Registers, SRAM or SDRAM?
  - Registers: directly accessible on-chip latches
  - SRAM: static RAM, on-chip, single-cycle access delays
  - SDRAM: synchronous dynamic RAM, off-chip, ‘high’ latency
• All variables get updated every single cycle.
• All data fits in the registers.
• No need for SRAM or SDRAM.


Supplying a suitable clock frequency

• A higher* clock frequency means higher throughput
• *up to a certain limit, otherwise your system fails.
• Output stabilization of the combinatorial circuits takes time.
• FPGA contains a crystal (50 MHz) and PLL (phase-locked loop) circuits.
• CλaSH + IO + Altera PLL = non-deterministic behaviour.
• Solution: a simple integer frequency divider.
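An integer frequency divider is itself a tiny state machine. The sketch below models one in plain Haskell under stated assumptions: `dividerStep` and `runDivider` are hypothetical names, the divider toggles its output after every n input cycles, and the actual divider used in the project and its division ratio are not given in the slides.

```haskell
import Data.List (mapAccumL)

-- | State of the divider: a cycle counter and the current output level.
type DividerState = (Int, Bool)

-- | One input clock cycle of a divide-by-(2*n) frequency divider.
-- The output level toggles after every n input cycles, so one full
-- output period spans 2*n input cycles.
dividerStep :: Int -> DividerState -> () -> (DividerState, Bool)
dividerStep n (count, level) ()
  | count + 1 >= n = ((0, not level), level)
  | otherwise      = ((count + 1, level), level)

-- | Output levels for the given number of input clock cycles.
runDivider :: Int -> Int -> [Bool]
runDivider n cycles =
  snd (mapAccumL (dividerStep n) (0, False) (replicate cycles ()))
```

With n = 2 the output repeats low, low, high, high, so the output clock runs at a quarter of the input frequency. Only integer ratios are reachable this way, which is the price paid for avoiding the PLL.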


Actual computation - External Types

Listing 4: CλaSH topEntity for the ODE solver

    topEntity :: Signal InputSignals -> Signal OutputSignals
    topEntity = mealy solveODE initialState

    solveODE state input = (state', output)
      where
        state  = (systemState, systemConstants, oul, block)
        state' = (systemState', systemConstants', oul', block')

Actual computation - Internal Types

Listing 5: Internal types for the ODE solver

    type Data = SFixed 8 24
    type UInt = Unsigned 32
    type ValueVector = Vec 5 Data
    type ConstantVector = Vec 20 Data

    data ODEState = ODEState { valueVector :: ValueVector
                             , time :: Data
                             } deriving (Show)

    type Equation = (ODEState, ConstantVector) -> ValueVector
    type Scheme = SystemConstants -> Equation -> ODEState -> ODEState

Actual computation - Equation

$$x' = \begin{pmatrix} c_1 & c_2 \\ c_3 & c_4 \end{pmatrix} x \qquad (1)$$

Listing 6: Computing the derivative

    matrix2d :: Equation
    matrix2d (odestate, constants) = dxs
      where
        xs = valueVector odestate

        c1 = constants !! 4
        c2 = constants !! 5
        c3 = constants !! 6
        c4 = constants !! 7

        x0 = c1 * (xs !! 0) + c2 * (xs !! 1)
        x1 = c3 * (xs !! 0) + c4 * (xs !! 1)

        dxs = fst $ shiftInAt0 xs (x0 :> x1 :> Nil)

Actual computation - Scheme

$$x_{k+1} = \begin{cases} x_k + h\,x'_k & \text{if } t_k < t_{\max} \\ x_k & \text{otherwise} \end{cases} \qquad (2)$$

Listing 7: Euler's method

    euler :: Scheme
    euler constants equation state = state'
      where
        c_user    = userconstants constants
        c_maxtime = maxtime constants
        h         = timestep constants
        t         = time state
        xs        = valueVector state

        -- Apply Euler's integration scheme
        eulerxs = zipWith (+) xs $ map (*h) (equation (state, c_user))
        (xs', t') = if t < c_maxtime then (eulerxs, t + h)
                                     else (xs, t)  -- already at maximum time

        state' = ODEState {valueVector = xs', time = t'}
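The same scheme can be explored on a desktop before committing to hardware. The following sketch restates the guarded Euler step in plain Haskell with `Double` values and lists instead of the fixed-point vectors of the listing; `oscillator`, `eulerStep` and `runEuler` are names chosen here, and the test system x'' = -x with x(0) = 50 is an assumption made for illustration.

```haskell
-- | State of the solver: the value vector and the current time.
data ODEState = ODEState { valueVector :: [Double], time :: Double }
  deriving (Show)

-- | Right-hand side of the oscillation x'' = -x, written as the
-- first-order system x0' = x1, x1' = -x0.
oscillator :: [Double] -> [Double]
oscillator [x0, x1] = [x1, negate x0]
oscillator _        = error "oscillator expects exactly two values"

-- | One Euler step with step size h; the state is frozen once tMax is reached.
eulerStep :: Double -> Double -> ([Double] -> [Double]) -> ODEState -> ODEState
eulerStep h tMax f state
  | t < tMax  = ODEState (zipWith (+) xs (map (* h) (f xs))) (t + h)
  | otherwise = state
  where
    t  = time state
    xs = valueVector state

-- | Apply a fixed number of Euler steps starting from x(0) = 50, x'(0) = 0,
-- with no upper time limit.
runEuler :: Double -> Int -> ODEState
runEuler h steps = iterate (eulerStep h (1 / 0) oscillator) start !! steps
  where
    start = ODEState [50, 0] 0
```

For this particular system every Euler step scales the amplitude sqrt(x0² + x1²) by exactly sqrt(1 + h²), so the numerical solution spirals outward. With h = 0.01 and 10 000 steps (t = 100) the amplitude grows from 50 to roughly 82, which is a property of the scheme and independent of the number format.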

Actual computation - Control

Listing 8: Controlling the ODE solver (important lines start with '!')

    ! scheme = euler
    ! equation = matrix4d

      (systemState', oul', block')
        -- Handle the setup (reset the state, insert input values, start the computation)
        | i_c == 2 && i_cs == 1 && i_ca == 0 = (initialSystemState, 0, 0)
        | i_is == 1 = (systemState{ odestate = s_odestate_in' }, 0, 1)
        | i_c == 1 && i_cs == 1 && i_ca == 0 = (systemState{ step = 0 }, 0, 0)
        -- Handle the computation and output:
    !   | block == 1 && i_os == 1 = (systemState, pack (xs !! i_oa), block)
    !   | block == 0 && s_step < c_maxstep = (systemState_up', 0, block)
    !   | block == 0 && s_step >= c_maxstep = (systemState{ step = uIntMax }, pack uIntMax, 1)
        -- Default, do nothing
        | otherwise = (systemState, oul, block)
        where
          s_odestate_in' = s_odestate {valueVector = replace i_ia (unpack i_i :: Data) xs}

    !     s_odestate_up  = scheme systemConstants equation s_odestate
          valueVector_wt = replace 4 (time s_odestate_up) (valueVector s_odestate_up)
          s_odestate_up' = s_odestate_up {valueVector = valueVector_wt}
          s_step'        = s_step + 1

          systemState_up' = systemState{ odestate = s_odestate_up', step = s_step' }

An overview of the entire system

Figure: Block diagram of the entire system. Labels recoverable from the diagram:
• Host (desktop): controls the FPGA through the HPS over ssh, connected via Ethernet
• HPS (Linux on ARM): manages the Avalon bridge, receives ssh and scp connections
• Avalon internal data transfer buses between HPS and FPGA
• FPGA (sockit):
  - memory_io (generated by QSys): converts the Avalon protocol into usable signals
  - sockit_pll (manual VHDL): divides the clock frequency
  - io_led (manual VHDL): updates the LED status
  - compute_main (VHDL generated by CλaSH): performs the useful operations: control the Avalon bridge, bidirectional control channel, output channel, input channel, read physical input, write physical output
• Physical input (clock, reset, keys, switches) and physical output (LEDs)

Results

• Each solution plot contains 3 curves
  1. FPGA approximation
  2. MATLAB approximation (same functionality as FPGA)
  3. Analytical or obtained from ode45(), the real solution
• Each error plot contains at least 2 curves
  1. FPGA - real
  2. FPGA - MATLAB approximation
• MATLAB uses IEEE 754 double-precision floating point
• FPGA uses 8/24 signed fixed point


Simple oscillation - I

x(t) = 50 cos(t)

Figure: Solution (ω = 1) and error plotted against time (s), from 0 to 100. Solution curves: FPGA, Matlab-Euler, analytical. Error curves: FPGA - analytical, FPGA - matlab euler, maximum error, maximum error curve. The final error is ≈ 30 (h = 0.01).

Simple oscillation - II

x(t) = 50 cos(t)

Figure: Solution (ω = 1) and error plotted against time (s), from 0 to 100. Solution curves: FPGA, Matlab-Euler, analytical. Error curves: FPGA - analytical, FPGA - matlab euler. The final error decreased by a factor 150 ($h = 1 \times 10^{-4}$).

Simple oscillation - III

x(t) = 50 cos(t)

Figure: Solution (ω = 1) and error plotted against time (s), from 0 to 0.6. Solution curves: FPGA, Matlab-Euler, analytical. Error curves: FPGA - analytical, FPGA - matlab euler. Breakdown of the number representation ($h = 1 \times 10^{-7}$).
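One way to see why such a small step size strains the number representation is to ask how well h itself fits the 8/24 format. The sketch below is plain Haskell; `quantize` and `relativeError` are hypothetical helpers, and round-to-nearest conversion is an assumption, since the slides do not say how constants are converted to fixed point.

```haskell
-- | Number of fractional bits in the 8/24 fixed-point format.
fracBits :: Int
fracBits = 24

-- | Nearest value representable with 24 fractional bits, as a Double.
-- Round-to-nearest is an assumption of this sketch.
quantize :: Double -> Double
quantize x = fromIntegral (round (x * scale) :: Integer) / scale
  where
    scale = 2 ^ fracBits

-- | Relative error introduced by storing a step size h in this format.
relativeError :: Double -> Double
relativeError h = abs (quantize h - h) / h
```

Under these assumptions $h = 10^{-7}$ is stored as $2 \times 2^{-24} \approx 1.19 \times 10^{-7}$, about 19% too large, whereas h = 0.01 is off by less than one part in a million. Every product of h with a derivative is restricted to multiples of $2^{-24}$ in the same way, so small updates lose most of their significant digits.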

Switching equations

• Euler's method (first order) to RK2 (second order)
• Simple oscillations to 4 coupled equations
• Matrix chosen such that all eigenvalues are negative

$$x' = \begin{pmatrix} 2 & 3 & 2 & 0 \\ -5 & -5 & -3 & 1 \\ 3 & -1 & -2 & -3 \\ 4 & 2 & 2 & -3 \end{pmatrix} x \quad \text{with} \quad x(t_0) = \begin{pmatrix} 7 \\ 5 \\ 7 \\ 5 \end{pmatrix} \qquad (3)$$
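The slides do not show the RK2 code or say which second-order variant was used. As an illustration only, the sketch below implements the explicit midpoint method, one common RK2 scheme, in plain Haskell with `Double` values; `rk2Step` and `integrate` are names invented here.

```haskell
-- | One step of the explicit midpoint method (a second-order Runge-Kutta
-- scheme) for an autonomous system x' = f x with step size h.
rk2Step :: Double -> ([Double] -> [Double]) -> [Double] -> [Double]
rk2Step h f xs = zipWith (+) xs (map (* h) k2)
  where
    k1  = f xs                                 -- slope at the start
    mid = zipWith (+) xs (map (* (h / 2)) k1)  -- half an Euler step
    k2  = f mid                                -- slope at the midpoint

-- | Apply a fixed number of RK2 steps to an initial value vector.
integrate :: Double -> Int -> ([Double] -> [Double]) -> [Double] -> [Double]
integrate h steps f x0 = iterate (rk2Step h f) x0 !! steps
```

As a quick check, `integrate 0.01 100 (map negate) [1]` approximates the decay x' = -x up to t = 1 and lands within about 10⁻⁵ of exp(-1). Each step costs two derivative evaluations instead of one, which in hardware means roughly twice the multiplier work per iteration.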

RK2 - I

Figure: Solution and error plotted against time (s), from 0 to 3. Solution curves: FPGA, Matlab-RK2, ODE45. Error curves: FPGA - ODE45, FPGA - matlab RK2. A high value of h, maximum error ≈ 0.3 (h = 0.01).

RK2 - II

Figure: Solution and error plotted against time (s), from 0 to 3.5. Solution curves: FPGA, Matlab-RK2, ODE45. Error curves: FPGA - ODE45, FPGA - matlab RK2. A very low value of h, maximum error ≈ 0.2 ($h = 5 \times 10^{-7}$).

RK4

Figure: Solution and error plotted against time (s), from 0 to 3. Solution curves as labelled in the source: FPGA, Matlab-RK2, ODE45. Error curves as labelled in the source: FPGA - ODE45, FPGA - matlab RK2. For all tested values of h ($h = 1 \times 10^{-5}$).

Euler revisited

Figure: Solution and error plotted against time (s), from 0 to 3; the error axis is scaled by $10^{-3}$. Solution curves: FPGA, Matlab-Euler, ODE45. Error curves: FPGA - ODE45, FPGA - matlab euler. Euler outperforming both RK2 and RK4 ($h = 1 \times 10^{-4}$).

Results

Table: Euler's method, sorted by iterations per second.

Device                  Iterations  Time (s)  Output interrupts  Iterations per second (×10^6)
CPU - C++ - int         1×10^8      1.35      1                  74.1
FPGA                    1×10^8      2.18      1                  45.8
FPGA                    1×10^8      2.25      1×10^3             44.4
FPGA                    1×10^8      2.30      1×10^4             43.4
FPGA                    1×10^8      3.27      1×10^5             30.6
FPGA                    1×10^7      0.39      1                  25.6
FPGA                    1×10^6      0.21      1                  4.76
CPU - C++ - float       5×10^7      58.1      1                  0.86
FPGA                    1×10^5      0.19      1                  0.53
CPU - Haskell - float   1×10^6      3.23      1                  0.31
CPU - MATLAB - float    1×10^7      304       1                  0.03
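The last column of the table is simply iterations divided by time. The sketch below recomputes it for the FPGA runs with a single output interrupt and also estimates the incremental cost per iteration between two runs. It is plain Haskell; `fpgaRuns`, `throughput` and `marginalCost` are names invented here, and the interpretation in the note that follows is a reading of the table rather than a statement from the slides.

```haskell
-- | (iterations, wall-clock seconds) for the FPGA runs with a single
-- output interrupt, copied from the table.
fpgaRuns :: [(Double, Double)]
fpgaRuns = [(1e5, 0.19), (1e6, 0.21), (1e7, 0.39), (1e8, 2.18)]

-- | Iterations per second, in millions, as in the last table column.
throughput :: (Double, Double) -> Double
throughput (iterations, seconds) = iterations / seconds / 1e6

-- | Incremental cost per iteration between two runs, in seconds.
marginalCost :: (Double, Double) -> (Double, Double) -> Double
marginalCost (n1, t1) (n2, t2) = (t2 - t1) / (n2 - n1)
```

Between the 10^7 and 10^8 runs the incremental cost comes out at about 2 × 10⁻⁸ s per iteration, while the shortest run still takes 0.19 s for only 10^5 iterations. That pattern suggests a fixed start-up and communication overhead that dominates short runs, which would explain why the FPGA rows with fewer iterations rank so low.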

Notes

• FPGA for ‘low-power, cost-sensitive design needs’
• No automatic derivation of the optimal clock frequency
• 4x4 matrix equation only used ∼10% of the FPGA space
• Power usage is ∼2 orders of magnitude lower than the CPU
• CPUs are heavily optimized for fast integer arithmetic

The entire process takes ∼10 minutes

1. CλaSH - Windows/Linux
2. Synthesis (Quartus) - Windows
3. Driver compilation (arm-linux-gnueabihf-g++) - Linux
4. Run and extract results - Windows/Linux
5. Verification of results (MATLAB) - Windows

• Normal work flow: use two machines
  - Linux for compiling and deploying the driver
  - Windows for the other steps

Discussion

• Accuracy
  - Equality between the FPGA and MATLAB implementations
  - Limitations of the number representation and CλaSH
• Functionality
  - 4x4 matrix with constant coefficients
  - Variety of basic solver schemes
  - Limitations of the number representation and CλaSH
• Performance
  - Sub-par development FPGA is close to a modern CPU core
  - Trade-off between power usage and performance

What has been created?

• Single-command toolchain integration for deploying CλaSH on an FPGA
• IO system of sufficiently high performance
• Configuration of the Avalon bridges
• C++ library for easy FPGA accessibility

What has been shown?

• Hardware acceleration of numerical solvers is feasible
• Number representations are important
• CλaSH is usable for projects with complex IO by use as a module
• FPGAs are fast: sub-par FPGA close to a modern CPU core

Acknowledgements

• Jan Kuper - Introducing me to functional programming, CλaSH and giving feedback
• Christiaan Baaij - Creating CλaSH and answering related questions
• Ruud van Damme & Jan Broenink - Feedback
• Rinse Wester - Input on configuring and using the Avalon bridges

Additional stuff

Frequency too high? Random number generator.

Figure: The main source of failure after overclocking.

Nondeterministic behaviour

Figure: Only when using the Altera PLL

The core of the circuit: the matrix multiplier

Figure: A 4x4 matrix vector multiplication consists of 4 ∗ (4 + 3) = 28 operations
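The operation count in the caption follows directly from the structure of a matrix-vector product. The sketch below spells it out in plain Haskell; `matVec` and `operationCount` are names chosen for this illustration and lists stand in for the fixed-size vectors a hardware description would use.

```haskell
-- | Multiply a matrix (given as a non-empty list of non-empty rows) by a
-- vector. Each row costs one multiplication per element, and foldr1 uses
-- exactly one addition fewer than there are products.
matVec :: Num a => [[a]] -> [a] -> [a]
matVec rows xs = [foldr1 (+) (zipWith (*) row xs) | row <- rows]

-- | Operation count for an n-by-n matrix-vector product:
-- n rows, each with n multiplications and (n - 1) additions.
operationCount :: Int -> Int
operationCount n = n * (n + (n - 1))
```

`operationCount 4` gives the 28 operations from the caption: 16 multiplications and 12 additions. On a CPU these run largely one after another, whereas in the synthesized circuit they can all be laid out side by side.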

Synthesized hardware

Figure: An overview of the circuit, generated by Quartus
