Numerical mathematics on FPGAs using CλaSH
From Haskell to a hardware accelerator
Martijn Bakker
Computer Architecture for Embedded Systems (CAES), University of Twente
July 1, 2015
Overview
1. Functional programming
2. FPGA
3. CλaSH
4. Problem definition and breakdown
5. CλaSH project overview
6. Results
7. Performance
8. Demo
9. Discussion
10. Conclusion
Properties
• Main building block: functions
• No assignments, only unchangeable definitions
• No statements, only expressions
• 'Variables' that cannot vary
• The execution of a program is a function evaluation
Resulting features
• No global, mutable state
• No side effects
• Pure functions
• Lazy evaluation
• Higher-order functions
• Strong type system
• Partial function application
• Function composition
• Clear structure of the program, similar to mathematics
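Several of these features fit in a few lines of plain Haskell. The sketch below uses only the standard Prelude; the names (twice, triple, incThenSquare, firstSquares) are illustrative and not taken from the project.

```haskell
-- Higher-order function: 'twice' takes a function as an argument.
twice :: (a -> a) -> a -> a
twice f = f . f

-- Partial application: (* 3) is (*) with its second argument already supplied.
triple :: Int -> Int
triple = (* 3)

-- Function composition: increment first, then square, glued together with (.).
incThenSquare :: Int -> Int
incThenSquare = (^ (2 :: Int)) . (+ 1)

-- Lazy evaluation: an infinite list; only the demanded prefix is ever computed.
naturals :: [Int]
naturals = [0 ..]

firstSquares :: Int -> [Int]
firstSquares n = take n (map (^ (2 :: Int)) naturals)
```

For example, `twice triple 2` evaluates to 18 and `firstSquares 4` to `[0,1,4,9]`, even though `naturals` never ends.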
Example of functional programming
Listing 1: A very short introduction to functional programming
```haskell
fib :: Integral a => a -> a
fib n
  | n == 0    = 1
  | n == 1    = 1
  | otherwise = fib (n-2) + fib (n-1)

fib_list :: Integral a => a -> [a]
fib_list n = map fib [0..n]

choose_list :: Integral a => (a -> a) -> [a]
choose_list function = map function [0..]

sum_list :: Integral a => [a] -> a
sum_list list = foldl (+) 0 list
```
The Field-Programmable Gate Array
• 'Programmable hardware'
• Create your own circuit instead of a list of instructions
Figure: FPGA structure
Why would you want to use an FPGA?
Advantages over CPUs
• High throughput
• Low guaranteed latency
• Low power use
• Support for new "instructions"
Advantages over ASICs
• Reconfigurability
Drawbacks
Listing 2: A single AND-gate specified in VHDL
```vhdl
LIBRARY IEEE;
USE IEEE.std_logic_1164.all;

ENTITY andgate IS
  PORT (
    input_0  : in  std_logic;
    input_1  : in  std_logic;
    output_0 : out std_logic
  );
END ENTITY andgate;

ARCHITECTURE a OF andgate IS
BEGIN
  output_0 <= input_0 and input_1;
END ARCHITECTURE a;
```
Drawbacks
1. FPGA development is hard
   • Vendor-specific, closed-source tools
   • Small communities
   • Ancient HDLs
   • Synthesis, debugging and verification are slow processes
2. FPGAs cannot be reconfigured as quickly as CPUs
3. FPGAs run at lower clock speeds than ASICs
CλaSH - What?
• CAES Language for Synchronous Hardware
• A library for the specification of hardware in Haskell
• Includes a compiler: generation of VHDL and Verilog
• Written by Christiaan Baaij at CAES
• www.clash-lang.org
CλaSH - Why?
Similarities between hardware and functional programming
• Hardware consists largely of combinational parts: pure functions
• Combinational circuits contain a dependency tree: function dependencies
• Higher-order functions
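To make the correspondence concrete, a combinational circuit can be modelled as a pure Haskell function. This sketch uses plain `Bool` lists instead of CλaSH types (an assumption made so it runs with the standard library only): a full adder is a pure function, and a ripple-carry adder is the higher-order function `mapAccumL` applied to it, where the carry chain is exactly the dependency between neighbouring adders. In the same style, the AND gate of Listing 2 would simply be `(&&)`. The helper names are illustrative.

```haskell
import Data.Bits (testBit)
import Data.List (mapAccumL)

-- Full adder: carry-in -> (a, b) -> (carry-out, sum).
-- A purely combinational circuit, modelled as a pure function.
fullAdder :: Bool -> (Bool, Bool) -> (Bool, Bool)
fullAdder cin (a, b) = (cout, s)
  where
    axb  = a /= b                     -- XOR on Bool
    s    = axb /= cin
    cout = (a && b) || (cin && axb)

-- Ripple-carry adder: mapAccumL threads the carry out of each full adder
-- into the next one. Bits are least significant first.
rippleAdd :: [Bool] -> [Bool] -> (Bool, [Bool])
rippleAdd xs ys = mapAccumL fullAdder False (zip xs ys)

-- Helpers between Int and a fixed-width, LSB-first bit list.
toBits :: Int -> Int -> [Bool]
toBits width n = [testBit n i | i <- [0 .. width - 1]]

fromBits :: [Bool] -> Int
fromBits bs = sum [2 ^ i | (i, True) <- zip [0 :: Int ..] bs]
```

For example, `rippleAdd (toBits 4 9) (toBits 4 9)` returns a carry-out of `True` and the 4-bit sum 2, i.e. 18 wraps around.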
CλaSH - Apparent issues?
• Hardware has mutable state
• Functional programming has no mutable state
Solution: the Mealy machine.
The Mealy machine
Figure: Block diagram describing a Mealy machine. The loop is executed once every clock cycle.
Example
Listing 3: A basic multiply-accumulate circuit specification in CλaSH
```haskell
module MAC where

import CLaSH.Prelude

topEntity :: Signal (Signed 9, Signed 9) -> Signal (Signed 9)
topEntity = mealy mac 0

mac :: Num t => t -> (t, t) -> (t, t)
mac state input = (state', output)
  where
    (x, y) = input          -- unpack the two inputs
    state' = state + x*y    -- the new state
    output = state'         -- output the new state
```
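The behaviour of Listing 3 can be checked in software by modelling a signal as a list with one element per clock cycle. The sketch below is a hypothetical list-based stand-in (`mealyL`) for CλaSH's `mealy`, not CλaSH's own `Signal` implementation, and it uses `Int` instead of `Signed 9`, so it does not model the 9-bit wrap-around.

```haskell
-- A list-based model of a Mealy machine: one list element per clock cycle.
-- In each cycle the transfer function maps (state, input) to (next state, output).
mealyL :: (s -> i -> (s, o)) -> s -> [i] -> [o]
mealyL _ _ [] = []
mealyL f s (i : is) = o : mealyL f s' is
  where
    (s', o) = f s i

-- The transfer function from Listing 3.
mac :: Num t => t -> (t, t) -> (t, t)
mac state (x, y) = (state', state')
  where
    state' = state + x * y

-- Simulate the multiply-accumulate circuit, starting from state 0.
macSim :: [(Int, Int)] -> [Int]
macSim = mealyL mac 0
```

For the inputs `[(1,1),(2,2),(3,3)]`, `macSim` produces the running sums `[1,5,14]`. Note that `mealyL f s` behaves like `snd . mapAccumL f s` from `Data.List`.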
Wrap-up of introduction
• Imperative languages (C / C++, MATLAB) consist of assignments and statements, e.g. i = i + i. They compile to a sequence of instructions, which a CPU executes to produce the result.
• Functional languages (Haskell, CλaSH) consist of function definitions, e.g. fac 0 = 1; fac n = n * fac (n-1). CλaSH compiles them to a hardware description language, which is synthesized, deployed to an FPGA and run to produce the result.
Definition
The feasibility and performance of numerically approximating ODEs on an FPGA, with the hardware specified in CλaSH.
1. Feasibility: does the simplest version work?
2. Performance: comparison with a desktop CPU
3. Numerical approximations: simple integration schemes (Euler, RK2, RK4)
4. ODEs: 4 coupled first-order differential equations with constant coefficients (4x4 matrix)
5. FPGA: a Terasic SoCKit development board, containing an Altera Cyclone V SoC FPGA
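For reference, the three schemes can be written out for the linear system x' = Ax with a constant 4x4 matrix A. The slides do not state which second-order Runge-Kutta variant is used; Heun's method is shown here as one common choice (an assumption), and RK4 is the classical fourth-order scheme. Because the system is autonomous, t does not appear inside the stages.

```latex
\begin{aligned}
\mathbf{x}' &= A\mathbf{x}, \qquad A \in \mathbb{R}^{4\times 4}, \qquad t_{k+1} = t_k + h \\[4pt]
\text{Euler:}\quad \mathbf{x}_{k+1} &= \mathbf{x}_k + h\,A\mathbf{x}_k \\[4pt]
\text{RK2 (Heun):}\quad \mathbf{k}_1 &= A\mathbf{x}_k,\quad
  \mathbf{k}_2 = A(\mathbf{x}_k + h\,\mathbf{k}_1),\quad
  \mathbf{x}_{k+1} = \mathbf{x}_k + \tfrac{h}{2}(\mathbf{k}_1 + \mathbf{k}_2) \\[4pt]
\text{RK4:}\quad \mathbf{k}_1 &= A\mathbf{x}_k,\quad
  \mathbf{k}_2 = A\!\left(\mathbf{x}_k + \tfrac{h}{2}\mathbf{k}_1\right),\quad
  \mathbf{k}_3 = A\!\left(\mathbf{x}_k + \tfrac{h}{2}\mathbf{k}_2\right),\quad
  \mathbf{k}_4 = A(\mathbf{x}_k + h\,\mathbf{k}_3) \\
\mathbf{x}_{k+1} &= \mathbf{x}_k + \tfrac{h}{6}\left(\mathbf{k}_1 + 2\mathbf{k}_2 + 2\mathbf{k}_3 + \mathbf{k}_4\right)
\end{aligned}
```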
Breakdown - requirements for the system
• Number representation
• Number storage
• Control protocol
• Supplying a suitable clock frequency

1. Programming
2. Getting matrix constants and initial values in
3. Controlling and monitoring the state
4. Performing the actual computation
5. Getting results out of the FPGA (go to 3)
Number representation
• Floating point or fixed point?
• CλaSH only supports fixed point
• Signed fixed point with 8 integer bits and 24 fractional bits
• Total width: 32 bits
• Integer range: [-128 .. 127] (representable values: -128 up to 128 - 2⁻²⁴)
• ULP (unit of least precision): 2⁻²⁴ ≈ 6 × 10⁻⁸, roughly 7 decimal digits
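The 8/24 format can be modelled in plain Haskell as a 32-bit two's-complement integer scaled by 2⁻²⁴. This is a software sketch, not CλaSH's `SFixed 8 24`: the rounding on conversion, the truncating multiplication and the wrap-around on overflow are assumptions chosen for illustration and may differ from CλaSH's defaults.

```haskell
import Data.Bits (shiftR)
import Data.Int (Int32, Int64)

-- Signed fixed point with 8 integer bits and 24 fractional bits, stored in
-- a 32-bit two's-complement integer.
newtype Q8_24 = Q8_24 Int32 deriving (Eq, Show)

fracBits :: Int
fracBits = 24

-- Convert from Double, rounding to the nearest representable value
-- (assumes the input lies within the representable range).
fromDouble :: Double -> Q8_24
fromDouble x = Q8_24 (fromIntegral (round (x * 2 ^ fracBits) :: Integer))

toDouble :: Q8_24 -> Double
toDouble (Q8_24 r) = fromIntegral r / 2 ^ fracBits

-- Addition wraps around on overflow, because Int32 arithmetic wraps.
addQ :: Q8_24 -> Q8_24 -> Q8_24
addQ (Q8_24 a) (Q8_24 b) = Q8_24 (a + b)

-- Multiplication: form the exact 64-bit product, drop 24 fractional bits
-- with an arithmetic right shift (truncation towards minus infinity) and
-- keep the low 32 bits (wrap-around on overflow).
mulQ :: Q8_24 -> Q8_24 -> Q8_24
mulQ (Q8_24 a) (Q8_24 b) =
  Q8_24 (fromIntegral ((fromIntegral a * fromIntegral b :: Int64) `shiftR` fracBits))

ulp, maxQ, minQ :: Double
ulp  = toDouble (Q8_24 1)          -- 2^-24
maxQ = toDouble (Q8_24 maxBound)   -- 128 - 2^-24
minQ = toDouble (Q8_24 minBound)   -- -128
```

In this model `mulQ (fromDouble 100) (fromDouble 2)` wraps to -56 instead of 200, which illustrates why the integer range of the format has to be chosen with the expected magnitudes in mind.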
Number storage
• Registers, SRAM or SDRAM?
  - Registers: directly accessible, on-chip flip-flops
  - SRAM (static RAM): on-chip, single-cycle access delay
  - SDRAM (synchronous dynamic RAM): off-chip, 'high' latency
• All variables are updated every single cycle.
• All data fits in the registers.
• No need for SRAM or SDRAM.
Supplying a suitable clock frequency
• A higher clock frequency means higher throughput, up to a certain limit: beyond it the system fails, because the outputs of the combinational circuits need time to stabilize.
• The board provides a crystal oscillator (50 MHz) and the FPGA provides PLL (phase-locked loop) circuits.
• CλaSH + IO + Altera PLL = non-deterministic behaviour.
• Solution: a simple integer frequency divider.
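The divider itself is not shown in the slides, so the following is only a behavioural sketch in plain Haskell: a Mealy-style state machine over a list of input clock ticks whose output toggles every n input cycles, giving an output frequency of f_in / (2n). The function names and the toggle-every-n design are assumptions for illustration, not the project's VHDL.

```haskell
-- Behavioural model of an integer clock divider. The state is
-- (cycle counter, current output level); the output toggles every n cycles.
divider :: Int -> Int -> [Bool]
divider n cycles = go (0, False) cycles
  where
    go :: (Int, Bool) -> Int -> [Bool]
    go (count, out) k
      | k <= 0    = []
      | otherwise = out' : go (count', out') (k - 1)
      where
        (count', out')
          | count == n - 1 = (0, not out)
          | otherwise      = (count + 1, out)

-- Output frequency for a given input frequency and division parameter n.
dividedFrequency :: Double -> Int -> Double
dividedFrequency fIn n = fIn / fromIntegral (2 * n)
```

With the 50 MHz board clock and an illustrative n = 5, `dividedFrequency 50e6 5` gives 5 MHz; the frequency actually used in the project is not stated here. Because every output edge is derived from the same input clock, the result is fully deterministic, at the cost of only reaching f_in / (2n) for integer n.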
Actual computation - External Types
Listing 4: CλaSH topEntity for the ODE solver
```haskell
topEntity :: Signal InputSignals -> Signal OutputSignals
topEntity = mealy solveODE initialState

solveODE state input = (state', output)
  where
    -- unpack the current state and pack the next one
    (systemState, systemConstants, oul, block) = state
    state' = (systemState', systemConstants', oul', block')
    -- (remaining definitions omitted on the slide)
```
Actual computation - Internal Types
Listing 5: Internal types for the ODE solver
```haskell
type Data           = SFixed 8 24
type UInt           = Unsigned 32
type ValueVector    = Vec 5 Data
type ConstantVector = Vec 20 Data

data ODEState = ODEState
  { valueVector :: ValueVector
  , time        :: Data
  } deriving (Show)

type Equation = (ODEState, ConstantVector) -> ValueVector
type Scheme   = SystemConstants -> Equation -> ODEState -> ODEState
```
Actual computation - Equation
$$\mathbf{x}' = \begin{pmatrix} c_1 & c_2 \\ c_3 & c_4 \end{pmatrix} \mathbf{x} \qquad (1)$$
Listing 6: Computing the derivative
```haskell
matrix2d :: Equation
matrix2d (odestate, constants) = dxs
  where
    xs = valueVector odestate

    c1 = constants !! 4
    c2 = constants !! 5
    c3 = constants !! 6
    c4 = constants !! 7

    x0 = c1 * (xs !! 0) + c2 * (xs !! 1)
    x1 = c3 * (xs !! 0) + c4 * (xs !! 1)

    dxs = fst $ shiftInAt0 xs (x0 :> x1 :> Nil)
```
Actual computation - Scheme
$$\mathbf{x}_{k+1} = \begin{cases} \mathbf{x}_k + h\,\mathbf{x}'_k & \text{if } t_k < t_{\max} \\ \mathbf{x}_k & \text{otherwise} \end{cases} \qquad (2)$$
Listing 7: Euler's method
```haskell
euler :: Scheme
euler constants equation state = state'
  where
    c_user    = userconstants constants
    c_maxtime = maxtime constants
    h         = timestep constants
    t         = time state
    xs        = valueVector state

    -- Apply Euler's integration scheme
    eulerxs   = zipWith (+) xs $ map (*h) (equation (state, c_user))
    (xs', t') = if t < c_maxtime then (eulerxs, t + h)
                                 else (xs, t)  -- already at maximum time

    state' = ODEState {valueVector = xs', time = t'}
```
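The same scheme can be tried out without an FPGA. The sketch below models equation (2) for x' = Ax with `Double` and plain lists instead of `SFixed 8 24` and `Vec` (both substitutions are assumptions, made so the example is self-contained); `eulerStep`, `derivative` and `oscillator` are hypothetical names, not the project's.

```haskell
type Vector = [Double]
type Matrix = [[Double]]

-- Derivative of the linear system x' = A x: each row of A dotted with x.
derivative :: Matrix -> Vector -> Vector
derivative a xs = map (sum . zipWith (*) xs) a

-- One Euler step on (t, x), frozen once t reaches tMax
-- (mirrors the guard in Listing 7).
eulerStep :: Matrix -> Double -> Double -> (Double, Vector) -> (Double, Vector)
eulerStep a h tMax (t, xs)
  | t < tMax  = (t + h, zipWith (+) xs (map (* h) (derivative a xs)))
  | otherwise = (t, xs)  -- already at maximum time

-- x'' = -x written as two coupled first-order equations (x, v);
-- with x(0) = 50, v(0) = 0 the exact solution is x(t) = 50 cos(t).
oscillator :: Matrix
oscillator = [[0, 1], [-1, 0]]
```

Starting from `(0, [50, 0])` with h = 0.1, one step gives t = 0.1 and x = [50, -5]; iterating with `iterate (eulerStep oscillator h tMax)` produces the whole trajectory.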
Actual computation - Control
Listing 8: Controlling the ODE solver (important lines start with '!')
```haskell
! scheme   = euler
! equation = matrix4d

  (systemState', oul', block')
    -- Handle the setup (reset the state, insert input values, start the computation)
    | i_c == 2 && i_cs == 1 && i_ca == 0 = (initialSystemState, 0, 0)
    | i_is == 1                          = (systemState{ odestate = s_odestate_in' }, 0, 1)
    | i_c == 1 && i_cs == 1 && i_ca == 0 = (systemState{ step = 0 }, 0, 0)
    -- Handle the computation and output:
!   | block == 1 && i_os == 1            = (systemState, pack (xs !! i_oa), block)
!   | block == 0 && s_step <  c_maxstep  = (systemState_up', 0, block)
!   | block == 0 && s_step >= c_maxstep  = (systemState{ step = uIntMax }, pack uIntMax, 1)
    -- Default, do nothing
    | otherwise                          = (systemState, oul, block)
  where
    s_odestate_in'  = s_odestate{ valueVector = replace i_ia (unpack i_i :: Data) xs }
!   s_odestate_up   = scheme systemConstants equation s_odestate
    valueVector_wt  = replace 4 (time s_odestate_up) (valueVector s_odestate_up)
    s_odestate_up'  = s_odestate_up{ valueVector = valueVector_wt }
    s_step'         = s_step + 1
    systemState_up' = systemState{ odestate = s_odestate_up', step = s_step' }
```
An overview of the entire system
Figure (block diagram); recoverable labels:
• Host (desktop): controls the FPGA through the HPS over ssh, via Ethernet
• HPS (Linux on ARM): manages the Avalon bridge; receives ssh and scp connections
• FPGA blocks sockit, memory_io, sockit_pll, io_led and compute_main, connected by Avalon internal data transfer buses; the diagram distinguishes blocks generated by QSys, manually written VHDL and VHDL generated by CλaSH
• Functions listed in the diagram: convert the Avalon protocol into usable signals; divide the clock frequency; update the LED status; perform useful operations (control the Avalon bridge, bidirectional control channel, output channel, input channel, read physical input, write physical output)
• Physical input: clock, reset, keys, switches; physical output: LEDs
Results
• Each solution plot contains 3 curves:
  1. FPGA approximation
  2. MATLAB approximation (same functionality as the FPGA)
  3. The real solution: analytical, or obtained from ode45()
• Each error plot contains at least 2 curves:
  1. FPGA - real
  2. FPGA - MATLAB approximation
• MATLAB uses IEEE 754 double-precision floating point
• The FPGA uses 8/24 signed fixed point
Simple oscillation - I
Plots: Solution (ω = 1), x(t) = 50 cos(t): FPGA, Matlab-Euler and analytical curves, value vs. time (s), 0 to 100 s; Error: FPGA - analytical, FPGA - matlab euler and maximum-error curve vs. time (s)
Figure: The final error is ≈ 30 (h = 0.01)
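The size of this error can be estimated by hand. Writing the oscillator as x' = v, v' = -x (so that x(t) = 50 cos t), one explicit Euler step multiplies the squared amplitude x² + v² by exactly 1 + h², so the amplitude grows geometrically with the number of steps N. This is a standard property of the explicit Euler method, offered as a plausibility check rather than as the author's analysis, and it ignores fixed-point rounding.

```latex
\begin{aligned}
x_{k+1} &= x_k + h\,v_k, \qquad v_{k+1} = v_k - h\,x_k \\
x_{k+1}^2 + v_{k+1}^2 &= (1 + h^2)\,(x_k^2 + v_k^2) \\
r_N &= r_0\,(1 + h^2)^{N/2} \approx r_0\,e^{h\,t_N/2}, \qquad t_N = N h \\
h = 0.01,\ t_N = 100:\quad r_N &\approx 50\,e^{0.5} \approx 82.4
\end{aligned}
```

The phase error of the scheme stays small over this interval, so the computed curve is close to 82.4 cos t, and at t = 100 s its distance from 50 cos t is roughly 32.4 · cos(100) ≈ 28, consistent with the reported final error of ≈ 30. With h = 1×10⁻⁴ the growth factor becomes e^0.005 ≈ 1.005 and the same estimate gives an error of about 0.2.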
Simple oscillation - II
Plots: Solution (ω = 1), x(t) = 50 cos(t): FPGA, Matlab-Euler and analytical curves, value vs. time (s), 0 to 100 s; Error: FPGA - analytical, FPGA - matlab euler vs. time (s)
Figure: The final error decreased by a factor of 150 (h = 1×10⁻⁴)
Simple oscillation - III
Plots: Solution (ω = 1), x(t) = 50 cos(t): FPGA, Matlab-Euler and analytical curves, value vs. time (s), 0 to 0.6 s; Error: FPGA - analytical, FPGA - matlab euler vs. time (s)
Figure: Breakdown of the number representation (h = 1×10⁻⁷)
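Two effects of the 8/24 format can be quantified with a few lines of arithmetic: the step size h must itself be stored as a multiple of the ULP 2⁻²⁴, and each increment h·x' spans only a limited number of ULPs, so a rounding or truncation of one ULP per step is a noticeable fraction of it. The sketch assumes h is rounded to the nearest multiple of the ULP when it is loaded; whether these effects are the dominant cause of the breakdown in the plot is not established by the slides.

```haskell
-- ULP of the 8/24 signed fixed-point format.
ulp :: Double
ulp = 2 ^^ (-24 :: Int)

-- Value of a step size h after rounding it to the nearest multiple of the ULP
-- (assumes round-to-nearest when h is loaded into the FPGA).
quantize :: Double -> Double
quantize h = fromIntegral (round (h / ulp) :: Integer) * ulp

-- Relative error of the stored step size.
relativeStepError :: Double -> Double
relativeStepError h = (quantize h - h) / h

-- Size of one Euler increment h * x' expressed in ULPs.
incrementInUlps :: Double -> Double -> Double
incrementInUlps h dx = h * dx / ulp
```

For h = 1×10⁻⁷ the stored step is 2 ULP ≈ 1.19×10⁻⁷, about 19% larger than requested, and an increment with x' = 50 is only about 84 ULP, so losing one ULP per step is an error of more than 1% of each increment. For h = 0.01 the stored step differs from the requested one by less than 10⁻⁶ relative.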
Switching equations
• From Euler's method (first order) to RK2 (second order)
• From simple oscillations to 4 coupled equations
• Matrix chosen such that all eigenvalues have negative real parts

$$\mathbf{x}' = \begin{pmatrix} 2 & 3 & 2 & 0 \\ -5 & -5 & -3 & 1 \\ 3 & -1 & -2 & -3 \\ 4 & 2 & 2 & -3 \end{pmatrix} \mathbf{x}, \quad \text{with} \quad \mathbf{x}(t_0) = \begin{pmatrix} 7 \\ 5 \\ 7 \\ 5 \end{pmatrix} \qquad (3)$$
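As a check on the stability claim, assume the matrix A in (3) reads row by row as [2 3 2 0; -5 -5 -3 1; 3 -1 -2 -3; 4 2 2 -3] (a reading of the extracted slide). Its characteristic polynomial factors, and the Routh-Hurwitz conditions on the remaining cubic confirm that every eigenvalue has a negative real part:

```latex
\begin{aligned}
\det(\lambda I - A) &= \lambda^4 + 8\lambda^3 + 21\lambda^2 + 23\lambda + 9 \\
&= (\lambda + 1)\,(\lambda^3 + 7\lambda^2 + 14\lambda + 9) \\
\text{Routh--Hurwitz (cubic } \lambda^3 + a_2\lambda^2 + a_1\lambda + a_0\text{):}\quad
&a_2 = 7 > 0,\quad a_0 = 9 > 0,\quad a_2 a_1 = 98 > a_0 = 9
\end{aligned}
```

So λ = -1 is an eigenvalue, the cubic has one real root between -5 and -4, and the remaining two eigenvalues form a complex-conjugate pair with negative real part. The solution therefore decays while oscillating.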
RK2 - I
Plots: Solution: FPGA, Matlab-RK2 and ODE45 curves, value vs. time (s), 0 to 3 s; Error: FPGA - ODE45, FPGA - matlab RK2 vs. time (s)
Figure: A large value of h, maximum error ≈ 0.3 (h = 0.01)
RK2 - II
Plots: Solution: FPGA, Matlab-RK2 and ODE45 curves, value vs. time (s), 0 to 3.5 s; Error: FPGA - ODE45, FPGA - matlab RK2 vs. time (s)
Figure: A very small value of h, maximum error ≈ 0.2 (h = 5×10⁻⁷)
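A second-order step differs from Euler's method only in evaluating the derivative twice per step. The sketch below assumes Heun's method (the slides do not say which RK2 variant the FPGA implements) and uses `Double` lists; `rk2Step`, `addV` and `scaleV` are hypothetical names. The derivative is passed in as a function, so the matrix equation from the previous slide can be plugged in as `\x -> map (sum . zipWith (*) x) a`.

```haskell
type Vector = [Double]

addV :: Vector -> Vector -> Vector
addV = zipWith (+)

scaleV :: Double -> Vector -> Vector
scaleV c = map (c *)

-- One step of Heun's method (a second-order Runge-Kutta scheme) for x' = f(x).
rk2Step :: (Vector -> Vector) -> Double -> Vector -> Vector
rk2Step f h x = x `addV` scaleV (h / 2) (k1 `addV` k2)
  where
    k1 = f x                        -- slope at the start of the step
    k2 = f (x `addV` scaleV h k1)   -- slope at the Euler-predicted end point
```

For x' = -x with x(0) = 1 and h = 0.1, one step gives 0.905, against the exact e^(-0.1) ≈ 0.904837; Euler's method would give 0.9.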
RK4
Plots: Solution: FPGA, Matlab-RK2 and ODE45 curves, value vs. time (s), 0 to 3 s; Error: FPGA - ODE45, FPGA - matlab RK2 vs. time (s)
Figure: For all tested values of h, (h = 1×10⁻⁵)
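For comparison, a sketch of one step of the classical fourth-order Runge-Kutta scheme in the same list-based style. The classical weights 1/6, 2/6, 2/6, 1/6 are assumed, since the slides name RK4 without listing its coefficients; the helper names are hypothetical and repeated here so the block stands on its own.

```haskell
type Vector = [Double]

addV :: Vector -> Vector -> Vector
addV = zipWith (+)

scaleV :: Double -> Vector -> Vector
scaleV c = map (c *)

-- One step of the classical fourth-order Runge-Kutta scheme for x' = f(x).
rk4Step :: (Vector -> Vector) -> Double -> Vector -> Vector
rk4Step f h x =
  x `addV` scaleV (h / 6) (k1 `addV` scaleV 2 k2 `addV` scaleV 2 k3 `addV` k4)
  where
    k1 = f x                             -- slope at the start
    k2 = f (x `addV` scaleV (h / 2) k1)  -- slope at the midpoint, using k1
    k3 = f (x `addV` scaleV (h / 2) k2)  -- slope at the midpoint, using k2
    k4 = f (x `addV` scaleV h k3)        -- slope at the end point
```

For x' = -x with x(0) = 1 and h = 0.1, one step gives 0.9048375, within 10⁻⁷ of e^(-0.1). Each step costs four derivative evaluations, i.e. four matrix-vector products in the FPGA's case, against one for Euler's method.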
Euler revisited
Plots: Solution: FPGA, Matlab-Euler and ODE45 curves, value vs. time (s), 0 to 3 s; Error (×10⁻³): FPGA - ODE45, FPGA - matlab euler vs. time (s)
Figure: Euler outperforming both RK2 and RK4 (h = 1×10⁻⁴)
Results
Table: Euler's method, sorted by iterations per second.

| Device | Iterations | Time (s) | Output interrupts | Iterations per second (×10⁶) |
|---|---|---|---|---|
| CPU - C++ - int | 1×10⁸ | 1.35 | 1 | 74.1 |
| FPGA | 1×10⁸ | 2.18 | 1 | 45.8 |
| FPGA | 1×10⁸ | 2.25 | 1×10³ | 44.4 |
| FPGA | 1×10⁸ | 2.30 | 1×10⁴ | 43.4 |
| FPGA | 1×10⁸ | 3.27 | 1×10⁵ | 30.6 |
| FPGA | 1×10⁷ | 0.39 | 1 | 25.6 |
| FPGA | 1×10⁶ | 0.21 | 1 | 4.76 |
| CPU - C++ - float | 5×10⁷ | 58.1 | 1 | 0.86 |
| FPGA | 1×10⁵ | 0.19 | 1 | 0.53 |
| CPU - Haskell - float | 1×10⁶ | 3.23 | 1 | 0.31 |
| CPU - MATLAB - float | 1×10⁷ | 304 | 1 | 0.03 |
Notes
• The FPGA is aimed at "low-power, cost-sensitive design needs"
• No automatic derivation of the optimal clock frequency
• The 4x4 matrix equation used only ∼10% of the FPGA space
• Power usage is ∼2 orders of magnitude lower than that of the CPU
• CPUs are heavily optimized for fast integer arithmetic
The entire process takes ∼10 minutes
1. CλaSH - Windows/Linux
2. Synthesis (Quartus) - Windows
3. Driver compilation (arm-linux-gnueabihf-g++) - Linux
4. Run and extract results - Windows/Linux
5. Verification of results (MATLAB) - Windows
• Normal workflow: use two machines
  • Linux for compiling and deploying the driver
  • Windows for the other steps
Discussion
• Accuracy
  • Equality between the FPGA and MATLAB implementations
  • Limitations of the number representation and CλaSH
• Functionality
  • 4x4 matrix with constant coefficients
  • Variety of basic solver schemes
  • Limitations of the number representation and CλaSH
• Performance
  • A sub-par development FPGA comes close to a modern CPU core
  • Trade-off between power usage and performance
What has been created?
• Single-command toolchain integration for deploying CλaSH on an FPGA
• IO system of sufficiently high performance
• Configuration of the Avalon bridges
• C++ library for easy FPGA accessibility
What has been shown?
• Hardware acceleration of numerical solvers is feasible
• Number representations are important
• CλaSH is usable for projects with complex IO by using it as a module
• FPGAs are fast: a sub-par FPGA comes close to a modern CPU core
Acknowledgements
• Jan Kuper - Introducing me to functional programming and CλaSH, and giving feedback
• Christiaan Baaij - Creating CλaSH and answering related questions
• Ruud van Damme & Jan Broenink - Feedback
• Rinse Wester - Input on configuring and using the Avalon bridges
Frequency too high? Random number generator.
Figure: The main source of failure after overclocking.
Nondeterministic behaviour
Figure: Only when using the Altera PLL
The core of the circuit: the matrix multiplier
Figure: A 4x4 matrix-vector multiplication consists of 4 × (4 + 3) = 28 operations
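The operation count follows from the shape of the product: each of the n rows needs n multiplications and n - 1 additions. The sketch below is a software model with hypothetical names (not the synthesized circuit) that computes a matrix-vector product together with the number of arithmetic operations it requires.

```haskell
-- Matrix-vector product, returned together with the number of arithmetic
-- operations it requires: per row, one multiplication per element and one
-- addition fewer than that.
matVecCounted :: [[Double]] -> [Double] -> ([Double], Int)
matVecCounted a x = (map row a, sum (map ops a))
  where
    row r = foldr1 (+) (zipWith (*) r x)   -- n products summed with n - 1 additions
    ops r = length r + (length r - 1)

-- Closed form for an n x n matrix: n rows of n + (n - 1) operations.
opCount :: Int -> Int
opCount n = n * (n + (n - 1))
```

`opCount 4` gives 4 × (4 + 3) = 28, matching the figure.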
Synthesized hardware
Figure: An overview of the circuit, generated by Quartus