Altera brings OpenCL to FPGAs
December 2012

© 2012 Altera Corporation CONFIDENTIAL

OpenCL Overview

- OpenCL is a software programming model
  - Uses the standard C language (C99)
  - Uses OpenCL C extensions (adds parallelism to C)
  - Includes an API (open standard for different devices)
- Targets heterogeneous systems
  - Performance via hardware acceleration
- The consortium (short list):
  - Apple, Altera, AMD, Broadcom, Khronos, Intel, ARM, Ericsson, Texas Instruments, Samsung, IBM, Google, Fujitsu

[Figure label: Host CPU]

Hardware Acceleration

- OpenCL enables portability
- C-to-gates programs are proprietary

[Figure labels: Heterogeneous Multicore CPU, Multicore CPU, SoC FPGA, FPGA, FPGA. Source: RapidMind]

OpenCL Programming Model

[Figure labels: Host Processor; four Accelerators, each with Local Mem; Global Mem]

Host Program

    main()
    {
        read_data( … );
        manipulate( … );
        clEnqueueWriteBuffer( … );
        clEnqueueNDRangeKernel( …, sum, … );
        clEnqueueReadBuffer( … );
        display_result( … );
    }

Kernel Program

    __kernel void sum(__global float *a, __global float *b, __global float *y)
    {
        int gid = get_global_id(0);
        y[gid] = a[gid] + b[gid];
    }

An OpenCL application is a combination of a host program and a kernel program.
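To make the host/kernel split concrete, here is a minimal plain-C sketch that emulates what a 1-D NDRange launch of the `sum` kernel computes. It is not OpenCL API code: the names `sum_work_item` and `run_ndrange_1d` are hypothetical, and the loop only models the result, since a real device may run the work-items concurrently.

```c
#include <stddef.h>

/* Body of the slide's `sum` kernel as an ordinary C function.
 * `gid` stands in for the value get_global_id(0) would return. */
static void sum_work_item(size_t gid, const float *a, const float *b, float *y)
{
    y[gid] = a[gid] + b[gid];
}

/* Hypothetical stand-in for a 1-D NDRange launch: one call per work-item.
 * A real OpenCL device may execute these work-items in parallel. */
static void run_ndrange_1d(size_t global_size,
                           const float *a, const float *b, float *y)
{
    for (size_t gid = 0; gid < global_size; ++gid) {
        sum_work_item(gid, a, b, y);
    }
}
```

Calling `run_ndrange_1d(n, a, b, y)` plays the role of the enqueue-kernel step in the host program; the write-buffer and read-buffer steps have no counterpart here because everything lives in host memory.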

Mapping Multithreaded Kernels to FPGAs

- The simplest way of mapping kernel functions to FPGAs is to replicate hardware for each thread
  - Inefficient and wasteful
- Technique: deep pipeline parallelism
  - Attempt to create a deeply pipelined representation of a kernel
  - On each clock cycle, attempt to send in input data for a new thread
  - A method of mapping coarse-grained thread parallelism to fine-grained FPGA parallelism
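A rough cycle count shows why pipelining is attractive. The symbols here are assumptions for illustration, not figures from the slides: the kernel pipeline has $D$ stages, accepts one new thread per clock cycle, and never stalls. Under those idealized conditions, $N$ threads take

```latex
T_{\text{pipelined}} = D + (N - 1), \qquad
T_{\text{one-at-a-time}} = N \cdot D, \qquad
\lim_{N \to \infty} \frac{T_{\text{one-at-a-time}}}{T_{\text{pipelined}}} = D .
```

For large $N$ a single pipeline approaches one finished thread per cycle without dedicating hardware to each thread. Real designs can fall short of this when memory accesses stall the pipeline.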

Example Pipeline for Vector Add

8 threads for vector add example. The pipeline consists of two Load units, an adder (+), and a Store unit.

On each cycle the portions of the pipeline are processing different threads. While thread 2 is being loaded, thread 1 is being added, and thread 0 is being stored.

[Figure sequence: thread IDs 0 to 7 advance through the pipeline one step per frame]

    Frame   Waiting thread IDs   Load   +   Store   Done
    1       0 1 2 3 4 5 6 7      -      -   -       -
    2       1 2 3 4 5 6 7        0      -   -       -
    3       2 3 4 5 6 7          1      0   -       -
    4       3 4 5 6 7            2      1   0       -
    5       4 5 6 7              3      2   1       0
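The frame-by-frame behaviour above can be sketched in a few lines of C. This is an illustrative model only, assuming three stages (Load, Add, Store) that each take exactly one cycle, with thread t entering Load at cycle t; the function names are hypothetical and a real compiler-generated pipeline would be much deeper.

```c
enum { STAGES = 3 };   /* Load, Add, Store: assumed one cycle each */
enum { IDLE = -1 };    /* marks a stage with no thread in it */

/* Thread occupying `stage` (0 = Load, 1 = Add, 2 = Store) at `cycle`,
 * given that thread t enters Load at cycle t.
 * Returns IDLE if the stage is empty on that cycle. */
static int thread_in_stage(int cycle, int stage, int n_threads)
{
    int t = cycle - stage;
    return (t >= 0 && t < n_threads) ? t : IDLE;
}

/* Number of cycles until the last of n_threads has passed through Store. */
static int total_cycles(int n_threads)
{
    return n_threads > 0 ? n_threads + STAGES - 1 : 0;
}
```

With 8 threads, cycle 2 gives thread 2 in Load, thread 1 in Add, and thread 0 in Store, which is the situation the slide caption describes.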

Example Pipeline for Vector Add

[Figure: three copies of the Load, Load, +, Store pipeline]

Replicate the kernel circuit multiple times to process multiple work-groups simultaneously.
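As a back-of-the-envelope model of what replication buys, the sketch below estimates the cycles needed to drain a set of work-groups over several identical pipelines. The scheduling policy is an assumption made for illustration (whole work-groups spread as evenly as possible across copies, one thread issued per cycle per copy, no memory stalls), and `replicated_cycles` is a hypothetical helper, not part of any Altera tool.

```c
/* Ceiling division for positive integers. */
static int ceil_div(int a, int b)
{
    return (a + b - 1) / b;
}

/* Cycles to drain `n_groups` work-groups of `group_size` threads each over
 * `n_copies` identical pipelines of `depth` stages.
 * Requires n_copies >= 1 and depth >= 1.
 * Assumes whole work-groups are spread evenly across the copies and each
 * copy issues one thread per cycle without stalling. */
static int replicated_cycles(int n_groups, int group_size, int n_copies, int depth)
{
    if (n_groups <= 0 || group_size <= 0) {
        return 0;
    }
    int groups_on_busiest = ceil_div(n_groups, n_copies);
    int threads_on_busiest = groups_on_busiest * group_size;
    return threads_on_busiest + depth - 1;
}
```

In this model, six work-groups of eight threads on a three-stage pipeline take 50 cycles with one copy and 18 cycles with three copies. In practice the gain is bounded by memory bandwidth and by how many copies fit on the device.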

Case Studies

Performance (Monte-Carlo Black Scholes)

    OpenCL MCBS               Quad Core Xeon   NVIDIA S870   Stratix® IV 530 FPGA
    Simulations per Second    240M             950M          2,200M
    Number of Cores           8                448           N/A

[Figure: Stock Price versus Time]

- Calculate the value of an option with multiple sources of uncertainty
- FPGA delivers higher performance at a fraction of the power

Achieve Higher Performance vs CPU
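Taking the slide's throughput figures at face value, and reading them as 240M simulations per second for the quad-core Xeon, 950M for the NVIDIA S870, and 2,200M for the Stratix IV 530 (this pairing of numbers to devices is inferred from the slide layout), the implied speedups are roughly:

```latex
\frac{2200\,\text{M}}{240\,\text{M}} \approx 9.2\times \ \text{(FPGA vs. CPU)}, \qquad
\frac{2200\,\text{M}}{950\,\text{M}} \approx 2.3\times \ \text{(FPGA vs. GPU)} .
```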

Performance/Watt (Document Search)

- Document filtering algorithm
  - Review incoming stream (documents) and return best match
  - E.g. monitors news feeds and recommends others
- FPGA outperforms by >5x
- Saving power = Saving $
  - Annual power cost was $2.9 million or $456 per kW

Higher Perf/Watt vs GPU

ALTERA OpenCL What’s Next

Current OpenCL System Architecture

[Figure labels: Host Processor; Kernel0, Kernel1, Kernel2, Kernel N; Global Memory]

- High demand on CPU
- Memory-to-memory paradigm

Desired Architecture (OpenCL Pipes)

[Figure labels: Host Processor with Initialize(); Kern0, Kern1, Kern2, KernN linked by Buffers; branch labels p and 1-p; Traffic Managers; Global Memory]

- CPU: configure and "go"
- Stream orientation when needed
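The difference between the memory-to-memory paradigm and a stream-oriented one can be illustrated with a small FIFO between two processing stages. The sketch below is plain C under stated assumptions: the `fifo` type, its capacity, and the two stage computations (scale by 2, then add 1) are all hypothetical, and this is neither the OpenCL pipes API nor Altera's implementation.

```c
#include <stdbool.h>
#include <stddef.h>

#define FIFO_CAPACITY 4  /* hypothetical buffer depth */

/* Fixed-size ring buffer standing in for a "Buffer" between two kernels. */
typedef struct {
    float  data[FIFO_CAPACITY];
    size_t head;   /* index of the oldest element */
    size_t count;  /* number of elements currently stored */
} fifo;

/* Non-blocking write: returns false when the FIFO is full. */
static bool fifo_write(fifo *f, float v)
{
    if (f->count == FIFO_CAPACITY) return false;
    f->data[(f->head + f->count) % FIFO_CAPACITY] = v;
    f->count++;
    return true;
}

/* Non-blocking read: returns false when the FIFO is empty. */
static bool fifo_read(fifo *f, float *out)
{
    if (f->count == 0) return false;
    *out = f->data[f->head];
    f->head = (f->head + 1) % FIFO_CAPACITY;
    f->count--;
    return true;
}

/* Streams n inputs through two stages via the FIFO, so the intermediate
 * values are never written to a full-size array in "global memory". */
static void run_two_stage_stream(const float *in, float *out, size_t n)
{
    fifo f = { {0}, 0, 0 };
    size_t produced = 0, consumed = 0;
    while (consumed < n) {
        /* First stage: push while there is input left and room in the FIFO. */
        while (produced < n && fifo_write(&f, in[produced] * 2.0f)) {
            produced++;
        }
        /* Second stage: drain whatever is currently available. */
        float v;
        while (consumed < n && fifo_read(&f, &v)) {
            out[consumed++] = v + 1.0f;
        }
    }
}
```

The point of the design is that the host only sets the stages up and starts them; data then moves stage to stage through small buffers instead of making a round trip through global memory between kernels.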

ALTERA OpenCL Q&A
