Using the VTune™ Performance Enhancement Environment for the Streaming SIMD Extensions

Copyright © 1999, Intel Corporation. All rights reserved.

Agenda

- Background
- The VTune™ Performance Enhancement Environment for the Streaming SIMD Extensions
- Development Methods for the Streaming SIMD Extensions
- Summary

Background: MMX™ Technology Tools

- Enabled low-level work (assembly language)
- Efforts to provide high-level support (compilers) were:
  - late
  - not utilized by ISVs
  - immature technically
  - not adopted by the industry quickly, or at all

MMX™ Technology Tools

What Developers Told Us

- It is painful to realize performance benefits from MMX™ technology
- Compilers are not capable of taking high-level code and automatically producing optimized MMX™ technology instructions
- You, the developer, had to use assembly

Lack of good tools creates big headaches for developers

The VTune™ Performance Enhancement Environment, 4.0

- Intel® C/C++ Compiler
- VTune™ Analyzer
- Register Viewing Tool
- Performance Library Suite
- Intel® Architecture Training Center

The definitive toolkit for Streaming SIMD Extensions programming

VTune™ Performance Enhancement Environment

Intel® C/C++ Compiler

- A “plug-in” to Microsoft* Developer Studio versions 5.0 and 6.0
  - Microsoft Visual Studio 97*
- Object- and language-compatible with MSVC v5.0 and v6.0

*Third-party brands and names are the property of their respective owners.

Intel® C/C++ Compiler Optimization

- For SIMD coding:
  - inline asm, intrinsics, vector classes, and vectorization
  - data alignment mechanisms
- CPU Dispatch:
  - different code for different processors, one executable
- Scalar optimization:
  - aggressive floating-point optimization
  - profile-guided and inter-procedural optimization

Let the compiler deal with optimization
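
The CPU Dispatch bullet describes one executable that carries different code for different processors. A minimal hand-written sketch of that idea follows; it is not how the Intel compiler implements dispatch. The names `scale_array`, `select_scale`, and `cpu_has_sse` are hypothetical, and the runtime check assumes the GCC/Clang builtin `__builtin_cpu_supports`, falling back to build flags elsewhere.

```cpp
#include <cassert>
#include <cstddef>

// Compile-time check that SSE intrinsics are usable with this compiler.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#else
#define SKETCH_HAVE_SSE 0
#endif

// Baseline version: runs on any processor.
void scale_scalar(float* x, float k, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) x[i] *= k;
}

#if SKETCH_HAVE_SSE
// SSE version: four floats per step, unaligned loads so any pointer works.
void scale_sse(float* x, float k, std::size_t n) {
    const __m128 vk = _mm_set1_ps(k);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(x + i, _mm_mul_ps(_mm_loadu_ps(x + i), vk));
    for (; i < n; ++i) x[i] *= k;  // leftover elements
}
#endif

// Hypothetical runtime check (an assumption, not part of any Intel tool).
bool cpu_has_sse() {
#if SKETCH_HAVE_SSE && (defined(__GNUC__) || defined(__clang__))
    return __builtin_cpu_supports("sse") != 0;  // GCC/Clang builtin on x86
#else
    return SKETCH_HAVE_SSE != 0;                // otherwise trust the build flags
#endif
}

using scale_fn = void (*)(float*, float, std::size_t);

// Pick the best implementation the running processor supports.
scale_fn select_scale() {
#if SKETCH_HAVE_SSE
    if (cpu_has_sse()) return scale_sse;
#endif
    return scale_scalar;
}

// Single entry point; the implementation is chosen once, on first call.
void scale_array(float* x, float k, std::size_t n) {
    static const scale_fn impl = select_scale();
    assert(x != nullptr || n == 0);
    impl(x, k, n);
}
```

Callers only ever see `scale_array`; the function pointer hides which version runs, which is the property the slide summarizes as "one executable".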

VTune™ Analyzer

Tune application performance via several methods:

- Processor sampling for CPU usage without binary instrumentation
- CPU simulation in software (Dynamic Analysis)
- Chronologies of performance counters from OS, processor, 740 Graphics chipset
- Call graph profiling for C/C++ and Java* with binary instrumentation

VTune™ Analyzer

- All tuning methods centered around source code views
- Offers performance tuning advice for C/C++, Fortran, Java*, assembly
- Teaches how to write better performing code
- Supports Pentium® III, Pentium II, Pentium Pro, and Pentium processors on Windows* 95/98, Windows NT* 4.0/5.0

VTune™ Analyzer: Hotspots

VTune™ Analyzer: Coach Advice

VTune™ Analyzer: Dynamic Analysis

VTune™ Analyzer: Call Graph Profiling

Register Viewing Tool

- Shows XMM registers during execution and debugging

Performance Libraries

- Image Processing
- Signal Processing
- Recognition Primitives
- Math Kernel
- JPEG Library

Streaming SIMD Extensions and MMX™ Technology used extensively

Intel® Architecture Training Center

- Computer Based Training (CBT):
  - interactive tutorial on Streaming SIMD Extensions

Intel® Architecture Training Center, cont’d

- Pentium® II and Pentium III processors Programmer’s Reference Manuals
- Optimization Manual
- Application notes and code samples using Streaming SIMD Extensions:
  - 3D lighting/transform, filters, min/max, Newton-Raphson, FFT, deformable surfaces, & lots more

Development Methods for the Streaming SIMD Extensions

The slide arranges five methods along a "Difficulty" axis:

Hand-coded assembly ("Bit Bangers Only!")

    movaps xmm0, b[i]
    movaps xmm1, c[i]
    addps  xmm0, xmm1
    movaps a[i], xmm0

Intrinsics ("Fast food assembly")

    a[i]=_mm_add_ps(b[i],c[i])

Performance libraries ("You make the call")

    RLsbAdd3()

C++ class library ("Performance for the masses")

    a[i]=b[i]+c[i]

Vectorization ("Let the compiler do the work, sort of...")

    #pragma vector

Development Methods

Intrinsics

- New data type: __m128
- No need to schedule instructions or allocate registers
- Can still choose instruction sequences
- Use for MMX™ Technology and the Streaming SIMD Extensions
- Definitions in Pentium® III Processor Programmer’s Reference Manual
- Near hand-coded assembly performance (< 15% difference)

    a[i]=_mm_add_ps(b[i],c[i])
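
To illustrate "can still choose instruction sequences", here is a minimal sketch that computes four reciprocals two ways: with a packed divide, and with the faster approximate reciprocal refined by one Newton-Raphson step. The function names `recip_div` and `recip_nr` are hypothetical; the intrinsics are the standard ones from `<xmmintrin.h>`, and a scalar fallback is assumed for builds without SSE.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#else
#define SKETCH_HAVE_SSE 0
#endif

// Exact route: one packed divide produces four reciprocals.
void recip_div(float* out, const float* x) {
    assert(out && x);
#if SKETCH_HAVE_SSE
    _mm_storeu_ps(out, _mm_div_ps(_mm_set1_ps(1.0f), _mm_loadu_ps(x)));
#else
    for (int i = 0; i < 4; ++i) out[i] = 1.0f / x[i];
#endif
}

// Approximate route: reciprocal estimate (roughly 12 bits of precision)
// refined by one Newton-Raphson step, r1 = r0 * (2 - x * r0).
void recip_nr(float* out, const float* x) {
    assert(out && x);
#if SKETCH_HAVE_SSE
    const __m128 vx = _mm_loadu_ps(x);
    const __m128 r0 = _mm_rcp_ps(vx);
    const __m128 r1 =
        _mm_mul_ps(r0, _mm_sub_ps(_mm_set1_ps(2.0f), _mm_mul_ps(vx, r0)));
    _mm_storeu_ps(out, r1);
#else
    for (int i = 0; i < 4; ++i) out[i] = 1.0f / x[i];  // no estimate available
#endif
}
```

The programmer picks the sequence (divide versus estimate plus refinement) while the compiler still handles scheduling and register allocation. The refined result is close to, but not necessarily bit-identical with, the divide.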

C++ class library

- Abstract the underlying technology
- Performance gain everywhere the class is used
- New packed data types:
  - I32vec2 (2 32-bit ints), I16vec4 (4 16-bit ints), I8vec8 (8 8-bit ints), and unsigned versions
  - F32vec4 (4 32-bit floats)
- Extensible, easy to use, keeps code readable and portable
- Nearly matches intrinsics performance (0-5% difference)

    a[i]=b[i]+c[i]
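
How a class can "abstract the underlying technology" may be easier to see in code. The sketch below is a hypothetical four-float class named `Vec4f`; it is a stand-in for illustration and not the F32vec4 class shipped with the compiler. It assumes the standard intrinsics from `<xmmintrin.h>` and falls back to scalar loops where SSE is unavailable.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#else
#define SKETCH_HAVE_SSE 0
#endif

// Hypothetical packed-float class: operators hide the intrinsics.
class Vec4f {
    alignas(16) float f_[4];  // 16-byte aligned so aligned loads are legal

public:
    Vec4f(float x0, float x1, float x2, float x3) : f_{x0, x1, x2, x3} {}

    friend Vec4f operator+(const Vec4f& l, const Vec4f& r) {
        Vec4f out(0.0f, 0.0f, 0.0f, 0.0f);
#if SKETCH_HAVE_SSE
        _mm_store_ps(out.f_, _mm_add_ps(_mm_load_ps(l.f_), _mm_load_ps(r.f_)));
#else
        for (int i = 0; i < 4; ++i) out.f_[i] = l.f_[i] + r.f_[i];
#endif
        return out;
    }

    friend Vec4f operator*(const Vec4f& l, const Vec4f& r) {
        Vec4f out(0.0f, 0.0f, 0.0f, 0.0f);
#if SKETCH_HAVE_SSE
        _mm_store_ps(out.f_, _mm_mul_ps(_mm_load_ps(l.f_), _mm_load_ps(r.f_)));
#else
        for (int i = 0; i < 4; ++i) out.f_[i] = l.f_[i] * r.f_[i];
#endif
        return out;
    }

    Vec4f& operator+=(const Vec4f& r) { return *this = *this + r; }

    // Read one lane; i must be 0..3.
    float operator[](std::size_t i) const {
        assert(i < 4);
        return f_[i];
    }
};
```

With such a class, user code reads like ordinary arithmetic (`a = b + c;`), and swapping the implementation behind the operators does not touch the call sites. A production class would more likely hold an `__m128` member directly; the float array here only keeps the fallback path short.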

Vectorization

- Compiler generates SIMD integer or FP code for you under strict conditions:
  - countable loops with single-unit stride
  - body of loop must be a single block: no internal branching, single entry/exit
  - data types: float, char/short/int
  - user ensures correct alignment for floats
  - user ensures no aliases for pointers
  - no function calls in loop
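
The conditions above can be made concrete with two loops: one that meets them and one that does not. This is a sketch only; the function names are hypothetical, `__restrict` is a common compiler extension (GCC, Clang, MSVC) used here to state the no-alias promise, and whether a given compiler actually vectorizes either loop depends on that compiler and its options.

```cpp
#include <cassert>
#include <cstddef>

// Meets the listed conditions: countable, unit stride, single basic block,
// float data, no function calls, and __restrict promises no aliasing.
void multiply_add(float* __restrict ac, const float* __restrict m,
                  const float* __restrict v, std::size_t n) {
    assert(n == 0 || (ac && m && v));
    for (std::size_t i = 0; i < n; ++i)
        ac[i] += m[i] * v[i];
}

// Breaks two of the listed conditions: the stride is 2, and the loop body
// contains a branch.
std::size_t count_positive_even_slots(const float* x, std::size_t n) {
    assert(n == 0 || x);
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; i += 2) {
        if (x[i] > 0.0f) ++count;
    }
    return count;
}
```

Alignment is the one condition the code cannot show by itself: the caller still has to supply suitably aligned arrays when the compiler is told to assume alignment.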

Vector-Multiply-Add in C

    void do_c(float *ac, float *m, float *v, int n)
    {
        for(int i=0; i<n; i++)
            ac[i] += (m[i] * v[i]);
    }

Intrinsics Example

    // assumes data passed in is 16-byte aligned!!!
    void do_intrin(float *ac, float *m, float *v, int n)
    {
        __m128 t, vp, mp, ap;
        for(int i=0; i
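
The intrinsics listing breaks off inside the loop header. The sketch below shows one way such a loop might continue; it is a guess guided by the declared variables `t`, `vp`, `mp`, `ap` and by the multiply-add in the other examples, not the original slide code. The name `do_intrin_sketch`, the stride of four, the requirement that `n` be a multiple of 4, and the scalar fallback are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#else
#define SKETCH_HAVE_SSE 0
#endif

// True when p sits on a 16-byte boundary, as aligned loads and stores require.
bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Packed multiply-add: ac[i] += m[i] * v[i], four floats per iteration.
// Assumes n is a multiple of 4 and all three arrays are 16-byte aligned.
void do_intrin_sketch(float* ac, const float* m, const float* v, int n) {
    assert(n >= 0 && n % 4 == 0);
    assert(is_aligned16(ac) && is_aligned16(m) && is_aligned16(v));
#if SKETCH_HAVE_SSE
    __m128 t, vp, mp, ap;
    for (int i = 0; i < n; i += 4) {
        vp = _mm_load_ps(v + i);    // aligned load of four v values
        mp = _mm_load_ps(m + i);    // aligned load of four m values
        ap = _mm_load_ps(ac + i);   // current accumulator values
        t  = _mm_mul_ps(mp, vp);    // four products
        ap = _mm_add_ps(ap, t);     // accumulate
        _mm_store_ps(ac + i, ap);   // aligned store back to ac
    }
#else
    for (int i = 0; i < n; ++i) ac[i] += m[i] * v[i];
#endif
}
```

Callers can satisfy the alignment assumption with `alignas(16)` on the arrays; passing unaligned pointers to `_mm_load_ps` or `_mm_store_ps` is an error, which is why the sketch checks before the loop.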

Vector Class Example

    void do_fvec(float *ac, float *m, float *v, int n)
    {
        F32vec4 *a=(F32vec4 *) ac;
        F32vec4 *c=(F32vec4 *) m;
        F32vec4 *b=(F32vec4 *) v;
        for(int i=0; i<(n / 4); i++)   // reduced iterations
            a[i] += (c[i] * b[i]);
    }

The vector classes offer efficient, portable performance

Vectorization Example

    // restrict says data is only accessible via given pointer
    void do_vectorize(float *restrict ac, float *restrict m, float *restrict v, int n)
    {
        // user ensures data is aligned
    #pragma vectorize aligned
        for(int i=0; i<100; i++)
            ac[i] += (m[i] * v[i]);
    }

Give some characteristics; compiler generates the SIMD code

Summary

- We learned our lesson: lack of tools = big headaches for developers
- The VTune™ Performance Enhancement Environment has everything you need for Streaming SIMD Extensions programming
- The new development methods allow you to concentrate on your application; the tools will help you achieve the performance

Now, it’s your turn:

www.intel.com/VTune

