Using the VTune™ Performance Enhancement Environment for the Streaming SIMD Extensions

Copyright © 1999, Intel Corporation. All rights reserved.

Agenda

- Background
- The VTune™ Performance Enhancement Environment for the Streaming SIMD Extensions
- Development Methods for the Streaming SIMD Extensions
- Summary

Background: MMX™ Technology Tools

- Enabled low level work (assembly language)
- Efforts to provide high level support (compilers) were:
  - late
  - not utilized by ISVs
  - immature technically
  - not adopted by the industry quickly, or at all

MMX™ Technology Tools

What Developers Told Us

- It is painful to realize performance benefits from MMX™ technology
- Compilers are not capable of taking high-level code and, automatically, producing optimized MMX™ technology instructions
- You, the developer, had to use assembly

Lack of good tools creates big headaches for developers

The VTune™ Performance Enhancement Environment, 4.0

- Intel® C/C++ Compiler
- VTune™ Analyzer
- Register Viewing Tool
- Performance Library Suite
- Intel® Architecture Training Center

The definitive toolkit for Streaming SIMD Extensions programming

VTune™ Performance Enhancement Environment

Intel® C/C++ Compiler

- A “plug-in” to Microsoft* Developer Studio versions 5.0 and 6.0
  - Microsoft Visual Studio 97*
- Object, language compatible with MSVC v5.0 and v6.0

*Third-party brands and names are the property of their respective owners.

VTune™ Performance Enhancement Environment

Intel® C/C++ Compiler Optimization

- For SIMD coding:
  - inlined-asm, intrinsics, vector classes, and vectorization
  - data alignment mechanisms
- CPU Dispatch:
  - different code for different processors - one executable
- Scalar optimization:
  - aggressive floating-point optimization
  - profile-guided and inter-procedural optimization

Let the compiler deal with optimization
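
The slide lists "data alignment mechanisms" without naming them. As a minimal sketch of why alignment matters for SIMD coding, the following uses standard C++11 `alignas` as a stand-in (an assumption; the compiler's own mechanism is not shown in the source) to place float arrays on 16-byte boundaries, the width of one 128-bit xmm register. The names `is_aligned16`, `checked16`, and `Buffers` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// True when p sits on a 16-byte boundary, as aligned 128-bit loads and stores require.
inline bool is_aligned16(const void *p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Debug-time guard: passes the pointer through only if it is 16-byte aligned.
inline float *checked16(float *p) {
    assert(is_aligned16(p));
    return p;
}

// Each array starts on a 16-byte boundary (standard C++11 alignas, used here as a stand-in).
struct Buffers {
    alignas(16) float a[8];
    alignas(16) float b[8];
};
```

A pointer one float past the start of an aligned array (`buf.a + 1`) is only 4-byte aligned, so it would fail the check.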

VTune™ Performance Enhancement Environment

VTune™ Analyzer

- Performance tune apps via several methods:
  - Processor sampling for CPU usage without binary instrumentation
  - CPU simulation in software - Dynamic Analysis
  - Chronologies of performance counters from OS, processor, 740 Graphics chipset
  - Call graph profiling for C/C++ and Java* with binary instrumentation

VTune™ Performance Enhancement Environment

VTune™ Analyzer

- All tuning methods centered around source code views
- Offers performance tuning advice for C/C++, Fortran, Java*, assembly
- Teaches how to write better performing code
- Supports Pentium® III, Pentium II, Pentium Pro, and Pentium processors on Windows* 95/98, Windows NT* 4.0/5.0

VTune™ Performance Enhancement Environment

[Figure: VTune™ Analyzer: Hotspots]

[Figure: VTune™ Analyzer: Coach Advice]

[Figure: VTune™ Analyzer: Dynamic Analysis]

[Figure: VTune™ Analyzer: Call Graph Profiling]

VTune™ Performance Enhancement Environment

Register Viewing Tool

- Shows xmm registers during execution, debugging

VTune™ Performance Enhancement Environment

Performance Libraries

- Image Processing
- Signal Processing
- Recognition Primitives
- Math Kernel
- JPEG Library

Streaming SIMD Extensions and MMX™ Technology used extensively

VTune™ Performance Enhancement Environment

Intel® Architecture Training Center

- Computer Based Training (CBT):
  - interactive tutorial on Streaming SIMD Extensions

VTune™ Performance Enhancement Environment

Intel® Architecture Training Center, cont’d

- Pentium® II and Pentium III processors Programmer’s Reference Manuals
- Optimization Manual
- Application notes and code samples using Streaming SIMD Extensions:
  - 3D lighting/transform, filters, min/max, Newton-Raphson, FFT, deformable surfaces, & lots more

Development Methods for the Streaming SIMD Extensions

Methods, arranged along a "Difficulty" axis:

- Hand coded assembly (Bit Bangers Only!)
    movaps xmm0, b[i]
    movaps xmm1, c[i]
    addps xmm0, xmm1
    movaps a[i], xmm0
- Intrinsics (Fast food assembly)
    a[i]=_mm_add_ps(b[i],c[i])
- Performance libraries (You make the call)
    RLsbAdd3()
- C++ class library (Performance for the masses)
    a[i]=b[i]+c[i]
- Vectorization (Let the compiler do the work, sort of...)
    #pragma vector

Development Methods

Intrinsics

- New data type: __m128
- No need to schedule and register allocate
- Can still choose instruction sequences
- Use for MMX™ Technology and the Streaming SIMD Extensions
- Definitions in Pentium® III Processor Programmer’s Reference Manual
- Near hand-coded assembly performance (< 15% difference)

    a[i]=_mm_add_ps(b[i],c[i])

Development Methods

C++ class library

- Abstract the underlying technology
- Performance gain everywhere class used
- New packed data types:
  - I32vec2 (2 32-bit ints), I16vec4 (4 16-bit ints), I8vec8 (8 8-bit ints), and unsigned versions
  - F32vec4 (4 32-bit floats)
- Extensible, easy to use, keeps code readable and portable
- Nearly matches intrinsics performance (0-5% difference)

    a[i]=b[i]+c[i]
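
To illustrate how a class can "abstract the underlying technology" so that `a[i]=b[i]+c[i]` maps onto packed instructions, here is a minimal sketch of a four-float wrapper built on the SSE intrinsics. `Vec4f` is a hypothetical stand-in written for this example, not the F32vec4 class itself, and the sketch assumes an x86 compiler that provides `<xmmintrin.h>`.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_add_ps, _mm_mul_ps, ...

// Hypothetical packed-float class (illustration only, not Intel's F32vec4).
struct Vec4f {
    __m128 v;

    Vec4f() : v(_mm_setzero_ps()) {}
    explicit Vec4f(__m128 x) : v(x) {}
    // _mm_setr_ps stores its arguments in memory order: a goes to lane 0.
    Vec4f(float a, float b, float c, float d) : v(_mm_setr_ps(a, b, c, d)) {}

    // Operators forward to one packed instruction each.
    friend Vec4f operator+(Vec4f a, Vec4f b) { return Vec4f(_mm_add_ps(a.v, b.v)); }
    friend Vec4f operator*(Vec4f a, Vec4f b) { return Vec4f(_mm_mul_ps(a.v, b.v)); }
    Vec4f &operator+=(Vec4f b) { v = _mm_add_ps(v, b.v); return *this; }

    // Reads one lane back out; intended for inspection, not for inner loops.
    float lane(int i) const {
        assert(i >= 0 && i < 4);
        float out[4];
        _mm_storeu_ps(out, v);
        return out[i];
    }
};
```

With such a class, user code keeps ordinary arithmetic syntax (`a = b + c; a += b * c;`) while each operator performs four float operations at once.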

Development Methods

Vectorization

- Compiler generates SIMD integer or FP code for you under strict conditions:
  - countable loops with single-unit stride
  - body of loop must be single block - no internal branching, single entry/exit
  - data types: float, char/short/int
  - user ensures correct alignment for floats
  - user ensures no aliases for pointers
  - no function calls in loop
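
The "no aliases" condition can be motivated with a small experiment. The sketch below models a 4-wide SIMD loop in plain C++ (read four inputs, then write four results) and compares it with the element-at-a-time loop when the destination overlaps the source. All function names are hypothetical, and the blocked loop is only a model of what vectorized code does, not compiler output.

```cpp
#include <cassert>
#include <cstddef>

// Scalar reference: one element at a time, exactly as the C loop is written.
void add_one_scalar(float *dst, const float *src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i] + 1.0f;
}

// Models a 4-wide SIMD loop: read four inputs, then write four results.
void add_one_blocked(float *dst, const float *src, std::size_t n) {
    assert(n % 4 == 0);  // this sketch handles whole blocks only
    for (std::size_t i = 0; i < n; i += 4) {
        float tmp[4];
        for (int k = 0; k < 4; ++k) tmp[k] = src[i + k] + 1.0f;  // "packed load + add"
        for (int k = 0; k < 4; ++k) dst[i + k] = tmp[k];         // "packed store"
    }
}

// Returns true when overlapping pointers make the two loops disagree.
bool aliasing_changes_result() {
    float x[5] = {0, 0, 0, 0, 0};
    float y[5] = {0, 0, 0, 0, 0};
    add_one_scalar(x + 1, x, 4);   // dst overlaps src, shifted by one element
    add_one_blocked(y + 1, y, 4);
    return x[4] != y[4];
}
```

With these inputs the scalar loop carries each new value forward (x ends as 0, 1, 2, 3, 4), while the blocked loop reads all four inputs before any store (y ends as 0, 1, 1, 1, 1). A compiler cannot vectorize safely unless it knows such overlap does not occur.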

Development Methods

Vector-Multiply-Add in C

void do_c(float *ac, float *m, float *v, int n)
{
    for(int i=0; i<n; i++)
        ac[i] += (m[i] * v[i]);
}

Development Methods

Intrinsics Example

// assumes data passed in is 16-byte aligned!!!
void do_intrin(float *ac, float *m, float *v, int n)
{
    __m128 t, vp, mp, ap;
    for(int i=0; i
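
The intrinsics listing is cut off after the loop header, so its body is not available here. The following is a hedged sketch of how such a multiply-add loop could be written with the SSE intrinsics, reusing the declared names `t`, `vp`, `mp`, and `ap`. The function name `do_intrin_sketch`, the step of four elements per iteration, and the assumption that `n` is a multiple of 4 are this example's, not the slide's.

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics: _mm_load_ps, _mm_mul_ps, _mm_add_ps, _mm_store_ps

// Hypothetical reconstruction, not the original listing.
// Assumes all three arrays are 16-byte aligned and n is a multiple of 4.
void do_intrin_sketch(float *ac, const float *m, const float *v, int n) {
    assert(n % 4 == 0);
    assert(reinterpret_cast<std::uintptr_t>(ac) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(m) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(v) % 16 == 0);

    __m128 t, vp, mp, ap;
    for (int i = 0; i < n; i += 4) {   // four floats per iteration
        vp = _mm_load_ps(v + i);       // aligned packed loads
        mp = _mm_load_ps(m + i);
        ap = _mm_load_ps(ac + i);
        t  = _mm_mul_ps(mp, vp);       // t = m * v
        ap = _mm_add_ps(ap, t);        // ac += t
        _mm_store_ps(ac + i, ap);      // aligned packed store
    }
}
```

The programmer still picks the instruction sequence (load, multiply, add, store), while register allocation and scheduling are left to the compiler.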

Development Methods

Vector Class Example

void do_fvec(float *ac, float *m, float *v, int n)
{
    F32vec4 *a=(F32vec4 *) ac;
    F32vec4 *c=(F32vec4 *) m;
    F32vec4 *b=(F32vec4 *) v;
    for(int i=0; i<(n / 4); i++) // reduced iterations
        a[i] += (c[i] * b[i]);
}

The vector classes offer efficient, portable performance
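
The vector class loop runs n / 4 iterations, so it covers every element only when n is a multiple of 4. A possible way to handle other lengths, sketched here with intrinsics rather than the vector classes, is to process whole groups of four and finish the remainder with scalar code. The function name `madd_any_n` is hypothetical, and the sketch uses unaligned loads and stores (`_mm_loadu_ps`, `_mm_storeu_ps`) so that it makes no alignment assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical helper: ac[i] += m[i] * v[i] for any n, four floats at a time.
void madd_any_n(float *ac, const float *m, const float *v, std::size_t n) {
    assert(ac != nullptr && m != nullptr && v != nullptr);

    std::size_t i = 0;
    // Main loop: whole groups of four floats.
    for (; i + 4 <= n; i += 4) {
        __m128 a = _mm_loadu_ps(ac + i);
        __m128 p = _mm_mul_ps(_mm_loadu_ps(m + i), _mm_loadu_ps(v + i));
        _mm_storeu_ps(ac + i, _mm_add_ps(a, p));
    }
    // Scalar tail: the last n % 4 elements.
    for (; i < n; ++i)
        ac[i] += m[i] * v[i];
}
```

For n = 6, for example, the packed loop handles elements 0 to 3 and the scalar tail handles elements 4 and 5.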

Development Methods

Vectorization Example

// restrict says data is only accessible via given pointer
void do_vectorize(float *restrict ac, float *restrict m, float *restrict v, int n)
{
    #pragma vectorize aligned // user ensures data is aligned
    for(int i=0; i<100; i++)
        ac[i] += (m[i] * v[i]);
}

Give some characteristics; compiler generates the SIMD code

Summary

- We learned our lesson: lack of tools = big headaches for developers
- The VTune™ Performance Enhancement Environment has everything you need for Streaming SIMD Extensions programming
- The new development methods allow you to concentrate on your application; the tools will help you achieve the performance

Now, it’s your turn:

www.intel.com/VTune
