Ocelot Installation Manual A Dynamic Compilation Framework for Heterogeneous Systems
Jeffrey Young August 28th, 2012 SVN revision 2028 http://code.google.com/p/gpuocelot/
This document details the installation of Ocelot from the Subversion repository using Ubuntu 12.04 as the base OS. If one of the prepackaged builds meets your needs, use it; building from source is only necessary if you need the latest features of Ocelot. If you run into any problems with the process in this document, first check the wiki at http://code.google.com/p/gpuocelot/w/list and then the Google forum at https://groups.google.com/forum/#!forum/gpuocelot.
1) Make sure the required compilers and tools are installed:
- C++ compiler (≥ GCC 4.5.x)
- Lex lexer generator (≥ Flex 2.5.35)
- YACC parser generator (≥ Bison 2.4.1)
- Scons build tool
- Subversion

    sudo apt-get install flex bison scons build-essential subversion
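The prerequisite check in step 1 can be sketched as a short script. This is only an illustration: `version_ge` and `check_tools` are hypothetical helper names, not part of Ocelot, and the version comparison assumes GNU `sort -V` is available (as it is on Ubuntu).

```shell
#!/bin/sh
# Sketch: report whether the build prerequisites from step 1 are on the PATH.
# version_ge and check_tools are hypothetical helpers, not part of Ocelot.

# Succeed when version $1 is greater than or equal to version $2.
# Assumes GNU sort with the -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print one status line per required tool; never aborts the script.
check_tools() {
    for tool in g++ flex bison scons svn; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
        fi
    done
}

check_tools

# Compare the installed g++ version against the 4.5 minimum, if g++ exists.
if command -v g++ >/dev/null 2>&1; then
    gxx_version=$(g++ -dumpversion)
    if version_ge "$gxx_version" 4.5; then
        echo "g++ $gxx_version satisfies >= 4.5"
    else
        echo "g++ $gxx_version is older than 4.5"
    fi
fi
```

The script only reports; it installs nothing, so it is safe to run before and after the apt-get command.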
2) Get the source code from Google Code. Note that this pulls the entire trunk with all tests and examples, which is very large:

    svn checkout http://gpuocelot.googlecode.com/svn/trunk/ gpuocelot-read-only
If you don't want any test code or examples, you can pull just the source directory for Ocelot:

    svn checkout http://gpuocelot.googlecode.com/svn/trunk/ocelot/ocelot/ gpuocelot-read-only
3) Install LLVM if you want to use different back-ends (x86, AMD, etc.). If you are running an OS that supports a version of LLVM ≥ 3.1:

    sudo apt-get install llvm
For OSes that only support versions of LLVM < 3.1 (for instance, Ubuntu 12.04 uses 2.9 by default), you'll need to pull the latest version from the LLVM SVN repository. Otherwise, when running Ocelot's build script, you may run into the following error:

    ocelot/ir/implementation/ExternalFunctionSet.cpp: In function 'llvm::Function* ir::jitFunction(const ir::ExternalFunctionSet::ExternalFunction&, const ir::PTXKernel::Prototype&, llvm::Module*)':
    ocelot/ir/implementation/ExternalFunctionSet.cpp:320:9: error: 'class llvm::SMDiagnostic' has no member named 'print'
To pull from the LLVM SVN and build:

    svn co http://llvm.org/svn/llvm-project/llvm/trunk llvm
    cd llvm
    ./configure --enable-optimized
    sudo make
    sudo make install
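The choice between the packaged LLVM and a source build comes down to a version comparison against 3.1. A minimal sketch of that check follows; `llvm_source_build_needed` is a hypothetical helper name, the comparison assumes GNU `sort -V`, and on some systems `llvm-config` may be installed under a versioned name instead.

```shell
#!/bin/sh
# Sketch: decide whether the installed LLVM is new enough for Ocelot (>= 3.1).
# llvm_source_build_needed is a hypothetical helper, not part of Ocelot or LLVM.

# Succeed (meaning: a source build is needed) when version $1 is older than 3.1.
llvm_source_build_needed() {
    oldest=$(printf '%s\n%s\n' "$1" "3.1" | sort -V | head -n 1)
    [ "$oldest" != "3.1" ]
}

if command -v llvm-config >/dev/null 2>&1; then
    installed=$(llvm-config --version)
    if llvm_source_build_needed "$installed"; then
        echo "LLVM $installed is older than 3.1: build LLVM from the SVN repository"
    else
        echo "LLVM $installed is new enough"
    fi
else
    echo "llvm-config not found: LLVM is not installed or not on the PATH"
fi
```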
4) Install Boost, LibGlew, and LibGlut

Boost (≥ 1.4.6 suggested):

    sudo apt-get install libboost-dev libboost-system-dev libboost-filesystem-dev libboost-thread-dev

Libglew (version 1.6) is used to support OpenGL features used in some of the SDK examples:

    sudo apt-get install libglew1.6-dev

- Note that this will install a large number of support libraries as dependencies.

To install LibGlut (used by some of the test applications in the CUDA SDK):

    sudo apt-get install freeglut3 freeglut3-dev
5) Other notes (items that are not needed for installation)

Hydrazine is checked out automatically with the trunk of GPUOcelot. This folder contains some utility functionality like a JSON parser, floating-point functions, bit-wise operations, casts, and exception objects.

librt.so is used for some of the SDK examples, and refers to Linux real-time functionality. This library is installed by default with GCC as part of libc, so there is no need to install any additional packages to provide it. Please note that librt was mentioned as “rt” or “rt3.8” in previous installation guides, but in Ubuntu this refers to the request-tracker package.
6) Build Ocelot using the build.py script:

    cd gpuocelot-read-only
    (or cd gpuocelot-read-only/ocelot to only build the ocelot library)
    sudo ./build.py --install
If Ocelot finishes compiling correctly, the Ocelot library should be installed in "/usr/local/lib/libocelot.so" or can be found locally under the ".release_build" directory.
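To confirm that the build produced the library, one option is to look for libocelot.so in the two locations named above. The following is a minimal sketch, meant to be run from the top-level checkout directory; `find_libocelot` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: look for libocelot.so in a list of candidate directories.
# find_libocelot is a hypothetical helper, not part of Ocelot.

# Print the first path that contains libocelot.so; fail if none does.
find_libocelot() {
    for dir in "$@"; do
        if [ -f "$dir/libocelot.so" ]; then
            echo "$dir/libocelot.so"
            return 0
        fi
    done
    return 1
}

# Check the install location first, then the local build directory.
if lib=$(find_libocelot /usr/local/lib .release_build); then
    echo "Ocelot library found at $lib"
else
    echo "libocelot.so not found; check the build output for errors"
fi
```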
7) Building tests:

You can build tests either with the build.py script in the root directory of the trunk (the top-level "gpuocelot-read-only" directory checked out from SVN) or in the specific test directory using the scons build tool and the local SConscript file. The high-level build.py script is recommended, as it requires fewer changes to the files in each directory.

To build test folders, pass a comma-separated list of folder names to test_lists and a level to test_level:

    sudo ./build.py --test_lists <folders> --test_level=<level>

An example of building one test folder:

    sudo ./build.py --test_lists parboil --test_level=basic

An example of building multiple test folders:

    sudo ./build.py --test_lists cuda4.1,parboil --test_level=full
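When several test folders are involved, the comma-separated list can be assembled by a small wrapper. The sketch below only prints the command rather than running it; `build_tests_command` is a hypothetical helper name, and the long-option spelling (`--test_lists`, `--test_level`) is an assumption about how build.py parses its arguments.

```shell
#!/bin/sh
# Sketch: assemble a build.py test command from a level and folder names.
# build_tests_command is a hypothetical helper; it prints and runs nothing.

build_tests_command() {
    level=$1
    shift
    # Join the remaining arguments with commas inside a subshell,
    # so the changed IFS does not leak into the caller.
    lists=$(IFS=,; echo "$*")
    echo "sudo ./build.py --test_lists $lists --test_level=$level"
}

build_tests_command basic parboil
build_tests_command full cuda4.1 parboil
```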
If you receive the following error while building any of the tests with the build.py script:

    OSError: [Errno 2] No such file or directory: '../../.release_build/tests/cuda2.3'

- Go to the top-level directory ("gpuocelot-read-only").
- sudo mkdir .release_build/tests/cuda2.3
- Rerun the build for your particular test; the compiled test applications should now be placed in the correct release directory.

To build all the tests in one folder using scons and local scripts (not recommended):

This method is not recommended, since build.py can be used to build all tests. It is provided only in case you run into problems with the build.py script or want to move test folders out of the normal directory structure. You will need to modify the SConscript file in the test folder, since you are calling it from the local directory rather than from the build.py script in a higher-level directory:

- Change sys.path.append('../../scripts') to sys.path.append('../scripts')
- Comment out env.Append(LIBPATH = [os.path.join(env['install_path'], 'lib')]) and env.Append(CPPFLAGS = '-I' + os.path.join(env['install_path'], 'include'))
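The two SConscript changes for a local scons build could be applied with sed instead of by hand. This is a sketch under assumptions: `patch_sconscript` is a hypothetical helper name, each env.Append statement is assumed to sit on a single line, and any line mentioning env['install_path'] is commented out. The patched copy is written next to the original as SConscript.local so nothing is overwritten.

```shell
#!/bin/sh
# Sketch: apply the local-build edits to a copy of a test folder's SConscript.
# patch_sconscript is a hypothetical helper, not part of Ocelot.

patch_sconscript() {
    src=$1
    # 1) Shorten the scripts path by one directory level.
    # 2) Comment out every line that refers to env['install_path'].
    sed -e "s|sys.path.append('../../scripts')|sys.path.append('../scripts')|" \
        -e "/env\['install_path'\]/s|^|#|" \
        "$src" > "$src.local"
}

# Only act when run inside a test folder that has an SConscript file.
if [ -f SConscript ]; then
    patch_sconscript SConscript && echo "wrote SConscript.local"
fi
```

Review SConscript.local before replacing the original with it.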
8) Building applications with Ocelot

To link against Ocelot:

Use the OcelotConfig tool to supply the linker flags:

    g++ -o my_program my_program.o `OcelotConfig -l`
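In a build script it can help to guard against OcelotConfig being absent from the PATH. The sketch below prints the link command rather than running it; `ocelot_link_flags` is a hypothetical helper name, and the bare `-locelot` fallback is an assumption that may omit dependencies OcelotConfig would normally add.

```shell
#!/bin/sh
# Sketch: obtain Ocelot linker flags, with a fallback when OcelotConfig is missing.
# ocelot_link_flags is a hypothetical helper, not part of Ocelot.

ocelot_link_flags() {
    if command -v OcelotConfig >/dev/null 2>&1; then
        # Ask the installed tool for the flags.
        OcelotConfig -l
    else
        # Assumption: fall back to the bare library flag only.
        echo "-locelot"
    fi
}

# Print the link command for a hypothetical my_program.o object file.
echo "g++ -o my_program my_program.o $(ocelot_link_flags)"
```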
The configure.ocelot file:

Each test folder contains a file called configure.ocelot that can be used to set trace options and specify the device that is used to run the CUDA kernel (llvm, emulated, nvidia, amd). By default, this file is set to use the "emulated" device. Try running the same example using different devices such as “emulated” or “nvidia” so you can get a feel for the difference in performance and functionality of each device. For more information on creating and modifying these files see http://code.google.com/p/gpuocelot/wiki/OcelotConfigFile

An example individual compilation of a test program from the 4.1 SDK, Clock:

Input Files:
- clock.cu - main CUDA file that launches the kernel on the selected device (as configured through configure.ocelot)
- clock_kernel.cu - timing kernel that runs on the GPU

Intermediate Output Files:
- clock.cu.cpp, clock_kernel.cu.cpp - intermediate files that contain the CUDA code from the kernel (converted to PTX) embedded as an array of bytes. NVCC generates these intermediate files in case you need to compile your CUDA code with other C++ files or languages like OpenMP or MPI. You can skip this intermediate step by passing the "-c" option to nvcc (e.g., “nvcc -c myprogram.cu”).

//Use NVCC without -c flag to compile the clock kernel to an intermediate cu.cpp file

    /usr/local/cuda/bin/nvcc -I. -I/usr/local/cuda/include -arch=sm_20 -I./tests/clock -I./shared -I./sdk -I. tests/clock/clock.cu -cuda -o tests/clock/clock.cu.cpp
    /usr/local/cuda/bin/nvcc -I. -I/usr/local/cuda/include -arch=sm_20 -I./tests/clock -I./shared -I./sdk -I. tests/clock/clock_kernel.cu -cuda -o tests/clock/clock_kernel.cu.cpp
//Use g++ to compile each of the cu.cpp files to cu.o (object) files. This allows for finer-grained compilation.

    g++ -o tests/clock/clock_kernel.cu.o -c -Wall -O2 -g -I. -I/usr/local/cuda/include -std=c++0x -I./shared -I./sdk -I. -I./tests/clock -I./shared -I./sdk -I. tests/clock/clock_kernel.cu.cpp
    g++ -o tests/clock/clock.cu.o -c -Wall -O2 -g -I. -I/usr/local/cuda/include -std=c++0x -I./shared -I./sdk -I. -I./tests/clock -I./shared -I./sdk -I. tests/clock/clock.cu.cpp
//Link against Ocelot. Note that libGlew and libGlut are also linked in for this test.

    g++ -o Clock tests/clock/clock_kernel.cu.o tests/clock/clock.cu.o -L/usr/local/cuda/lib64 -lglut -locelot libsdk4_1.a -lGLEW -lGLU
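The three stages of the Clock example (nvcc to .cu.cpp, g++ to .cu.o, then the link) follow the same pattern for any set of .cu files. The sketch below prints the commands as a dry run instead of executing them, so it does not require nvcc. `print_build_steps` is a hypothetical helper name, the output name "app" is a placeholder, and the include and library flags are deliberately reduced from the full Clock command lines.

```shell
#!/bin/sh
# Sketch: print the nvcc / g++ / link commands for a set of .cu files (dry run).
# print_build_steps is a hypothetical helper; the flags are a reduced subset
# of the ones used in the Clock example.

print_build_steps() {
    dir=$1          # directory holding the .cu files, e.g. tests/clock
    shift
    objects=""
    for name in "$@"; do
        # Stage 1: nvcc emits host C++ with the PTX embedded.
        echo "nvcc -I. -I/usr/local/cuda/include -arch=sm_20 $dir/$name.cu -cuda -o $dir/$name.cu.cpp"
        # Stage 2: g++ compiles the generated C++ to an object file.
        echo "g++ -c -O2 -I. -I/usr/local/cuda/include -std=c++0x $dir/$name.cu.cpp -o $dir/$name.cu.o"
        objects="$objects $dir/$name.cu.o"
    done
    # Stage 3: link all objects against Ocelot.
    echo "g++ -o app$objects -locelot"
}

print_build_steps tests/clock clock_kernel clock
```

Printing first makes it easy to review the commands before piping them to a shell.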
More information

For more information on any of the above topics, please check our wiki at http://code.google.com/p/gpuocelot/w/list or the Google forum at https://groups.google.com/forum/#!forum/gpuocelot