From a5a2bd14cdf8024ed35650d79515652d8aedd927 Mon Sep 17 00:00:00 2001
From: Oreste Villa
Date: Mon, 21 Oct 2019 08:43:22 -0700
Subject: [PATCH] Update README.md

---
 README.md | 158 ++++------------------------------------------------------
 1 file changed, 9 insertions(+), 149 deletions(-)

diff --git a/README.md b/README.md
index 9aeda12..ac73d5d 100644
--- a/README.md
+++ b/README.md
@@ -39,155 +39,15 @@ function.
 Because NVBit does not require application source code, any pre-compiled GPU
 application should work regardless of which compiler (or version) has been
-used (i.e. nvcc, pgicc, etc). NVBit supports Kepler, Maxwell, Pascal and
-Volta architectures (specifically SM >= 3.5 && SM <= 7.0) and it is developed
-for compute (tested with CUDA and OpenACC) under x86-64 and PPC64 Linux.
+used (i.e. nvcc, pgicc, etc).

-## Getting Started with NVBit
+## Requirements

-NVBit is provided in a .tgz file containing this README file and three folders:
-1. A ```core``` folder, which contains the main static library
-```libnvbit.a``` and various headers files (among which the ```nvbit.h```
-file which contains all the main NVBit APIs declarations).
-2. A ```tools``` folder, which contains various source code examples of NVBit
-tools. A new user of NVBit, after familiarizing with these pre-existing tools
-will typically make a copy of one of them and modify appropriately.
-3. A ```test-apps``` folder, which contains a simple application that can be
-used to test NVBit tools. There is nothing special about this application, it
-is a simple vector addition program.
-
-NVBit tools must be compiled with nvcc version >= 8.0 , using GCC
-version >= 5.3.0.
-
-To compile the NVBit tools simply type ```make``` from inside the ```tools```
-folder (make sure ```nvcc``` is in your PATH).
-Compile the test application by typing ```make``` inside the ```test-apps```
-folder.
-
-## Using an NVBit tool
-
-Before running an NVBit tool, make sure ```nvdisasm``` is in your PATH. In
-Ubuntu distributions this is typically done by adding /usr/local/cuda/bin or
-/usr/local/cuda-"version"/bin to the PATH environment variable.
-
-To use an NVBit tool we simply LD_PRELOAD the tool before the application
-execution command.
-
-For instance if the application vector add runs natively as:
-
-```
-./test-apps/vectoradd/vectoradd
-```
-
-and produces the following output:
-
-```
-Final sum = 100000.000000; sum/n = 1.000000 (should be ~1)
-```
-
-we would use the NVBit tool which performs instruction count as follow:
-
-```
-LD_PRELOAD=./tools/instr_count/instr_count.so ./test-apps/vectoradd/vectoradd
-```
-
-The output for this command should be the following:
-
-```no-highlight
-------------- NVBit (NVidia Binary Instrumentation Tool) Loaded --------------
-NVBit core environment variables (mostly for nvbit-devs):
- VERBOSE = 0 - if set, enables verbose messages level (1,2,3,...)
- DUMP_SASS = 0 - if set, dumps SASS of original and patched kernel
- NOINSPECT = 0 - if set, skips function inspection and instrumentation
- WARNINGS = 0 - if set, enable warning prints
- NVDISASM = nvdisasm - override default nvdisasm found in PATH
- NOBANNER = 0 - if set, does not print this banner
-----------------------------------------------------------------------------
- INSTR_BEGIN = 0 - Beginning of the instruction interval where to apply instrumentation
- INSTR_END = 4294967295 - End of the instruction interval where to apply instrumentation
- KERNEL_BEGIN = 0 - Beginning of the kernel launch interval where to apply instrumentation
- KERNEL_END = 4294967295 - End of the kernel launch interval where to apply instrumentation
- COUNT_WARP_LEVEL = 1 - Count warp level or thread level instructions
- EXCLUDE_PRED_OFF = 0 - Exclude predicated off instruction from count
- TOOL_VERBOSE = 0 - Enable verbosity inside the tool
----------------------------------------------------------------------------------------------------
-kernel 0 - vecAdd(double*, double*, double*, int) - #thread-blocks 98, kernel instructions 50077, total instructions 50077
-Final sum = 100000.000000; sum/n = 1.000000 (should be ~1)
-```
-
-As we can see, before the original output, there is a print showing the kernel
-call index "0", the kernel function prototype
-"vecAdd(double*, double*, double*, int)", total number of thread blocks launched
- in this kernel "98", the number of executed instructions in the kernel "50077",
- and for the all application "50077".
-
-When the application starts, also two banners are printed showing the environment
-variables (and their current values) that can be used to control the NVBit core
-or the specific NVBit Tool.
-Mostly of the NVBit core environment variable are used for core
-debugging/development purposes.
-Set the environment value NOBANNER=1 to disable the core banner if that
-information is not wanted.
-
-### Examples of NVBit Tools
-
-As explained above, inside the ```tools``` folder there are few example of
-NVBit tools. Rather than describing all of them in this README file we refer
-to comment in the source code of each one them.
-
-The natural order (in terms of complexity) to learn these tools is:
-
-1. instr_count: Perform thread level instruction count. Specifically, a
-function is injected before each SASS instruction. Inside this function the
-total number of active threads in a warp is computed and a global counter is
-incremented.
-
-2. opcode_hist: Generate an histogram of all executed instructions.
-
-3. mov_replace: Replace each SASS instruction of type MOV with an equivalent
-function. This tool make use of the read/write register functionality within
-the instrumentation function.
-
-4. instr_countbb: Perform thread level instruction count by instrumenting
-basic blocks. The final result is the same as instr_count, but mush faster
-since less instructions are instrumented (only the first instruction in each
-basic block is instrumented and the counter).
-
-5. mem_printf: Print memory reference addresses for each global LOAD/STORE
-using the GPU side printf. This is accomplished by injecting an
-instrumentation function before each SASS instruction performing global
-LOAD/STORE, passing the register values and immediate used by that
-instruction (used to compute the resulting memory address) and performing the
-printf.
-
-6. mem_trace: Trace memory reference addresses. This NVBit tool works
-similarly to the above example but instead of using a GPU side printf it uses
-a communication channel (provided in utils/channel.hpp) to transfer data from
-GPU-to-CPU and it performs the printf on the CPU side.
-
-We also suggest to take a look to nvbit.h (and comments in it) to get
-familiar with the NVBit APIs.
-
-In general all the NVBit tools should meet the following requirements:
-1. Include nvbit.h which provide all the main NVBit APIs declarations
-2. Link libnvbit.a which provides the core functions of NVBit
-3. Not use shared memory
-4. Be compiled as a dynamic shared library (so it can be loaded with
-LD_PRELOAD)
-5. Use nvcc option "-Xptxas -cloning=no" to prevent nvcc from eliminating
-device functions
-6. Use only 16 registers to limit save/restore overhead (nvcc option
-"-maxrregcount=16")
-
-A typical compilation line for an NVbit tool is the following:
-
-```no-highlight
- nvcc -std=c++11 -I../../src -Xptxas -cloning=no -maxrregcount=16 -Xcompiler
- -Wall -arch=sm_35 -O3 -Xcompiler -fPIC -shared
- instr_count.cu -L../../src -lnvbit -lcuda -o instr_count.so
-```
-The use "-arch=sm_35" is not required, but is typically done so the same
-pre-compiled NVBit tool can be used across multiple GPU generations >= SM 3.5.
-If that is not a requirement then the NVBit tool can be compiled for a specific
-architecture.
+* SM compute capability: >= 3.5 && SM <= 7.0
+* Host CPU: x86\_64, ppc64le
+* OS: Linux
+* GCC version: >= 5.3.0
+* CUDA version: <= 10.1
+* CUDA driver version: <= 430.31
+Currently no Embedded GPUs or ARM hosts are supported.