Update README.md

NVlabs · Oct 21, 2019 · a5a2bd1
1 parent 8dcdbc9
Showing 1 changed file with 9 additions and 149 deletions.
diff --git a/README.md b/README.md
@@ -39,155 +39,15 @@ function.
 
 Because NVBit does not require application source code, any pre-compiled GPU 
 application should work regardless of which compiler (or version) has been 
-used (i.e. nvcc, pgicc, etc). NVBit supports Kepler, Maxwell, Pascal and 
-Volta architectures (specifically SM >= 3.5 && SM <= 7.0) and it is developed 
-for compute (tested with CUDA and OpenACC) under x86-64 and PPC64 Linux. 
+used (i.e. nvcc, pgicc, etc).
 
-## Getting Started with NVBit
+## Requirements
 
-NVBit is provided in a .tgz file containing this README file and three folders:
-1. A ```core``` folder, which contains the main static library 
-```libnvbit.a``` and various headers files (among which the ```nvbit.h``` 
-file which contains all the main NVBit APIs declarations).
-2. A ```tools``` folder, which contains various source code examples of NVBit 
-tools. A new user of NVBit, after familiarizing with these pre-existing tools 
-will typically make a copy of one of them and modify appropriately.
-3. A ```test-apps``` folder, which contains a simple application that can be 
-used to test NVBit tools. There is nothing special about this application, it 
-is a simple vector addition program.
-
-NVBit tools must be compiled with nvcc version >= 8.0 , using GCC 
-version >= 5.3.0. 
-
-To compile the NVBit tools simply type ```make``` from  inside the ```tools``` 
-folder (make sure ```nvcc``` is in your PATH).
-Compile the test application by typing ```make``` inside the ```test-apps``` 
-folder.
-
-## Using an NVBit tool
-
-Before running an NVBit tool, make sure ```nvdisasm``` is in your PATH. In 
-Ubuntu distributions this is typically done by adding /usr/local/cuda/bin or 
-/usr/local/cuda-"version"/bin to the PATH environment variable.
-
-To use an NVBit tool we simply LD_PRELOAD the tool before the application 
-execution command. 
-
-For instance if the application vector add runs natively as: 
-
-```
-./test-apps/vectoradd/vectoradd
-``` 
-
-and produces the following output: 
-
-```
-Final sum = 100000.000000; sum/n = 1.000000 (should be ~1)
-```
-
-we would use the NVBit tool which performs instruction count as follow:
-
-```
-LD_PRELOAD=./tools/instr_count/instr_count.so ./test-apps/vectoradd/vectoradd
-```
-
-The output for this command should be the following:
-
-```no-highlight
-------------- NVBit (NVidia Binary Instrumentation Tool) Loaded --------------
-NVBit core environment variables (mostly for nvbit-devs):
-             VERBOSE = 0 - if set, enables verbose messages level  (1,2,3,...)
-           DUMP_SASS = 0 - if set, dumps SASS of original and patched kernel
-           NOINSPECT = 0 - if set, skips function inspection and instrumentation
-            WARNINGS = 0 - if set, enable warning prints
-            NVDISASM = nvdisasm - override default nvdisasm found in PATH
-            NOBANNER = 0 - if set, does not print this banner
------------------------------------------------------------------------------
-         INSTR_BEGIN = 0 - Beginning of the instruction interval where to apply instrumentation
-           INSTR_END = 4294967295 - End of the instruction interval where to apply instrumentation
-        KERNEL_BEGIN = 0 - Beginning of the kernel launch interval where to apply instrumentation
-          KERNEL_END = 4294967295 - End of the kernel launch interval where to apply instrumentation
-    COUNT_WARP_LEVEL = 1 - Count warp level or thread level instructions
-    EXCLUDE_PRED_OFF = 0 - Exclude predicated off instruction from count
-        TOOL_VERBOSE = 0 - Enable verbosity inside the tool
-----------------------------------------------------------------------------------------------------
-kernel 0 - vecAdd(double*, double*, double*, int) - #thread-blocks 98,  kernel instructions 50077, total instructions 50077
-Final sum = 100000.000000; sum/n = 1.000000 (should be ~1)
-```
-
-As we can see, before the original output, there is a print showing the kernel 
-call index "0", the kernel function prototype 
-"vecAdd(double*, double*, double*, int)", total number of thread blocks launched
- in this kernel "98", the number of executed instructions in the kernel "50077", 
- and for the all application "50077".
-
-When the application starts, also two banners are printed showing the environment
-variables (and their current values) that can be used to control the NVBit core 
-or the specific NVBit Tool.
-Mostly of the NVBit core environment variable are used for core 
-debugging/development purposes. 
-Set the environment value NOBANNER=1 to disable the core banner if that 
-information is not wanted. 
-
-### Examples of NVBit Tools
-
-As explained above, inside the ```tools``` folder there are few example of 
-NVBit tools. Rather than describing all of them in this README file we refer 
-to comment in the source code of each one them. 
-
-The natural order (in terms of complexity) to learn these tools is:
-
-1. instr_count: Perform thread level instruction count. Specifically, a 
-function is injected before each SASS instruction. Inside this function the 
-total number of active threads in a warp is computed and a global counter is 
-incremented.
-
-2. opcode_hist: Generate an histogram of all executed instructions.
-
-3. mov_replace: Replace each SASS instruction of type MOV with an equivalent 
-function. This tool make use of the read/write register functionality within 
-the instrumentation function.
-
-4. instr_countbb: Perform thread level instruction count by instrumenting 
-basic blocks. The final result is the same as instr_count, but mush faster 
-since less instructions are instrumented (only the first instruction in each 
-basic block is instrumented and the counter).
-
-5. mem_printf: Print memory reference addresses for each global LOAD/STORE 
-using the GPU side printf. This is accomplished by injecting an 
-instrumentation function before each SASS instruction performing global 
-LOAD/STORE, passing the register values and immediate used by that 
-instruction (used to compute the resulting memory address) and performing the 
-printf. 
-
-6. mem_trace: Trace memory reference addresses. This NVBit tool works 
-similarly to the above example but instead of using a GPU side printf it uses 
-a communication channel (provided in utils/channel.hpp) to transfer data from 
-GPU-to-CPU and it performs the printf on the CPU side.
-
-We also suggest to take a look to nvbit.h (and comments in it) to get 
-familiar with the NVBit APIs.
-
-In general all the NVBit tools should meet the following requirements:
-1. Include nvbit.h which provide all the main NVBit APIs declarations
-2. Link libnvbit.a which provides the core functions of NVBit
-3. Not use shared memory
-4. Be compiled as a dynamic shared library (so it can be loaded with 
-LD_PRELOAD)
-5. Use nvcc option  "-Xptxas -cloning=no" to prevent nvcc from eliminating 
-device functions
-6. Use only 16 registers to limit save/restore overhead (nvcc option 
-"-maxrregcount=16")
-
-A typical compilation line for an NVbit tool is the following:
-
-```no-highlight
-	nvcc -std=c++11 -I../../src -Xptxas -cloning=no -maxrregcount=16 -Xcompiler 
-			-Wall -arch=sm_35 -O3 -Xcompiler -fPIC -shared  
-			instr_count.cu -L../../src -lnvbit -lcuda -o instr_count.so
-```
-The use "-arch=sm_35" is not required, but is typically done so the same 
-pre-compiled NVBit tool can be used across multiple GPU generations >= SM 3.5. 
-If that is not a requirement then the NVBit tool can be compiled for a specific 
-architecture.
+* SM compute capability:              >= 3.5 && SM <= 7.0
+* Host CPU:                           x86\_64, ppc64le
+* OS:                                 Linux
+* GCC version:                        >= 5.3.0
+* CUDA version:                       <= 10.1
+* CUDA driver version:                <= 430.31
 
+Currently no Embedded GPUs or ARM hosts are supported.