diff --git a/artifacts/run_all_ncu_cutlass.sh b/artifacts/run_all_ncu_cutlass.sh
old mode 100644
new mode 100755
diff --git a/artifacts/run_all_ncu_pt.sh b/artifacts/run_all_ncu_pt.sh
index 7cc9e65ae..c2a6d93db 100755
--- a/artifacts/run_all_ncu_pt.sh
+++ b/artifacts/run_all_ncu_pt.sh
@@ -4,12 +4,10 @@ ncu_dir="/home/sosp/env/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.4.0/cuda-1
 root_dir=$(pwd)
 log_dir="$root_dir/logs"
 benchmark_dir="FractalTensor/benchmarks"
-mha_dir="$benchmark_dir/multi-head_attention/baseline"
 bigbird_dir="$benchmark_dir/blocked_sparse_attention/pytorch"
 
 # 2. ncu test the bigbird benchmark
 echo "NCU profiling BigBird benchmark"
 $ncu_dir/ncu --section "MemoryWorkloadAnalysis" \
-    --csv --set full python3 $bigbird_dir/main.py > $log_dir/pt_bigbird_ncu.csv
-
+    --csv --set full python3 $bigbird_dir/main.py > $log_dir/pt_bigbird_ncu.csv
\ No newline at end of file
diff --git a/artifacts/table6/README.md b/artifacts/table6/README.md
index e7335d542..96bc4cfd2 100644
--- a/artifacts/table6/README.md
+++ b/artifacts/table6/README.md
@@ -37,3 +37,45 @@ The profiling results shown in Table 6 are based on [NVIDIA Nsight Compute (ncu)
 In the output file of the profile results, you will find the memory traffic behavior of the kernel of interest. You can then further process and analyze these results.
 
 We cannot pre-assign names due to libraries like Triton having internal implementations that call extra kernels. Filtering based on names is not feasible. To address this, we run profiling multiple times (e.g., three) to observe log outputs, then run the tested program several times (e.g., five) to identify patterns. This helps us pinpoint actual kernel calls and post-process the ncu profiling logs to compute network traffic over the memory hierarchy.
+
+### Run the test
+
+We have prepared a testing environment on the provided server to run the tests.
+
+> The following commands should be executed in the `artifacts` directory of the project, not in the `table6` directory.
+
+1. The script [run_all_ncu_cutlass.sh](../run_all_ncu_cutlass.sh) runs the test for Flash Attention 2, implemented in CUTLASS.
+
+    ```bash
+    sudo -i # Switch to root account
+    cd /home/sosp/nnfusion/artifacts
+    ./run_all_ncu_cutlass.sh
+    ```
+
+2. The script [run_all_ncu_flash2.sh](../run_all_ncu_flash2.sh) runs the test for Flash Attention 2, implemented in PyTorch.
+
+    ```bash
+    sudo -i # Switch to root account
+    cd /home/sosp/nnfusion/artifacts
+    # Choose the environment you want to test
+    source /home/sosp/env/torch_env.sh
+    ./run_all_ncu_flash2.sh
+    ```
+
+3. The script [run_all_ncu_ft.sh](../run_all_ncu_ft.sh) runs the test for BigBird and Flash Attention, implemented in FractalTensor.
+
+    ```bash
+    sudo -i # Switch to root account
+    cd /home/sosp/nnfusion/artifacts
+    ./run_all_ncu_ft.sh
+    ```
+
+4. The script [run_all_ncu_pt.sh](../run_all_ncu_pt.sh) runs the test for BigBird, implemented in PyTorch.
+
+    ```bash
+    sudo -i # Switch to root account
+    cd /home/sosp/nnfusion/artifacts
+    # Choose the environment you want to test
+    source /home/sosp/env/torch_env.sh
+    ./run_all_ncu_pt.sh
+    ```
\ No newline at end of file
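
The scripts above redirect both the benchmark's stdout and ncu's CSV table into `logs/*.csv` (for example, `logs/pt_bigbird_ncu.csv` for the PyTorch BigBird run). As a minimal sketch of the post-processing step the README describes, the snippet below sums a couple of DRAM-traffic metrics per kernel name across launches. The log path, the metric names (`dram__bytes_read.sum`, `dram__bytes_write.sum`), and the column layout are assumptions about ncu's default `--csv` output, not the artifact's actual post-processing code; check them against your ncu version before relying on the numbers.

```python
import csv
from collections import defaultdict

# Hypothetical log path and metric names for illustration; inspect the
# "Metric Name" column of your own ncu log, since metric names vary
# across ncu versions and GPUs.
LOG_PATH = "logs/pt_bigbird_ncu.csv"
METRICS = ("dram__bytes_read.sum", "dram__bytes_write.sum")

def aggregate_dram_traffic(log_path):
    """Sum the selected DRAM metrics per kernel name over all profiled launches."""
    with open(log_path, newline="") as f:
        lines = f.readlines()
    # The redirected log mixes program output with ncu's CSV table; the table
    # begins at the header row that names the "Metric Name" column.
    start = next(i for i, line in enumerate(lines) if "Metric Name" in line)
    totals = defaultdict(float)
    for row in csv.DictReader(lines[start:]):
        if row.get("Metric Name") in METRICS and row.get("Metric Value"):
            # Strip thousands separators before converting to float.
            totals[row["Kernel Name"]] += float(row["Metric Value"].replace(",", ""))
    return totals

if __name__ == "__main__":
    # Values are summed exactly as reported; check the "Metric Unit" column
    # before converting them to bytes, MB, etc.
    for kernel, total in sorted(aggregate_dram_traffic(LOG_PATH).items()):
        print(f"{kernel[:60]:60s} {total:,.0f}")
```

A pass like this over each script's log file is one way to compare per-kernel memory traffic across the implementations once the kernels of interest have been identified.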