update README.
lcy-seso committed Sep 17, 2024
1 parent 95a4e06 commit f2ac744
Showing 3 changed files with 43 additions and 3 deletions.
Empty file modified artifacts/run_all_ncu_cutlass.sh
100644 → 100755
Empty file.
4 changes: 1 addition & 3 deletions artifacts/run_all_ncu_pt.sh
@@ -4,12 +4,10 @@ ncu_dir="/home/sosp/env/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.4.0/cuda-1
root_dir=$(pwd)
log_dir="$root_dir/logs"
benchmark_dir="FractalTensor/benchmarks"
mha_dir="$benchmark_dir/multi-head_attention/baseline"

bigbird_dir="$benchmark_dir/blocked_sparse_attention/pytorch"

# 2. ncu test the bigbird benchmark
echo "NCU profiling BigBird benchmark"
$ncu_dir/ncu --section "MemoryWorkloadAnalysis" \
--csv --set full python3 $bigbird_dir/main.py > $log_dir/pt_bigbird_ncu.csv

42 changes: 42 additions & 0 deletions artifacts/table6/README.md
@@ -37,3 +37,45 @@ The profiling results shown in Table 6 are based on [NVIDIA Nsight Compute (ncu)
In the output file of the profile results, you will find the memory traffic behavior of the kernel of interest. You can then further process and analyze these results.

We cannot filter kernels by name in advance, because libraries such as Triton have internal implementations that launch extra kernels. Instead, we run profiling multiple times (e.g., three) to observe the log outputs, then run the tested program several times (e.g., five) to identify recurring patterns. This lets us pinpoint the actual kernel calls and post-process the ncu profiling logs to compute the traffic across the memory hierarchy.
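
The post-processing step described above could be sketched roughly as follows. This is an illustrative sketch, not the script used to produce Table 6. It assumes the `--csv` output has `Kernel Name`, `Metric Name`, and `Metric Value` columns, that the CSV header is the first line starting with `"ID"` (log lines may precede it), and the kernel names and the `dram__bytes_read.sum` metric in the sample data are made up for illustration.

```python
import csv
import io


def read_ncu_rows(text):
    """Parse ncu --csv output into dict rows, skipping log lines before the header."""
    lines = text.splitlines()
    # Assumption: the CSV header is the first line that starts with "ID".
    start = next((i for i, line in enumerate(lines) if line.startswith('"ID"')), None)
    if start is None:
        raise ValueError("no CSV header found in ncu output")
    return list(csv.DictReader(io.StringIO("\n".join(lines[start:]))))


def kernels_in_every_run(runs):
    """Return the kernel names that appear in all profiling runs."""
    name_sets = [{row["Kernel Name"] for row in rows} for rows in runs]
    return set.intersection(*name_sets)


def total_metric(rows, kernels, metric_name):
    """Sum one metric over the selected kernels in a single run."""
    total = 0.0
    for row in rows:
        if row["Kernel Name"] in kernels and row["Metric Name"] == metric_name:
            # Values may contain thousands separators such as "1,000".
            total += float(row["Metric Value"].replace(",", ""))
    return total


# Hypothetical, trimmed-down sample logs (real ncu output has more columns).
SAMPLE_RUN_1 = '''==PROF== Connected to process 1234
"ID","Kernel Name","Metric Name","Metric Unit","Metric Value"
"0","attn_kernel","dram__bytes_read.sum","byte","1,000"
"1","helper_kernel","dram__bytes_read.sum","byte","10"
"2","softmax_kernel","dram__bytes_read.sum","byte","500"
'''

SAMPLE_RUN_2 = '''"ID","Kernel Name","Metric Name","Metric Unit","Metric Value"
"0","attn_kernel","dram__bytes_read.sum","byte","1,000"
"1","softmax_kernel","dram__bytes_read.sum","byte","500"
'''

if __name__ == "__main__":
    runs = [read_ncu_rows(SAMPLE_RUN_1), read_ncu_rows(SAMPLE_RUN_2)]
    stable = kernels_in_every_run(runs)
    print(sorted(stable))
    print(total_metric(runs[0], stable, "dram__bytes_read.sum"))
```

Intersecting kernel names across runs is only one possible heuristic for separating the kernels of interest from incidental ones; in practice the logs may still need manual inspection.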

### Run the test

We have prepared a testing environment on the provided server to run the tests.

> The following commands should be executed in the `artifacts` directory of the project, not in the `table6` directory.

1. The script [run_all_ncu_cutlass.sh](../run_all_ncu_cutlass.sh) runs the test for Flash Attention 2, implemented in CUTLASS.

```bash
sudo -i # Switch to root account
cd /home/sosp/nnfusion/artifacts
./run_all_ncu_cutlass.sh
```

2. The script [run_all_ncu_flash2.sh](../run_all_ncu_flash2.sh) runs the test for Flash Attention 2, implemented in PyTorch.

```bash
sudo -i # Switch to root account
cd /home/sosp/nnfusion/artifacts
# Choose the environment you want to test
source /home/sosp/env/torch_env.sh
./run_all_ncu_flash2.sh
```

3. The script [run_all_ncu_ft.sh](../run_all_ncu_ft.sh) runs the test for BigBird and Flash Attention, implemented in FractalTensor.

```bash
sudo -i # Switch to root account
cd /home/sosp/nnfusion/artifacts
./run_all_ncu_ft.sh
```

4. The script [run_all_ncu_pt.sh](../run_all_ncu_pt.sh) runs the test for BigBird, implemented in PyTorch.

```bash
sudo -i # Switch to root account
cd /home/sosp/nnfusion/artifacts
# Choose the environment you want to test
source /home/sosp/env/torch_env.sh
./run_all_ncu_pt.sh
```
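
Because the steps above invoke each script directly (`./run_all_ncu_*.sh`), the scripts must exist in the current directory and carry the executable bit. A quick pre-flight check could look like the sketch below; the helper name `check_scripts` is hypothetical, and the script list is simply taken from the four steps above.

```python
import os
from pathlib import Path

# Script names as listed in the steps above.
SCRIPTS = [
    "run_all_ncu_cutlass.sh",
    "run_all_ncu_flash2.sh",
    "run_all_ncu_ft.sh",
    "run_all_ncu_pt.sh",
]


def check_scripts(directory, names=SCRIPTS):
    """Map each script name to 'ok', 'missing', or 'not executable'."""
    status = {}
    for name in names:
        path = Path(directory) / name
        if not path.is_file():
            status[name] = "missing"
        elif not os.access(path, os.X_OK):
            status[name] = "not executable"
        else:
            status[name] = "ok"
    return status


if __name__ == "__main__":
    # Run from the `artifacts` directory, as the note above requires.
    for name, state in check_scripts(Path.cwd()).items():
        print(f"{name}: {state}")
```

A script reported as `not executable` can be fixed with `chmod +x`, or run via `bash <script>` instead.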
