Added Kernel Launch Op names for AMD GPUs #171

Open
wants to merge 1 commit into base: main

Conversation

@spandoesai commented Dec 3, 2024

Summary

Added kernel launch op names from HIP, such as hipLaunchKernel and hipMemcpy, so that Chakra can work with AMD traces.

Detailed Description

In the Chakra trace linker, is_kernel_launch_op() checks whether a given op is a kernel launch op by comparing the op name against a set of CUDA-specific kernel launch names. To make Chakra work with AMD Kineto traces, this set has been extended with the corresponding HIP launch event names. The change is additive, so the original behavior for NVIDIA devices is not disrupted.
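
For illustration, here is a minimal sketch of what such an extended check could look like. The constant name KERNEL_LAUNCH_OP_NAMES and the exact mix of event names below are assumptions for this sketch, not the identifiers used in the actual PR:

# Illustrative sketch only; the constant and the exact set of names are assumptions.
KERNEL_LAUNCH_OP_NAMES = {
    # CUDA runtime launch events (existing support)
    "cudaLaunchKernel",
    "cudaLaunchKernelExC",
    "cudaMemcpyAsync",
    # HIP runtime launch events (added so AMD Kineto traces link correctly)
    "hipLaunchKernel",
    "hipExtLaunchKernel",
    "hipMemcpyAsync",
}

def is_kernel_launch_op(op_name: str) -> bool:
    """Return True if the op name matches a known CUDA or HIP kernel launch event."""
    return op_name in KERNEL_LAUNCH_OP_NAMES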

Test Plan:

The commit passes all the GitHub automation tests.
To verify correctness, three models were profiled for one inference run on an AMD Instinct MI250 GPU, and the Chakra trace linker was run on the generated ET and Kineto traces (a profiling sketch follows the list below):

  1. Matrix multiplication example code from the Chakra wiki
  2. Toy example
import torch.nn as nn

class ToyModel(nn.Module):
  def __init__(self):
    super(ToyModel, self).__init__()
    self.layers = nn.ModuleList([nn.Linear(128, 128)])
    self.layers.append(nn.ReLU())

  def forward(self, x):
    # Apply each layer in sequence; return only after the full stack has run.
    for layer in self.layers:
      x = layer(x)
    return x
  3. Longformer from Hugging Face Transformers
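
For reference, a minimal sketch of how the ET and Kineto traces for a run like the toy example could be collected. The output file names match the command below, but the exact profiler setup used for these tests is an assumption:

import torch
from torch.profiler import ExecutionTraceObserver, ProfilerActivity, profile

# ToyModel as defined above; assumes a ROCm build of PyTorch,
# which exposes the AMD GPU through the cuda API.
model = ToyModel().cuda()
inputs = torch.randn(1, 128).cuda()

# Record the host-side execution trace (ET).
et = ExecutionTraceObserver()
et.register_callback("pytorch_et.json")
et.start()

# Record the device-side Kineto trace for the same run.
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    with torch.no_grad():
        model(inputs)

et.stop()
et.unregister_callback()
prof.export_chrome_trace("kineto_trace.json")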

Command:

chakra_trace_link --rank 0 --chakra-host-trace ./pytorch_et.json --chakra-device-trace ./kineto_trace.json --output-file ./linked.json

Final output (without any warnings):

...
trace_link.py:49 [INFO]: Linking process successful. Output file is available at ./linked.json.
trace_link.py:50 [INFO]: Please run the chakra_converter for further postprocessing. 

@spandoesai requested a review from a team as a code owner on December 3, 2024 02:01

github-actions bot commented Dec 3, 2024

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@JoongunPark (Contributor) commented Dec 4, 2024

Thank you for sending this PR @spandoesai !
I also believe this PR does not harm existing support for NVIDIA GPUs.

But may I ask you to add a test plan with data or test cases?
Thank you!

@spandoesai (Author)

Hey @JoongunPark, thanks for the comment.

I have updated the PR description with the test plan and the cases I have tried the code on. Let me know if you need any more information.

Thanks!

@JoongunPark (Contributor) commented Dec 5, 2024

> Hey @JoongunPark, thanks for the comment.
>
> I have updated the PR description with the test plan and the cases I have tried the code on. Let me know if you need any more information.
>
> Thanks!

Thank you so much, @spandoesai!
Is it possible to share the traces you got? If you can, I will test them on my local machine and share the result.
Thank you!

@spandoesai (Author)

Hey @JoongunPark, you can find the traces along with the scripts that I used in this zip file:
models.zip

Let me know what you find.

Thanks!

@JoongunPark (Contributor) commented Dec 16, 2024

> Hey @JoongunPark, you can find the traces along with the scripts that I used in this zip file: models.zip
>
> Let me know what you find.
>
> Thanks!

Hi @spandoesai, I got the below error while testing.

Installation

git clone -b spanmore/pytorch_amd https://github.com/spandoesai/chakra.git
conda create -n "chakra-pr" python=3.10.0
conda activate chakra-pr
cd chakra
pip install .

cd ..
git clone git@github.com:facebookresearch/param.git
cd param/et_replay
git checkout 7b19f586dd8b267333114992833a0d7e0d601630
pip install .

Command

$ chakra_trace_link --chakra-host-trace mi250_longformer_et.json --chakra-device-trace mi250_longformer_kineto.json --rank 0 --output-file linked_test.json

Error

[2024-12-16 14:22:35,771] trace.py:328 [INFO]: /home/un/Project/PRs/models/models/longformer/logs
[2024-12-16 14:22:35,772] trace.py:474 [INFO]: ranks=[0]
[2024-12-16 14:22:36,201] trace_parser.py:107 [WARNING]: Parsed /home/un/Project/PRs/models/models/longformer/logs/mi250_longformer_kineto.json time = 0.43 seconds
[2024-12-16 14:22:36,237] trace_parser.py:317 [WARNING]: Rounding down ns resolution events due to issue with events overlapping. ts dtype = float64, dur dtype = float64.Please see https://github.com/pytorch/pytorch/pull/122425
[2024-12-16 14:22:36,317] trace_parser.py:430 [WARNING]: Parsed /home/un/Project/PRs/models/models/longformer/logs/mi250_longformer_kineto.json backend=json in 0.55 seconds; current PID:3295
[2024-12-16 14:22:36,344] trace.py:236 [WARNING]: Overall parsing of /home/un/Project/PRs/models/models/longformer/logs/mi250_longformer_kineto.json in 0.57 seconds; current PID:3295
[2024-12-16 14:22:36,349] trace.py:449 [WARNING]: leaving parse_multiple_ranks duration=0.58 seconds
[2024-12-16 14:22:36,349] trace.py:483 [WARNING]: leaving parse_traces duration=0.58 seconds
[2024-12-16 14:22:36,356] critical_path_analysis.py:1467 [WARNING]: Trace does not contain CUDA Synchronization events so the results of analysis could be inaccurate.
[2024-12-16 14:22:36,356] critical_path_analysis.py:1471 [WARNING]: Please see this PR to learn how to enable CUDA sync events https://github.com/pytorch/pytorch/pull/105187
[2024-12-16 14:22:36,356] critical_path_analysis.py:1493 [INFO]: Looking up events under [0, 0) instance(s) of 'ProfilerStep' annotation.
[2024-12-16 14:22:36,357] critical_path_analysis.py:1509 [INFO]: Looking up events within the window (0, 3336410.0)
[2024-12-16 14:22:36,367] critical_path_analysis.py:1533 [INFO]: Clipped dataframe has 13867 events
[2024-12-16 14:22:36,367] critical_path_analysis.py:1540 [INFO]: Preprocessing took 0.01 seconds
Traceback (most recent call last):
  File "/home/un/miniconda3/envs/chakra-pr/bin/chakra_trace_link", line 8, in <module>
    sys.exit(main())
  File "/home/un/miniconda3/envs/chakra-pr/lib/python3.10/site-packages/chakra/src/trace_link/trace_link.py", line 47, in main
    linker.link(args.rank, args.chakra_host_trace, args.chakra_device_trace, args.output_file)
  File "/home/un/miniconda3/envs/chakra-pr/lib/python3.10/site-packages/chakra/src/trace_link/trace_linker.py", line 70, in link
    sync_deps = self.load_sync_dependencies(rank, chakra_device_trace)
  File "/home/un/miniconda3/envs/chakra-pr/lib/python3.10/site-packages/chakra/src/trace_link/trace_linker.py", line 121, in load_sync_dependencies
    cp_graph, success = trace_analysis.critical_path_analysis(
  File "/home/un/miniconda3/envs/chakra-pr/lib/python3.10/site-packages/hta/trace_analysis.py", line 555, in critical_path_analysis
    return CriticalPathAnalysis.critical_path_analysis(
  File "/home/un/miniconda3/envs/chakra-pr/lib/python3.10/site-packages/hta/analyzers/critical_path_analysis.py", line 1542, in critical_path_analysis
    cp_graph = CPGraph(t_copy, t, rank)
  File "/home/un/miniconda3/envs/chakra-pr/lib/python3.10/site-packages/hta/analyzers/critical_path_analysis.py", line 183, in __init__
    self._construct_graph(cg)
  File "/home/un/miniconda3/envs/chakra-pr/lib/python3.10/site-packages/hta/analyzers/critical_path_analysis.py", line 384, in _construct_graph
    self._construct_graph_from_kernels()
  File "/home/un/miniconda3/envs/chakra-pr/lib/python3.10/site-packages/hta/analyzers/critical_path_analysis.py", line 843, in _construct_graph_from_kernels
    .join(q[["queue_length"]], on="index_correlation")
TypeError: 'NoneType' object is not subscriptable

The error is from HTA, I think. But if we remove the --rank option, it will raise an error on the Chakra side.
