Added Kernel Launch Op names for AMD GPUs #171

Open
wants to merge 1 commit into base: main

Conversation

@spandoesai commented Dec 3, 2024

Summary

Added kernel launch op names from HIP, such as hipLaunchKernel and hipMemcpy, so that Chakra can work with AMD traces.

Detailed Description

In the Chakra trace linker, is_kernel_launch_op() checks whether a given op is a kernel launch op by comparing the op name against a set of CUDA-specific kernel launch names. To make Chakra work with AMD Kineto traces, this set has been extended with the corresponding HIP launch event names. The change is additive, so the original behavior for NVIDIA devices is not disrupted.
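
For illustration, here is a minimal sketch of what such an extended check could look like. The constant name KERNEL_LAUNCH_OP_NAMES and the exact mix of event names below are assumptions for this sketch, not the identifiers used in the actual PR:

# Illustrative sketch only; the constant and the exact set of names are assumptions.
KERNEL_LAUNCH_OP_NAMES = {
    # CUDA runtime launch events (existing support)
    "cudaLaunchKernel",
    "cudaLaunchKernelExC",
    "cudaMemcpyAsync",
    # HIP runtime launch events (added so AMD Kineto traces link correctly)
    "hipLaunchKernel",
    "hipExtLaunchKernel",
    "hipMemcpyAsync",
}

def is_kernel_launch_op(op_name: str) -> bool:
    """Return True if the op name matches a known CUDA or HIP kernel launch event."""
    return op_name in KERNEL_LAUNCH_OP_NAMES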

Test Plan:

The commit passes all the GitHub automation tests.
To verify correctness, three models were profiled for one inference run on an AMD Instinct MI250 GPU, and the Chakra trace linker was run on the generated ET and Kineto traces (a profiling sketch follows the list below):

  1. Matrix multiplication example code from the Chakra wiki
  2. Toy example
import torch.nn as nn

class ToyModel(nn.Module):
  def __init__(self):
    super(ToyModel, self).__init__()
    self.layers = nn.ModuleList([nn.Linear(128, 128)])
    self.layers.append(nn.ReLU())

  def forward(self, x):
    # Apply each layer in sequence; return only after the full stack has run.
    for layer in self.layers:
      x = layer(x)
    return x
  3. Longformer from Hugging Face Transformers
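
For reference, a minimal sketch of how the ET and Kineto traces for a run like the toy example could be collected. The output file names match the command below, but the exact profiler setup used for these tests is an assumption:

import torch
from torch.profiler import ExecutionTraceObserver, ProfilerActivity, profile

# ToyModel as defined above; assumes a ROCm build of PyTorch,
# which exposes the AMD GPU through the cuda API.
model = ToyModel().cuda()
inputs = torch.randn(1, 128).cuda()

# Record the host-side execution trace (ET).
et = ExecutionTraceObserver()
et.register_callback("pytorch_et.json")
et.start()

# Record the device-side Kineto trace for the same run.
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    with torch.no_grad():
        model(inputs)

et.stop()
et.unregister_callback()
prof.export_chrome_trace("kineto_trace.json")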

Command:

chakra_trace_link --rank 0 --chakra-host-trace ./pytorch_et.json --chakra-device-trace ./kineto_trace.json --output-file ./linked.json

Final output (without any warnings):

...
trace_link.py:49 [INFO]: Linking process successful. Output file is available at ./linked.json.
trace_link.py:50 [INFO]: Please run the chakra_converter for further postprocessing. 

@spandoesai requested a review from a team as a code owner on December 3, 2024 02:01

github-actions bot commented Dec 3, 2024

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@JoongunPark (Contributor) commented Dec 4, 2024

Thank you for sending this PR @spandoesai !
I also believe this PR does not harm existing support for NVIDIA GPUs.

But may I ask you to add a test plan with data or test cases?
Thank you!

@spandoesai (Author)

Hey @JoongunPark, thanks for the comment.

I have updated the PR description with the test plan and the cases I have tried the code on. Let me know if you need any more information.

Thanks!

@JoongunPark (Contributor) commented Dec 5, 2024

> Hey @JoongunPark, thanks for the comment.
>
> I have updated the PR description with the test plan and the cases I have tried the code on. Let me know if you need any more information.
>
> Thanks!

Thank you so much, @spandoesai!
Is it possible to share the traces you got? If you can, I will test them on my local machine and share the result.
Thank you!

@spandoesai (Author)

Hey @JoongunPark, you can find the traces along with the scripts that I used in this zip file:
models.zip

Let me know what you find.

Thanks!

@JoongunPark (Contributor) commented Dec 16, 2024

> Hey @JoongunPark, you can find the traces along with the scripts that I used in this zip file: models.zip
>
> Let me know what you find.
>
> Thanks!

Hi @spandoesai, I got the below error while testing.

Installation

git clone -b spanmore/pytorch_amd https://github.com/spandoesai/chakra.git
conda create -n "chakra-pr" python=3.10.0
conda activate chakra-pr
cd chakra
pip install .

cd ..
git clone git@github.com:facebookresearch/param.git
cd param/et_replay
git checkout 7b19f586dd8b267333114992833a0d7e0d601630
pip install .

Command

$ chakra_trace_link --chakra-host-trace mi250_longformer_et.json --chakra-device-trace mi250_longformer_kineto.json --rank 0 --output-file linked_test.json

Error

[2024-12-16 14:22:35,771] trace.py:328 [INFO]: /home/un/Project/PRs/models/models/longformer/logs
[2024-12-16 14:22:35,772] trace.py:474 [INFO]: ranks=[0]
[2024-12-16 14:22:36,201] trace_parser.py:107 [WARNING]: Parsed /home/un/Project/PRs/models/models/longformer/logs/mi250_longformer_kineto.json time = 0.43 seconds
[2024-12-16 14:22:36,237] trace_parser.py:317 [WARNING]: Rounding down ns resolution events due to issue with events overlapping. ts dtype = float64, dur dtype = float64.Please see https://github.com/pytorch/pytorch/pull/122425
[2024-12-16 14:22:36,317] trace_parser.py:430 [WARNING]: Parsed /home/un/Project/PRs/models/models/longformer/logs/mi250_longformer_kineto.json backend=json in 0.55 seconds; current PID:3295
[2024-12-16 14:22:36,344] trace.py:236 [WARNING]: Overall parsing of /home/un/Project/PRs/models/models/longformer/logs/mi250_longformer_kineto.json in 0.57 seconds; current PID:3295
[2024-12-16 14:22:36,349] trace.py:449 [WARNING]: leaving parse_multiple_ranks duration=0.58 seconds
[2024-12-16 14:22:36,349] trace.py:483 [WARNING]: leaving parse_traces duration=0.58 seconds
[2024-12-16 14:22:36,356] critical_path_analysis.py:1467 [WARNING]: Trace does not contain CUDA Synchronization events so the results of analysis could be inaccurate.
[2024-12-16 14:22:36,356] critical_path_analysis.py:1471 [WARNING]: Please see this PR to learn how to enable CUDA sync events https://github.com/pytorch/pytorch/pull/105187
[2024-12-16 14:22:36,356] critical_path_analysis.py:1493 [INFO]: Looking up events under [0, 0) instance(s) of 'ProfilerStep' annotation.
[2024-12-16 14:22:36,357] critical_path_analysis.py:1509 [INFO]: Looking up events within the window (0, 3336410.0)
[2024-12-16 14:22:36,367] critical_path_analysis.py:1533 [INFO]: Clipped dataframe has 13867 events
[2024-12-16 14:22:36,367] critical_path_analysis.py:1540 [INFO]: Preprocessing took 0.01 seconds
Traceback (most recent call last):
  File "/home/un/miniconda3/envs/chakra-pr/bin/chakra_trace_link", line 8, in <module>
    sys.exit(main())
  File "/home/un/miniconda3/envs/chakra-pr/lib/python3.10/site-packages/chakra/src/trace_link/trace_link.py", line 47, in main
    linker.link(args.rank, args.chakra_host_trace, args.chakra_device_trace, args.output_file)
  File "/home/un/miniconda3/envs/chakra-pr/lib/python3.10/site-packages/chakra/src/trace_link/trace_linker.py", line 70, in link
    sync_deps = self.load_sync_dependencies(rank, chakra_device_trace)
  File "/home/un/miniconda3/envs/chakra-pr/lib/python3.10/site-packages/chakra/src/trace_link/trace_linker.py", line 121, in load_sync_dependencies
    cp_graph, success = trace_analysis.critical_path_analysis(
  File "/home/un/miniconda3/envs/chakra-pr/lib/python3.10/site-packages/hta/trace_analysis.py", line 555, in critical_path_analysis
    return CriticalPathAnalysis.critical_path_analysis(
  File "/home/un/miniconda3/envs/chakra-pr/lib/python3.10/site-packages/hta/analyzers/critical_path_analysis.py", line 1542, in critical_path_analysis
    cp_graph = CPGraph(t_copy, t, rank)
  File "/home/un/miniconda3/envs/chakra-pr/lib/python3.10/site-packages/hta/analyzers/critical_path_analysis.py", line 183, in __init__
    self._construct_graph(cg)
  File "/home/un/miniconda3/envs/chakra-pr/lib/python3.10/site-packages/hta/analyzers/critical_path_analysis.py", line 384, in _construct_graph
    self._construct_graph_from_kernels()
  File "/home/un/miniconda3/envs/chakra-pr/lib/python3.10/site-packages/hta/analyzers/critical_path_analysis.py", line 843, in _construct_graph_from_kernels
    .join(q[["queue_length"]], on="index_correlation")
TypeError: 'NoneType' object is not subscriptable

The error is from HTA, I think. But if we remove the --rank option, it will raise an error on the Chakra side.
