Reduce runtime dependency on torch #5490
base: main
Conversation
As I have mentioned, what you submit now would be ad-hoc. I don't suggest creating PRs like this at this moment.
Assuming the amd backend works without torch, would this PR be acceptable if a test was added that removes torch for the amd backend and checks that execution / autotuning still works as expected?
I agree, but I think ad-hoc changes are only bad if they're frequently subject to change or impossible to test. I think this change can be tested (and I can write tests if necessary), and it is morally similar to removing the circular dependencies with torch by moving all global torch imports to local ones (which, as far as I know, is also not explicitly tested and is simply fixed whenever it occurs). The number of changes in this PR is small relative to the number of torch imports in the code overall, so I doubt this PR will change substantially with future changes.
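For reference, a test along these lines could hide torch from the import machinery before importing triton. The sketch below is a hypothetical illustration only (the fixture name and assertions are assumptions, not something this PR adds):

```python
# Hypothetical sketch of a "no torch" import test; not part of this PR.
import importlib
import sys

import pytest


@pytest.fixture
def no_torch(monkeypatch):
    # Setting a sys.modules entry to None makes `import torch` raise ImportError.
    monkeypatch.setitem(sys.modules, "torch", None)
    # Drop any cached triton modules so they are re-imported under the restriction.
    for name in [m for m in sys.modules if m == "triton" or m.startswith("triton.")]:
        monkeypatch.delitem(sys.modules, name)


def test_import_triton_without_torch(no_torch):
    # Should succeed only if importing triton no longer requires torch.
    triton = importlib.import_module("triton")
    assert hasattr(triton, "jit")
```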
```diff
@@ -110,7 +112,7 @@ def do_bench(fn, warmup=25, rep=100, grad_to_none=None, quantiles=None, return_m
     :param return_mode: The statistical measure to return. Options are "min", "max", "mean", "median", or "all". Default is "mean".
     :type return_mode: str
     """
     assert return_mode in ["min", "max", "mean", "median", "all"]
-    import torch
+    import numpy as np

     di = runtime.driver.active.get_device_interface()
```
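For context on why numpy can stand in here: the tail of `do_bench` is just host-side reductions over per-iteration timings. The snippet below is a simplified sketch of that post-processing step using only numpy; it is not the PR's exact code.

```python
import numpy as np


def summarize(times_ms, quantiles=None, return_mode="mean"):
    """Reduce a list of per-iteration timings (in milliseconds) without torch.

    Simplified sketch of the post-processing step, not the actual code in
    python/triton/testing.py. `quantiles`, if given, is a sequence in [0, 1].
    """
    times = np.asarray(times_ms, dtype=np.float64)
    if quantiles is not None:
        ret = np.quantile(times, quantiles).tolist()
        return ret[0] if len(ret) == 1 else ret
    if return_mode == "all":
        return times.tolist()
    # "min", "max", "mean", and "median" map directly onto numpy reductions.
    return float(getattr(np, return_mode)(times))
```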
It doesn't really remove the torch dependency. `get_device_interface` still imports torch.
Depends on the driver. For cpu, it doesn't need torch, since there's currently only one device (the host cpu).
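To make that concrete, the handful of device-interface operations `do_bench` relies on (event timing and synchronization) can be provided for a single host device without torch. The class below is a hypothetical sketch of such an interface, not the actual triton-cpu driver code:

```python
import time


class CPUEvent:
    """Hypothetical host-side stand-in for a GPU timing event."""

    def __init__(self, enable_timing=False):
        self._t = None

    def record(self):
        self._t = time.perf_counter()

    def elapsed_time(self, other):
        # Mirror torch.cuda.Event.elapsed_time, which returns milliseconds.
        return (other._t - self._t) * 1e3


class CPUDeviceInterface:
    """Hypothetical torch-free device interface for a single-device CPU driver."""

    Event = CPUEvent

    def synchronize(self):
        # Host execution is synchronous; nothing to wait for.
        pass
```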
Hmm, I thought we were still going to import torch for the CPU backend anyway.
@minjang can you confirm that you want the CPU runtime to be completely independent of torch?
@Jokeren No, triton-cpu doesn't have a plan or a need to reduce/remove the torch dependency.
@stephen-huan, okay, I understand your intention. It's fine to reduce the dependency only for `third_party/cpu`. But, as you already had to change `python/triton/testing.py`, you would need to change many parts of the code outside `third_party/cpu`. And, due to (painful) rebasing and resolving merge conflicts, triton-cpu strongly wants to avoid such code changes. For example, `test_core.py` has heavy mixed usage of torch and numpy. Even though this is only for testing, we still end up with both torch and numpy dependencies. So, right now, I agree with @Jokeren.
```diff
@@ -43,6 +44,7 @@ def do_bench_cudagraph(fn, rep=20, grad_to_none=None, quantiles=None, return_mod
     :type return_mode: str
     """
     import torch
```
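If the goal is only that the module imports without torch, one option discussed above is the local-import pattern: defer the torch import so that only the CUDA-graph path requires it. The sketch below illustrates that pattern under that assumption; it is not necessarily what this PR does.

```python
def do_bench_cudagraph(fn, rep=20, grad_to_none=None, quantiles=None, return_mode="mean"):
    # Deferred (local) import: the module itself can be imported without torch,
    # and torch is only required once someone actually calls the CUDA-graph path.
    try:
        import torch
    except ImportError as e:
        raise RuntimeError("do_bench_cudagraph requires torch") from e
    # The existing torch/CUDA-graph benchmarking body would follow here.
    ...
```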
torch is still here.
I would suggest you think more about what you actually want to achieve.
Is it really removing all torch dependencies, or just making triton-cpu work better? The latter is probably more controllable and much easier.
Originally this was a triton-cpu PR, but they told me to upstream everything that wasn't CPU-backend specific; see triton-lang#205. So the goal of this PR is just to (1) import triton, (2) execute kernels, and (3) autotune with the cpu backend, all without torch. Of course, removing torch entirely from nvidia/amd is currently out of scope, because torch is used as a convenient gpu library from python.
Sorry, there seems to be some confusion around #5493, which I submitted around the same time as this PR. #5493 is more of a feature request/tracking issue where I propose (1) executing kernels on jax/numpy arrays directly, without needing a `Pointer` shim, and (2) having the interpreter work on jax/numpy arrays without the `Data` + `Pointer` shims. This PR addresses neither of these concerns.
The goal of this PR is simply to be able to use triton on the cpu backend without importing torch.
In general, I'm very supportive of improving triton-cpu compatibility.
If this is the case, why is there a torch->numpy replacement in this PR? Are these files not working in the triton-cpu repo?
Well, they work if torch is imported. But without torch, of course they don't work. And this doesn't meaningfully change anything for the gpu backends, since the statistics computations are done on the cpu anyway and the numpy methods have (roughly) the same semantics as the torch methods.
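As a quick illustration of the "(roughly) the same semantics" point, the host-side reductions used for timing summaries agree between the two libraries (assuming both are installed):

```python
import numpy as np
import torch

times = [1.2, 0.9, 1.5, 1.1]
t = torch.tensor(times, dtype=torch.float64)

# Mean and quantiles (both default to linear interpolation) match.
assert np.isclose(np.mean(times), t.mean().item())
assert np.allclose(np.quantile(times, [0.2, 0.5, 0.8]),
                   torch.quantile(t, torch.tensor([0.2, 0.5, 0.8], dtype=torch.float64)).numpy())

# One small difference hidden behind "roughly": on even-length inputs,
# torch.median returns the lower middle element, while np.median averages the two.
```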
With the torch->numpy replacement, this package ends up with mixed use of torch and numpy. It seems to me that it's in a middle state that does not completely address the problem if triton-cpu wants to be independent of torch.
Thus, I'm not quite sure why the changes are necessary.
Let's wait for triton-cpu maintainers to get involved. We probably need more context for further discussion.
Currently, torch is required for importing triton and performing autotuning. This seems like a relatively heavy runtime dependency in the context of the cpu backend, as numpy can easily be used instead.
Opening here as suggested in triton-lang#205 to minimize future merge conflicts.
Ideally there would be a test for this, but with the cpu backend out-of-tree this seems hard to test.
See also triton-lang#204, triton-lang#205.
New contributor declaration
- I am not making a trivial change, such as fixing a typo in a comment.
- I have written a PR description following these rules.
- I have run `pre-commit run --from-ref origin/main --to-ref HEAD`.
- Select one of the following.
  - `/test` for `lit` tests
  - `/unittest` for C++ tests
  - `/python/test` for end-to-end tests
- Select one of the following.
  - I have not added any `lit` tests.
  - The `lit` tests I have added follow these best practices, including the "tests should be minimal" section. (Usually, running Python code and using the instructions it generates is not minimal.)