[Usage]: Running Tensor Parallel on TPUs on Ray Cluster #12058
Comments
How did you install vLLM?
Thanks for your quick reply! Since #11695 wasn't merged into 0.6.6.post1 yet, I use a bit of a hack to install requirements-tpu.txt manually in my Docker image. Here are the Docker steps:
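A sketch of what such a Dockerfile can look like, assuming a v0.6.6.post1 checkout and a base image that already provides the TPU runtime; the base image, paths, and flags below are illustrative, not the exact steps from this thread:

```dockerfile
# Illustrative sketch only: install vLLM's TPU requirements by hand,
# then build vLLM for the TPU target device.
FROM python:3.10  # placeholder; a TPU VM image with libtpu is assumed

RUN apt-get update && apt-get install -y git && rm -rf /var/lib/apt/lists/*

# Check out the release and install the TPU requirements manually
# (the hack mentioned above).
RUN git clone --branch v0.6.6.post1 --depth 1 \
    https://github.com/vllm-project/vllm.git /workspace/vllm
WORKDIR /workspace/vllm
RUN pip install -r requirements-tpu.txt

# Build vLLM itself for the TPU device.
RUN VLLM_TARGET_DEVICE="tpu" pip install -e .
```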
Looks like Ray could recognize 'accelerator_type:TPU-V4', but somehow the 'TPU' resource count was not correctly auto-detected. Maybe try debugging it like this: #10155 (comment)
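For instance, a quick check of what Ray auto-detected on the node (a minimal sketch, assuming the Ray cluster is already up; the expected resource keys are illustrative):

```python
# Sketch: inspect the resources Ray detected before vLLM gets involved.
import ray

ray.init(address="auto")  # attach to the existing Ray cluster

# The accelerator label should show up here, e.g. "accelerator_type:TPU-V4",
# alongside a "TPU" count if auto-detection worked.
print(ray.cluster_resources())

# What is still free for placement; if "TPU" is missing or zero, the Ray
# backend has nothing to schedule the tensor-parallel workers onto.
print(ray.available_resources())
```

On the head node, `ray status` gives the same aggregate view from the CLI.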
Thanks for the help @ruisearch42, and hope you've been doing well! Here are some extra things that might help us debug. In the Ray remote function itself, I added the following:
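The block below is only a sketch of the kind of checks meant here, not the exact lines from the report; the `TPU` resource count and the torch_xla call are assumptions:

```python
# Sketch: inside a Ray remote function, confirm which resources the task was
# assigned and which XLA devices the worker process can actually see.
import ray

@ray.remote(resources={"TPU": 4})  # assumed chip count for a v4-8 host
def check_tpu_visibility():
    import torch_xla.core.xla_model as xm
    assigned = ray.get_runtime_context().get_assigned_resources()
    devices = xm.get_xla_supported_devices()
    return assigned, devices

ray.init(address="auto")
print(ray.get(check_tpu_visibility.remote()))
```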
For the first line, I get:
Another update: I spun up a new v4-8 instance manually (without Ray this time). It seems like running:
Your current environment
How would you like to use vllm
I want to run tensor-parallel inference using TPUs in a Ray cluster. The Ray cluster picks up the accelerators we need, but when vLLM tries to initialize against the Ray cluster it doesn't seem to know that, so it doesn't reuse the TPUs the cluster has already picked up. I was wondering how people implement this? Thanks!
Code:
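A minimal sketch of the kind of launch code in question (illustrative only; the model name, TPU count, and prompt are placeholders rather than the exact script from this report):

```python
# Sketch: tensor-parallel vLLM inference on TPUs, asking vLLM to use the
# existing Ray cluster as its distributed executor backend.
import ray
from vllm import LLM, SamplingParams

ray.init(address="auto")  # connect to the Ray cluster that owns the TPUs

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    tensor_parallel_size=4,                    # a v4-8 host exposes 4 chips
    distributed_executor_backend="ray",
)

outputs = llm.generate(["Hello, TPU!"], SamplingParams(max_tokens=32))
for out in outputs:
    print(out.outputs[0].text)
```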
Error: