Is there an ability to automatically assign vf with GPU affinity to pods? #736

Closed
cyclinder opened this issue Jul 16, 2024 · 6 comments

@cyclinder (Contributor)

[figure: GPU/NIC PCIe topology matrix from the node, including GPU0 and the mlx5_* RDMA devices]

If the GPU and NIC are under the same PCIe bridge, or their topology distance is closer than PHB (e.g. PIX or PXB), then communication between them can be accelerated by enabling GPUDirect RDMA.
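The PCIe distance between GPUs and RDMA NICs can be checked with `nvidia-smi topo -m`. Since the original screenshot is not reproduced here, the matrix below is only an illustrative sketch of that kind of output, not the actual data from this node:

```console
$ nvidia-smi topo -m
        GPU0    mlx5_0  mlx5_3
GPU0     X      PIX     PHB
mlx5_0  PIX      X      PHB
mlx5_3  PHB     PHB      X

# PIX = connection traversing at most a single PCIe bridge
# PHB = connection traversing a PCIe host bridge (typically the CPU)
```

In an output like this, GPU0 and mlx5_0 share a PCIe bridge (PIX) and are good candidates for GPUDirect RDMA, while GPU0 and mlx5_3 only meet at the PCIe host bridge (PHB).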

@SchSeba (Collaborator) commented Jul 17, 2024

That is a Kubernetes feature. You can configure the device manager and check the Topology Manager policy:

https://kubernetes.io/docs/tasks/administer-cluster/topology-manager/#policy-single-numa-node
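For reference, a minimal kubelet configuration sketch enabling that policy (the values are examples; the Topology Manager can only align devices whose plugins report NUMA topology information):

```yaml
# Sketch of a KubeletConfiguration fragment enabling NUMA alignment.
# With single-numa-node, a pod requesting e.g. a GPU and an SR-IOV VF is only
# admitted if both devices (and, with the static CPU manager, its CPUs) can be
# satisfied from the same NUMA node.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static
topologyManagerPolicy: single-numa-node
topologyManagerScope: container   # or "pod" to align all containers together
```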

@cyclinder (Contributor, Author)

Thanks for your reply. Even if the GPU and NIC are on the same NUMA node, traffic between them may still have to cross the PCIe host bridge, as shown in the figure above for GPU0 and mlx5_3, so in that case we cannot enable GPUDirect RDMA. Being on the same NUMA node can still mean a large topology distance; we need a smaller one.

@adrianchiris (Collaborator)

Currently there is no solution that I'm aware of which takes PCIe topology into account.

DRA (Dynamic Resource Allocation) aims to solve that, but there is still a way to go...

@aojea commented Nov 15, 2024

This is on the DRA roadmap, as @adrianchiris mentions; it will be beta in 1.32.
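To make that concrete, here is a rough sketch of how such a request might look with the DRA v1beta1 API; the device class names and the pcieRoot attribute are hypothetical and depend on what the GPU and NIC DRA drivers actually publish:

```yaml
# Hypothetical ResourceClaim: request one GPU and one VF, and constrain the
# allocation so that both devices report the same "pcieRoot" attribute, i.e.
# they sit behind the same PCIe root. Class and attribute names are made up.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: gpu-with-local-vf
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.example.com       # hypothetical GPU device class
    - name: vf
      deviceClassName: sriov-vf.example.com  # hypothetical VF device class
    constraints:
    - requests: ["gpu", "vf"]
      matchAttribute: example.com/pcieRoot   # hypothetical driver attribute
```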

@SchSeba (Collaborator) commented Dec 11, 2024

Hi @cyclinder, do you think we can close this issue?
I don't think there is anything the sriov-operator can do about this.

I think for now the only solution is to manually create a resource pool for every PCIe bridge and do a static request for the GPU pool and the VF pool from the same PCIe bridge :(
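As a sketch of that workaround (the PCI address, pool name, namespace, and GPU resource name below are examples and must match the actual node and device-plugin setup), a per-PCIe VF pool could look like this, with a pod then requesting the GPU together with the VF pool behind the same bridge:

```yaml
# Hypothetical per-PCIe VF pool: select only the PF at 0000:3b:00.0, assumed
# to share a PCIe bridge with the target GPU, and expose its VFs as a
# dedicated resource.
apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: vf-pool-pcie-3b
  namespace: sriov-network-operator        # operator namespace may differ per install
spec:
  resourceName: vf_pcie_3b                 # example pool name
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 8
  nicSelector:
    rootDevices:
    - "0000:3b:00.0"                       # example PF PCI address
  deviceType: netdevice
  isRdma: true
---
# Pod fragment statically pairing a GPU with a VF from that pool.
apiVersion: v1
kind: Pod
metadata:
  name: rdma-gpu-pod
spec:
  containers:
  - name: app
    image: example.com/rdma-app:latest     # placeholder image
    resources:
      requests:
        nvidia.com/gpu: "1"
        openshift.io/vf_pcie_3b: "1"       # prefix depends on the operator's resourcePrefix
      limits:
        nvidia.com/gpu: "1"
        openshift.io/vf_pcie_3b: "1"
```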

@cyclinder (Contributor, Author)

@SchSeba Yes, we can close the issue; we are planning to implement this in the coming days.
