This repository has been archived by the owner on Sep 28, 2024. It is now read-only.
I run multiple instances in Azure on Standard_NV and Standard_NC series virtual machines with more than one GPU device assigned to each VM. Without the LIS RPMs, the VM doesn't see all of the NVIDIA GPU devices assigned to the guest, and if there's a kernel+LIS mismatch, the results can be unpredictable (e.g. 0 GPU devices, or 1 GPU device).
The patching strategy I follow is to adopt new kernels within a week of release, but the LIS packages are usually not available to match new kernels that quickly (I'm in CentOS 7 land).
I was using the OpenLogic repository to manage the LIS installation process, but that repository hasn't been updated since version 4.2.6, and it's no longer possible to reliably execute `yum install kmod-microsoft-hyper-v microsoft-hyper-v` to install the LIS RPMs, because there is a specific set of RPMs for each small patch level of every kernel.
That brings me to the impracticality of having to download the >400 MB .tar or ISO and run the shell scripts to install this set of packages. (It also makes things more complicated in an airgapped environment, where http://aka.ms/LIS is not reachable.)
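To spot the kernel+LIS mismatch described above, you can compare the kernel a Hyper-V module was built for with the kernel that is actually running. The following is a minimal sketch, not part of LIS: the helper name `vermagic_matches` is hypothetical, it assumes the driver is installed as the `hv_vmbus` module, and the `.el7.x86_64` suffix in the comments is only an illustration of a CentOS 7 kernel release string.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with LIS).
# vermagic_matches VERMAGIC KERNEL_RELEASE
#   VERMAGIC       - a string such as
#                    "3.10.0-957.12.1.el7.x86_64 SMP mod_unload modversions"
#   KERNEL_RELEASE - a string such as "3.10.0-957.12.1.el7.x86_64"
# Returns 0 when the module was built for exactly that kernel release.
vermagic_matches() {
    local built_for="${1%% *}"   # keep only the first space-separated field
    [ "$built_for" = "$2" ]
}

# Intended use on a live VM (assumes the module is named hv_vmbus):
#   vermagic_matches "$(modinfo -F vermagic hv_vmbus)" "$(uname -r)" \
#       || echo "hv_vmbus was built for a different kernel than $(uname -r)"
```

Treat a difference only as a hint: on RHEL-family systems a kmod package can, as far as I understand, still load on a newer kernel through the weak-updates mechanism, so a differing vermagic does not by itself prove the GPUs will be missing.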
My questions about LIS and how it relates to Azure VMs running Linux...
Can LIS be distributed as a single set of RPMs for each operating system distribution?
If so, can the packages be added to the `microsoft-prod` yum repository?
Can you rely on dkms to rebuild the modules automatically whenever the kernel changes (e.g. 3.10.0-957.10.1 vs 3.10.0-957.12.1), so that the current installation process could be streamlined?
Can you improve testing for Azure GPU VMs, ensuring that a Standard_NC12 host reports 2 GPUs when LIS has been installed? (For example, LIS 4.3.0 was broken and only revealed 1 GPU.)
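The GPU check asked for in the last question could be as simple as counting the devices the guest reports. This is a rough sketch under stated assumptions: the function names `count_gpus` and `check_gpu_count` are hypothetical, the NVIDIA driver is assumed to be installed, and the input is assumed to be in the `GPU N: ...` line format that `nvidia-smi -L` prints.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a post-install smoke test (not part of LIS).

# count_gpus: read a GPU listing on stdin and print how many
# "GPU <index>:" lines it contains.
count_gpus() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps that
    # from being treated as a failure while still printing "0".
    grep -c '^GPU [0-9][0-9]*:' || true
}

# check_gpu_count EXPECTED: succeed only if the listing on stdin
# contains exactly EXPECTED GPUs.
check_gpu_count() {
    local expected="$1" actual
    actual="$(count_gpus)"
    if [ "$actual" -eq "$expected" ]; then
        echo "OK: $actual GPU(s) visible"
    else
        echo "FAIL: expected $expected GPU(s), found $actual" >&2
        return 1
    fi
}
```

On a Standard_NC12 the check would be run as `nvidia-smi -L | check_gpu_count 2`, so a kernel or LIS update that leaves only one GPU visible fails loudly instead of being noticed later.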
It does require switching to the kernel-azure package and associated kernel-azure-tools package (which appears to be the LIS software)
I was able to see multiple GPUs with no fuss on the latest 3.10.0-957.12.1 kernel release, which was not functional using a vanilla kernel + LIS 4.3.1, so that's an improvement. Is this the proposed path forward for CentOS/RHEL users?
@santoshx - I appreciate the smaller download size. That is definitely helpful. However, I have found that relying on the CentOS Virtualization SIG kernel-azure package is a much more reliable way of ensuring support for multi-GPU Linux VMs on Azure. I now have a more predictable (and less manual) approach. It reduces my dependency on another package, and it has offered a cleaner experience with a much faster ability to patch after a CVE kernel update (a few extra days, as opposed to a couple of weeks).