azslurm does not recognize GPUs for NDv5 #2

Open
vanzod opened this issue Dec 20, 2023 · 1 comment

vanzod (Owner) commented Dec 20, 2023

azslurm reports zero GPUs for NDv5. As a result, gres.conf is not generated after running `azslurm scale`, and no GRES are specified for the nodes in azure.conf.

# azslurm buckets
NODEARRAY PLACEMENT_GROUP              VM_SIZE                  VCPU_COUNT PCPU_COUNT MEMORY   AVAILABLE_COUNT NCPUS PCPUS NGPUS MEMGB    CCNODEID SLURM_MEMORY
dynamic                                Standard_F2s_v2          2          1          4.00g    50              2     1     0     4.00g             3.00g       
hpc                                    Standard_ND96isr_H100_v5 96         96         1900.00g 2               96    96    0     1900.00g          1805.00g    
hpc       Standard_ND96isr_H100_v5_pg0 Standard_ND96isr_H100_v5 96         96         1900.00g 2               96    96    0     1900.00g          1805.00g    
htc                                    Standard_F2s_v2          2          1          4.00g    10              2     1     0     4.00g             3.00g
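One way to spot the problem in output like the above is to compare the NGPUS column against the GPU count expected for each VM size. The following is a minimal Python sketch, not part of azslurm: it parses the column-aligned `azslurm buckets` text by header offsets (needed because PLACEMENT_GROUP and CCNODEID can be blank) and flags disagreements. `EXPECTED_GPUS`, `parse_fixed_width`, and `gpu_mismatches` are hypothetical names; the only table entry is Standard_ND96isr_H100_v5, which has 8 NVIDIA H100 GPUs.

```python
import re

# Hypothetical, hand-maintained table of expected GPUs per VM size.
# Standard_ND96isr_H100_v5 has 8 NVIDIA H100 GPUs.
EXPECTED_GPUS = {"Standard_ND96isr_H100_v5": 8}


def parse_fixed_width(text):
    """Parse column-aligned CLI output into one dict per row.

    Columns are sliced at the offsets where each header name starts, so
    blank cells (e.g. an empty PLACEMENT_GROUP) do not shift later columns.
    Lines starting with '#' (such as a shell prompt) are skipped.
    """
    lines = [ln.rstrip() for ln in text.splitlines()
             if ln.strip() and not ln.lstrip().startswith("#")]
    header, body = lines[0], lines[1:]
    starts = [m.start() for m in re.finditer(r"\S+", header)]
    names = header.split()
    ends = starts[1:] + [None]  # the last column runs to the end of the line
    return [{name: ln[a:b].strip() for name, a, b in zip(names, starts, ends)}
            for ln in body]


def gpu_mismatches(text):
    """Return (nodearray, vm_size, reported, expected) for disagreeing rows."""
    mismatches = []
    for row in parse_fixed_width(text):
        expected = EXPECTED_GPUS.get(row["VM_SIZE"])
        if expected is not None and int(row["NGPUS"]) != expected:
            mismatches.append((row["NODEARRAY"], row["VM_SIZE"],
                               int(row["NGPUS"]), expected))
    return mismatches
```

Assuming the column alignment shown, feeding it the buckets output above should flag both `hpc` rows as reporting 0 GPUs where 8 are expected.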

vanzod self-assigned this Dec 20, 2023

vanzod (Owner, Author) commented Mar 7, 2024

CycleCloud 3.6 partially resolves the issue.
Although the GRES configuration is now created automatically when the Slurm cluster is created, the number of GPUs reported for NDv5 is still incorrect. Hence, the fix_ndv5_gres project has been modified to apply the following workaround:

Azure/cyclecloud-slurm@f0c08b4
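For comparison when checking the result, the GRES settings one would expect for an ND96isr_H100_v5 node (8 NVIDIA H100 GPUs) can be rendered with a small Python sketch. This is an illustration under stated assumptions, not the content of the commit above: `gres_conf_line`, `node_gres_option`, and the node name `hpc-[1-2]` are hypothetical, and one `/dev/nvidiaN` device file per GPU is assumed.

```python
# Minimal sketch: render the GRES settings expected for a GPU node.
# Function names and node names are hypothetical; one /dev/nvidiaN
# device file per GPU is assumed.


def gres_conf_line(nodes: str, gpu_count: int) -> str:
    """Return a gres.conf line declaring gpu_count GPUs on the given nodes."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    if gpu_count == 1:
        devices = "/dev/nvidia0"
    else:
        # Slurm range expression covering every device file, e.g. /dev/nvidia[0-7]
        devices = f"/dev/nvidia[0-{gpu_count - 1}]"
    return f"NodeName={nodes} Name=gpu Count={gpu_count} File={devices}"


def node_gres_option(gpu_count: int) -> str:
    """Return the Gres= option for a slurm.conf-style NodeName definition."""
    return f"Gres=gpu:{gpu_count}"


if __name__ == "__main__":
    print(gres_conf_line("hpc-[1-2]", 8))
    print(node_gres_option(8))
```

Run as a script, it prints `NodeName=hpc-[1-2] Name=gpu Count=8 File=/dev/nvidia[0-7]` followed by `Gres=gpu:8`, which is what a correct 8-GPU entry would look like under these assumptions.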
