This guide covers setting up a Slurm cluster in the cloud. Cloud deployments require a number of decisions and considerations; this guide walks through them and recommends a solution for each.
There are two deployment methods for cloud cluster management: GCP Marketplace and Terraform.
The first method uses GCP Marketplace to make setting up clusters a breeze without leaving your browser. Although this method is simpler, it is also less flexible; it is a great way to explore what slurm-gcp is!
See the Marketplace Guide for setup instructions and more information.
The second method uses Terraform to deploy and manage cluster infrastructure. Although this method can be more complex, it is a robust option. slurm-gcp provides Terraform modules that enable you to create a Slurm cluster with ease.
See the slurm_cluster module for details.
If you are unfamiliar with Terraform, please check out the documentation and starter guide to get started.
See the test cluster example for an extensible and robust example. It can be configured to handle creation of all supporting resources (e.g. network, service accounts) or leave that to you. Slurm can be configured with partitions and nodesets as desired.
NOTE: It is recommended to use the slurm_cluster module in your own Terraform project. It may be useful to copy and modify one of the provided examples.
Alternatively, see HPC Blueprints for HPC Toolkit examples.
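As a rough illustration of using the slurm_cluster module in your own Terraform project, a module call might look something like the sketch below. The source path, the input names (`project_id`, `slurm_cluster_name`, `region`), and all values are assumptions made for illustration only. The module may require other inputs, so check the slurm_cluster module documentation and the provided examples for what your version actually accepts.

```hcl
# Minimal sketch of calling the slurm_cluster module from your own project.
# ASSUMPTIONS: the source path and every input name below are illustrative;
# confirm them against the slurm_cluster module documentation before use.
module "slurm_cluster" {
  # Hypothetical path to a local checkout of slurm-gcp.
  source = "./slurm-gcp/terraform/slurm_cluster"

  project_id         = "my-project-id" # placeholder GCP project ID
  slurm_cluster_name = "demo"          # placeholder cluster name
  region             = "us-central1"   # placeholder region
}
```

With a configuration like this in place, the usual Terraform workflow applies: `terraform init` to download providers and modules, `terraform plan` to preview the changes, and `terraform apply` to create the resources.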