Add possibility to install a custom compiled slurm package #32

Open. Wants to merge 14 commits into base: main
21 changes: 18 additions & 3 deletions README.md
@@ -2,6 +2,18 @@ Slurm
=====

Install and configure a Slurm cluster on RHEL/CentOS or Debian/Ubuntu servers
To configure a custom Debian repository, define `slurm_configure_repos: true`.

Then, define the APT repositories with the URL to the GPG key.

# Example apt repository
slurm_apt_repository: "deb [trusted=yes] http://127.0.0.1/ubuntu/22.04/amd64/ ./"
# Example GPG key
slurm_gpg_key: 'http://127.0.0.1/ubuntu/22.04/amd64/GPG-KEY-slurm'

Define `slurm_apt_priority` to pin the priority of the repository (APT only). This is optional.

slurm_apt_priority: 900

Role Variables
--------------
@@ -23,9 +35,10 @@ Partitions and nodes go in `slurm_partitions` and `slurm_nodes`, lists of hashes
of that partition or node.

Options for the additional configuration files [acct_gather.conf](https://slurm.schedmd.com/acct_gather.conf.html),
[cgroup.conf](https://slurm.schedmd.com/cgroup.conf.html) and [gres.conf](https://slurm.schedmd.com/gres.conf.html)
may be specified in the `slurm_acct_gather_config`, `slurm_cgroup_config` (both of them hashes) and
`slurm_gres_config` (list of hashes) respectively.
[cgroup.conf](https://slurm.schedmd.com/cgroup.conf.html), [gres.conf](https://slurm.schedmd.com/gres.conf.html)
and [job_container.conf](https://slurm.schedmd.com/job_container.conf.html) may be specified in the
`slurm_acct_gather_config`, `slurm_cgroup_config` (both of them hashes), `slurm_gres_config` (list of hashes) and
`slurm_job_container_config` (hashes) respectively.
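
As a rough illustration of the last of these, a `slurm_job_container_config` hash might look like the sketch below. The option names are taken from the job_container.conf documentation linked above; the values are placeholders, and it is an assumption that `generic.conf.j2` renders each key as a `Key=Value` line.

```yaml
# Hypothetical values; adjust to the cluster.
# The tmpfs plugin itself is selected in slurm.conf
# (JobContainerType=job_container/tmpfs), not in this file.
slurm_job_container_config:
  AutoBasePath: "true"
  BasePath: "/var/tmp/slurm"
```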

Set `slurm_upgrade` to true to upgrade the installed Slurm packages.

@@ -88,6 +101,8 @@ More extensive example:
SelectType: "select/cons_res"
SelectTypeParameters: "CR_Core"
SlurmctldHost: "slurmctl"
# Use a list to configure master and backups Slurmctld hosts
# SlurmctldHost: ['slurmctl1', 'slurmctl2']
SlurmctldLogFile: "/var/log/slurm/slurmctld.log"
SlurmctldPidFile: "/var/run/slurmctld.pid"
SlurmdLogFile: "/var/log/slurm/slurmd.log"
8 changes: 7 additions & 1 deletion defaults/main.yml
@@ -20,9 +20,16 @@ slurmdbd_service_name: slurmdbd
#Cluster name for slurm config. This is required to correctly setup slurmdbd and attune it to the slurm config.
__slurm_cluster_name: cluster
__cluster_not_setup: true #Default value. Is modified if cluster already exists.
slurm_setup_cluster: false

slurm_start_services: true


# install from custom debian repos
slurm_configure_repos: false
#to setup custom systemd unit files
slurm_configure_systemd: false

__slurm_user_name: "{{ (slurm_user | default({})).name | default('slurm') }}"
# TODO: this could be incorrect, use the group collection from galaxyproject.galaxy
__slurm_group_name: "{{ (slurm_user | default({})).group | default(omit) }}"
@@ -91,6 +98,5 @@ __slurmdbd_config_default:
AuthType: auth/munge
DbdPort: 6819
SlurmUser: "{{ __slurm_user_name }}"
SlurmctldPidFile: "{{ __slurm_run_dir ~ '/slurmdbd.pid' if __slurm_debian else omit }}"
LogFile: "{{ __slurm_log_dir ~ '/slurmdbd.log' if __slurm_debian else omit }}"
__slurmdbd_config_merged: "{{ __slurmdbd_config_default | combine(slurmdbd_config | default({})) }}"
15 changes: 15 additions & 0 deletions files/slurmctld.service
@@ -0,0 +1,15 @@
[Unit]
Description=Slurm controller daemon
After=network.target slurmdbd.service munge.service
ConditionPathExists=/etc/slurm/slurm.conf

[Service]
Type=simple
EnvironmentFile=-/etc/sysconfig/slurmctld
ExecStart=/opt/slurm/sbin/slurmctld $SLURMCTLD_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
PIDFile=/var/run/slurm/slurmctld.pid
RuntimeDirectory=slurm

[Install]
WantedBy=multi-user.target
23 changes: 23 additions & 0 deletions files/slurmd.service
@@ -0,0 +1,23 @@
[Unit]
Description=Slurm node daemon
After=network.target munge.service
Wants=network-online.target
ConditionPathExists=/etc/slurm/slurm.conf

[Service]
Type=simple
EnvironmentFile=-/etc/default/slurmd
ExecStartPre=/bin/mkdir -p /var/run/slurm
ExecStart=/opt/slurm/sbin/slurmd -D -s $SLURMD_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
PIDFile=/var/run/slurm/slurmd.pid
KillMode=process
LimitNOFILE=131072
LimitMEMLOCK=infinity
LimitSTACK=infinity
Delegate=yes
TasksMax=20000


[Install]
WantedBy=multi-user.target
18 changes: 18 additions & 0 deletions files/slurmdbd.service
@@ -0,0 +1,18 @@
[Unit]
Description=Slurm controller daemon
After=network.target munge.service mysql.service mysqld.service mariadb.service
Wants=network-online.target
ConditionPathExists=/etc/slurm/slurm.conf

[Service]
Type=simple
EnvironmentFile=-/etc/default/slurmdbd
ExecStartPre=-/usr/bin/ls /var/lib/slurm/slurmctld
ExecStart=/opt/slurm/sbin/slurmdbd -D -s $SLURMDBD_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
PIDFile=/var/run/slurm/slurmdbd.pid
LimitNOFILE=65536
RuntimeDirectory=slurm

[Install]
WantedBy=multi-user.target
48 changes: 36 additions & 12 deletions handlers/main.yml
@@ -4,32 +4,56 @@
name: munge
state: restarted

- name: Restart slurmdbd
ansible.builtin.systemd:
name: "{{ slurmdbd_service_name }}"
state: restarted
masked: no
enabled: yes
daemon_reload: yes
when: "(slurm_start_services | bool) and ('slurmservers' in group_names or 'controller' in slurm_roles)"
register: slurmdbd_restart

- name: Reload slurmdbd
ansible.builtin.service:
name: "{{ slurmdbd_service_name }}"
state: reloaded
when: "slurm_start_services and ('slurmdbdservers' in group_names or 'dbd' in slurm_roles)"
when:
- slurm_start_services | bool
- ('slurmdbdservers' in group_names or 'dbd' in slurm_roles)
- slurmdbd_restart is not defined

- name: Restart slurmctld
ansible.builtin.systemd:
name: "{{ slurmctld_service_name }}"
state: restarted
masked: no
enabled: yes
daemon_reload: yes
when: "(slurm_start_services | bool) and ('slurmservers' in group_names or 'controller' in slurm_roles)"
register: slurmctld_restart

- name: Reload slurmctld
ansible.builtin.service:
name: "{{ slurmctld_service_name }}"
state: reloaded
when: "slurm_start_services and ('slurmservers' in group_names or 'controller' in slurm_roles)"
when:
- slurm_start_services | bool
- ('slurmservers' in group_names or 'controller' in slurm_roles)
- slurmctld_restart is not defined

- name: Restart slurmctld
- name: Restart slurmd
ansible.builtin.service:
name: "{{ slurmctld_service_name }}"
name: "{{ slurmd_service_name }}"
state: restarted
when: "slurm_start_services and ('slurmservers' in group_names or 'controller' in slurm_roles)"
when: "(slurm_start_services | bool) and ('slurmexechosts' in group_names or 'exec' in slurm_roles)"
register: slurmd_restart

- name: Reload slurmd
ansible.builtin.service:
name: "{{ slurmd_service_name }}"
state: reloaded
when: "slurm_start_services and ('slurmexechosts' in group_names or 'exec' in slurm_roles)"

- name: Restart slurmd
ansible.builtin.service:
name: "{{ slurmd_service_name }}"
state: restarted
when: "slurm_start_services and ('slurmexechosts' in group_names or 'exec' in slurm_roles)"
when:
- slurm_start_services | bool
- ('slurmexechosts' in group_names or 'exec' in slurm_roles)
- slurmd_restart is not defined
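
One detail worth checking in the handlers above: `Restart slurmdbd` is conditioned on the controller group and role, while `Reload slurmdbd` uses the dbd ones. If the restart is meant to run on the slurmdbd hosts, a minimal sketch of the handler under that assumption would be:

```yaml
# Sketch only: assumes the restart should target the same hosts
# as the "Reload slurmdbd" handler (dbd group / dbd role).
- name: Restart slurmdbd
  ansible.builtin.systemd:
    name: "{{ slurmdbd_service_name }}"
    state: restarted
    masked: false
    enabled: true
    daemon_reload: true
  when:
    - slurm_start_services | bool
    - "'slurmdbdservers' in group_names or 'dbd' in slurm_roles"
  register: slurmdbd_restart
```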
6 changes: 3 additions & 3 deletions meta/main.yml
@@ -1,11 +1,11 @@
galaxy_info:
role_name: slurm
namespace: galaxyproject
namespace: mila
author: The Galaxy Project
description: Install and manage the Slurm Workload Manager
company: The Galaxy Project
company: Mila
license: MIT
min_ansible_version: 2.5
min_ansible_version: '2.14'
github_branch: main
platforms:
- name: EL
3 changes: 3 additions & 0 deletions tasks/_inc_extra_configs.yml
@@ -16,6 +16,9 @@
- name: gres.conf
config: slurm_gres_config
template: gres.conf.j2
- name: job_container.conf
config: slurm_job_container_config
template: generic.conf.j2
loop_control:
label: "{{ item.name }}"
when: item.config in vars
19 changes: 19 additions & 0 deletions tasks/common.yml
@@ -16,6 +16,25 @@
mode: 0644
when: slurm_rotate_logs

- name: Install plugstack.conf
ansible.builtin.template:
src: "plugstack.conf.j2"
dest: "{{ slurm_config_dir }}/plugstack.conf"
owner: root
group: root
mode: 0444
notify:
- Restart slurmd
- Restart slurmctld

- name: Check that slurm plugin dir exists
ansible.builtin.file:
path: "{{ slurm_config_dir }}/plugstack.conf.d/"
state: directory
notify:
- Restart slurmd
- Restart slurmctld

- name: Install slurm.conf
ansible.builtin.template:
src: "slurm.conf.j2"
12 changes: 8 additions & 4 deletions tasks/main.yml
@@ -3,6 +3,10 @@
- name: Include user creation tasks
ansible.builtin.include_tasks: user.yml
when: slurm_create_user

- name: Include Configure custom repositories
ansible.builtin.include_tasks: repositories-Debian.yml
when: slurm_configure_repos
Comment on lines +6 to +9

Should this support RHEL-like distros as well?
At the least, this task should not run when a host's OS family is not Debian.
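
A minimal sketch of the guard suggested here, assuming facts are gathered so that `ansible_os_family` is available (the role's own `__slurm_debian` variable, which appears in defaults/main.yml, might serve the same purpose):

```yaml
# Sketch only: skip the Debian repository tasks on non-Debian hosts.
- name: Include Configure custom repositories
  ansible.builtin.include_tasks: repositories-Debian.yml
  when:
    - slurm_configure_repos | bool
    - ansible_os_family == 'Debian'
```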


- name: Include controller installation tasks
ansible.builtin.include_tasks: slurmctld.yml
@@ -24,22 +28,22 @@
name: "{{ slurmdbd_service_name }}"
enabled: true
state: started
when: "slurm_start_services and ('slurmdbdservers' in group_names or 'dbd' in slurm_roles)"
when: "(slurm_start_services | bool) and ('slurmdbdservers' in group_names or 'dbd' in slurm_roles)"

- name: Ensure slurmctld is enabled and running
ansible.builtin.service:
name: "{{ slurmctld_service_name }}"
enabled: true
state: started
when: "slurm_start_services and ('slurmservers' in group_names or 'controller' in slurm_roles)"
when: "(slurm_start_services | bool) and ('slurmservers' in group_names or 'controller' in slurm_roles)"

- name: Ensure slurmd is enabled and running
ansible.builtin.service:
name: "{{ slurmd_service_name }}"
enabled: true
state: started
when: "slurm_start_services and ('slurmexechosts' in group_names or 'exec' in slurm_roles)"
when: "(slurm_start_services | bool) and ('slurmexechosts' in group_names or 'exec' in slurm_roles)"

- name: Setup cluster on slurmdb
include_tasks: slurmdbd_cluster.yml
when: "slurm_start_services and ('slurmdbdservers' in group_names or 'dbd' in slurm_roles)"
when: "(slurm_setup_cluster | bool ) and (slurm_start_services | bool) and ('slurmdbdservers' in group_names or 'dbd' in slurm_roles)"
26 changes: 26 additions & 0 deletions tasks/repositories-Debian.yml
@@ -0,0 +1,26 @@
---
- name: Install GPG-KEY
ansible.builtin.apt_key:
url: "{{ slurm_gpg_key }}"
keyring: /etc/apt/trusted.gpg.d/slurm.gpg
when: slurm_gpg_key is defined

- name: Configure Slurm repository
ansible.builtin.copy:
content: "{{ slurm_apt_repository }}\n"
dest: /etc/apt/sources.list.d/slurm.list
mode: 0644

- name: Configure APT preferences for Slurm repository
ansible.builtin.copy:
content: |
Package: *
Pin: release o=SLURM
Pin-Priority: {{ slurm_apt_priority }}
dest: /etc/apt/preferences.d/priority-slurm
mode: 0644
when: slurm_apt_priority is defined

- name: "Update repository cache"
ansible.builtin.apt:
update_cache: yes
13 changes: 13 additions & 0 deletions tasks/slurmctld.yml
@@ -4,6 +4,8 @@
ansible.builtin.package:
name: "{{ __slurm_packages.slurmctld }}"
state: "{{ 'latest' if slurm_upgrade else 'present' }}"
notify:
- Restart slurmctld

- name: Create slurm state directory
ansible.builtin.file:
@@ -25,6 +27,17 @@
state: directory
when: slurm_create_dirs and __slurm_config_merged.SlurmctldLogFile != omit

- name: Add slurmctld service
ansible.builtin.copy:
src: slurmctld.service
dest: /etc/systemd/system/slurmctld.service
owner: root
group: root
mode: 0644
when: slurm_configure_systemd
notify:
- Restart slurmctld

- name: Include config dir creation tasks
ansible.builtin.include_tasks: _inc_create_config_dir.yml
when: slurm_create_dirs
13 changes: 13 additions & 0 deletions tasks/slurmd.yml
@@ -4,6 +4,8 @@
ansible.builtin.package:
name: "{{ __slurm_packages.slurmd }}"
state: "{{ 'latest' if slurm_upgrade else 'present' }}"
notify:
- Restart slurmd

- name: Create slurm spool directory
ansible.builtin.file:
@@ -25,6 +27,17 @@
state: directory
when: slurm_create_dirs and __slurm_config_merged.SlurmdLogFile != omit

- name: Add slurmd service
ansible.builtin.copy:
src: slurmd.service
dest: /etc/systemd/system/slurmd.service
owner: root
group: root
mode: 0644
when: slurm_configure_systemd
notify:
- Restart slurmd

- name: Include config dir creation tasks
ansible.builtin.include_tasks: _inc_create_config_dir.yml
when: slurm_create_dirs
13 changes: 13 additions & 0 deletions tasks/slurmdbd.yml
@@ -4,6 +4,8 @@
ansible.builtin.package:
name: "{{ __slurm_packages.slurmdbd }}"
state: "{{ 'latest' if slurm_upgrade else 'present' }}"
notify:
- Restart slurmdbd

- name: Create slurm log directory
ansible.builtin.file:
@@ -17,6 +19,17 @@
- name: Include config dir creation tasks
ansible.builtin.include_tasks: _inc_create_config_dir.yml
when: slurm_create_dirs

- name: Add slurmdbd service
ansible.builtin.copy:
src: slurmdbd.service
dest: /etc/systemd/system/slurmdbd.service
owner: root
group: root
mode: 0644
when: slurm_configure_systemd
notify:
- Restart slurmdbd

- name: Install slurmdbd.conf
ansible.builtin.template:
2 changes: 1 addition & 1 deletion tasks/slurmdbd_cluster.yml
@@ -16,5 +16,5 @@
become: yes
become_user: root
notify:
- reload slurmdbd
- Reload slurmdbd
when: __cluster_not_setup