WIP: Add maintenance node playbook and vars file #614

Draft: wants to merge 14 commits into base: master

sanjaysrikakulam
Member

Purpose:

  1. The plan for the EU is to have two head nodes in the (near) future. To keep the Galaxy code in sync between them, we introduce a new node called maintenance.galaxyproject.eu. This node will host all Galaxy configs, the codebase, etc. Once in production, the codebase will be synced to NFS, and both head nodes can then sync from NFS to serve it.
  2. In this scenario we also want to avoid redundant cron jobs, duplicate data pushes to InfluxDB, repeated clean-up tasks, etc. Therefore, the maintenance node will perform those tasks. However, not all of them can be moved to the new node, as some can (or should) only run on the head nodes.

Maintenance VM details:
See here and here

NOTES:

  1. Any role/task that changes the Galaxy folder /opt/galaxy has been moved to run only on the maintenance node. Documentation on which role should run where is available in our operations repo (PR: Add head and maintenance node ansible roles doc operations#21).
  2. When things finally move to production, we will have to remove a lot of duplicated content from the head nodes playbook to keep it simple. The inline comments and the documentation in the operations repo will help with the deduplication.
  3. If I am not wrong, the idea is that the maintenance node syncs its /opt/galaxy to NFS, and the head nodes then sync from NFS to their own /opt/galaxy. Therefore, we need a new rsync-from-NFS task on the head nodes so that they can fetch /opt/galaxy and keep it in sync. We should also add a cron job for it (just like the one we have that syncs to NFS).
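
The head-node side of note 3 could be sketched as an Ansible cron task like the one below. This is only an illustration under stated assumptions: the NFS mount point `/data/galaxy-sync/galaxy`, the `galaxy` user, the job name, and the 15-minute schedule are hypothetical placeholders, not values taken from this PR or from the existing sync-to-NFS job.

```yaml
# Hypothetical task for the head nodes playbook (sketch, not part of this PR).
- name: Sync /opt/galaxy from the NFS share (hypothetical path and schedule)
  ansible.builtin.cron:
    name: galaxy-rsync-from-nfs      # assumption: any unique job name works
    user: galaxy                     # assumption: the user that owns /opt/galaxy
    minute: "*/15"                   # assumption: run every 15 minutes
    # Trailing slashes make rsync copy the directory contents, not the directory itself.
    job: "rsync -a --delete /data/galaxy-sync/galaxy/ /opt/galaxy/"
```

Note that `--delete` makes the head node copy mirror the NFS copy exactly, which is only safe if nothing on the head node writes into /opt/galaxy; it can be dropped if that does not hold.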

Changes compared to the head nodes playbooks:

  1. Updated the versions of the usegalaxy_eu.handy collection and the usegalaxy_eu.gie_proxy role in requirements.yaml.
  2. dj-wasabi.telegraf role: Telegraf changed its public key, so the public key URL has been updated.
  3. Added a new host/group to the Ansible inventory.
  4. Inline documentation has been added to the maintenance playbook and its vars file in the group_vars folder.
    1. Many cron jobs, telegraf tasks, cleanup tasks, and condor-related tasks and roles are commented out for now to avoid redundancy. They should be uncommented once the node moves to production.
    2. HTCondor needs to be installed and configured. @mira-miracoli please add your config and then uncomment the usegalaxy_eu.htcondor role. Also update the requirements.yaml file with the latest version of that role.
  5. To mirror the Galaxy venv on the current head node (sn06), we use miniconda on the maintenance node to create a _galaxy_ conda env with Python 3.8.8 (the version currently installed in Galaxy's venv on sn06), and then use that env's virtualenv and python commands to create Galaxy's venv in /opt/galaxy/venv. We use the same setup for other Ansible roles (for example, usegalaxy_eu.gie_proxy).
  6. Updated the Node version to 18.14.0 for usegalaxy_eu.gie_proxy. Also introduced a variable that lets us install some parts of the role on the maintenance node and other parts on the head nodes (for example, the systemd service unit file should be available and enabled only on the head nodes). On the maintenance node, gie_proxy_install should be set to true and gie_proxy_setup_service to none. In our head nodes playbook we should set gie_proxy_install: false, gie_proxy_setup_service: systemd, gie_proxy_setup_nodejs: none.
  7. Updated the path of the toolbox/filters templates (look here): it has moved from "{{ galaxy_server_dir }}/lib/galaxy/tools/toolbox" to "{{ galaxy_server_dir }}/lib/galaxy/tool_util/toolbox".
  8. Updated the Ansible role bashrc. PR: Update bashrc and postgres-connection roles #610
  9. Other related PRs:
    1. Fix file ownership of the compliance log #606
    2. Allow maintenance node to communicate to galaxy DB #600
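
The variable split described in item 6 can be written out as vars fragments, one per node type. The variable names and values are the ones listed in item 6; the placement in group_vars/maintenance.yml and the head nodes vars file (whose name is not given here) is an assumption for illustration.

```yaml
# group_vars/maintenance.yml (maintenance node): install gie_proxy, but no service unit
gie_proxy_install: true
gie_proxy_setup_service: none
---
# Head nodes vars file (file name is an assumption): service unit only, no install
gie_proxy_install: false
gie_proxy_setup_service: systemd
gie_proxy_setup_nodejs: none
```

The two fragments are shown as separate YAML documents because they belong to different hosts; they are not meant to be loaded together.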

Merge this PR only after these:

  1. Add Ansible-Galaxy release workflow ansible-gie-proxy#6
  2. Update bashrc and postgres-connection roles #610

Additionally fix telegraf's repo key, and update versions of a couple of roles in the requirements file
Add role to the playbook
Add role's pip dependencies to the group_vars file
Pin TPV version (refer to commit: 25fd0d5)
1. Updates maintenance node playbook to new galaxy release
2. Adds new rsync role that would perform rsync to the NFS share and to the headnodes
Updates the galaxy-rsync script
Includes vault file to the playbook
Uncomment telegraf role so VGCN-monitoring role can be installed and comment the telegraf_plugins_extra dict in group_vars as these plugins are currently active in the headnodes playbook. When it's disabled/removed there we can enable them here
Contributor

@mira-miracoli mira-miracoli left a comment

Looks good to me, let's try it!

@sanjaysrikakulam changed the title from "Add maintenance node playbook and vars file" to "WIP: Add maintenance node playbook and vars file" on May 25, 2023
@sanjaysrikakulam
Member Author

Converting this PR to Draft, as we have decided to prepare a minimal playbook that contains only the cron tasks and related jobs from sn06.yml instead of setting up Galaxy on the maintenance node. We will revisit this PR in the future.

@sanjaysrikakulam marked this pull request as draft on May 25, 2023, 06:57