Wiki with GitHub Pages (#103)
* Added .gitignore

* Build wiki

* Added build

* Update ci.yaml

* Delete site directory

---------

Co-authored-by: Bohdan Legacy Laptop <[email protected]>
Bohsav and Bohdan Legacy Laptop authored Oct 23, 2024
1 parent 8e4d368 commit a2dc9ee
Showing 41 changed files with 1,807 additions and 0 deletions.
30 changes: 30 additions & 0 deletions .github/workflows/ci.yaml
@@ -0,0 +1,30 @@
name: ci
on:
push:
branches:
- github-pages
- master
- main
permissions:
contents: write
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Configure Git Credentials
run: |
git config user.name github-actions[bot]
git config user.email 41898282+github-actions[bot]@users.noreply.github.com
- uses: actions/setup-python@v5
with:
python-version: 3.x
- run: echo "cache_id=$(date --utc '+%V')" >> $GITHUB_ENV
- uses: actions/cache@v4
with:
key: mkdocs-material-${{ env.cache_id }}
path: .cache
restore-keys: |
mkdocs-material-
- run: pip install mkdocs-material
- run: mkdocs gh-deploy --force
Empty file added .gitignore
Empty file.
64 changes: 64 additions & 0 deletions docs/Account-Codes.md
@@ -0,0 +1,64 @@
In order to run jobs on Trixie, users need to specify which SLURM account code should be used for billing. This is done by adding a line to the SLURM submission script that identifies the account:

```
#SBATCH --account=account_code
```

Users must be authorized to charge an account before they can use it.
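
As a minimal sketch, a complete submission script that charges a specific account might look like the following. The job name and time limit are placeholder assumptions, and `dt-mtp` and `TrixieMain` are used only as examples, so substitute an account code you are authorized to charge. `SLURM_JOB_ACCOUNT` is the variable Slurm sets inside a job to the account being charged; the `charged_account` variable and its fallback text are hypothetical and only matter when the script runs outside Slurm.

```shell
#!/bin/bash
#SBATCH --job-name=account-demo     # placeholder job name (assumption)
#SBATCH --account=dt-mtp            # example only: use one of your authorized codes
#SBATCH --partition=TrixieMain
#SBATCH --time=00:05:00             # placeholder time limit (assumption)

# Slurm exports the account the job is charged to.
# Fall back to a marker string when the script is run outside Slurm.
charged_account="${SLURM_JOB_ACCOUNT:-not-running-under-slurm}"
echo "This job is charged to: $charged_account"
```

Saved under a hypothetical name such as `account-demo.sh`, the script would be submitted with `sbatch account-demo.sh`.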

# AI4Design
## ai4d-bio-01
AI for Drug Design, NRC-PI:Tchagang, Alain
## ai4d-bio-02
Precision Discovery in Bio Systems, NRC-PI:Shao, Xiaojian
## ai4d-bio-03
Multi-Targeted Therapeutics, NRC-PI:Fauteux, François
## ai4d-bio-04a
Protein Design Drugs & Gene, NRC-PI:Paquet, Eric
## ai4d-bio-04b
AI Simulation of Bio Systems, NRC-PI:Cuperlovic-Culf, Miroslav
## ai4d-bio-04c
Digital-Twining of Bioreactor, NRC-PI:Belacel, Nabil
## ai4d-core-01
AI-based Shape Optimization, NRC-PI:Shu, Chang
## ai4d-core-05
Design of Superconductive Tapes, NRC-PI:Valdes, Julio
## ai4d-core-06
Intelligent Design, NRC-PI:Guo, Hong Yu
## ai4d-mat-02
Automated Material Synthesis using Deep Reinforcement Learning, NRC-PI:Tamblyn, Isaac
## ai4d-mat-03
Simulation & Design of Materials, NRC-PI:Tchagang, Alain
## ai4d-mat-04
Spectroscopic Signatures, NRC-PI:Tamblyn, Isaac
## ai4d-photo-01a
Miniaturization HP Components, NRC-PI:Grinberg, Yuri
## ai4d-photo-01c
AI-assisted Inverse Design, NRC-PI:Grinberg, Yuri

# COVID
## covid-01
NRC-PI:Ebadi, Ashkan; Xi, Pengcheng

# DT Digital Technologies / Technologies Numériques

## dt-dac
Data Analytics Centre / Données Analytiques

## dt-dscs
Data Science for Complex Systems / Science des Données pour les Systèmes Complexes

## dt-mtp
Multilingual Text Processing / Traitement Multilingue de Texte

## dt-ta
Text Analytics / Analyse de textes

# SDT Security and Disruptive Technologies / Technologies de sécurité et de rupture
## sdt-clean
Computational Laboratory for Energy And Nanoscience





49 changes: 49 additions & 0 deletions docs/Automatically-Resuming-Requeueing.md
@@ -0,0 +1,49 @@
# How do I get my job to requeue after my time limit?

Here’s a skeleton of what our jobs look like. Once your job is running, please check its actual usage and dial down the number of CPUs and the amount of memory you request. If a job doesn’t use a node’s full resources, the remaining capacity can serve other CPU-only (non-GPU) jobs on that node.

Important steps to get automatic requeueing working:
* Ask Slurm to send your batch shell a signal 30 seconds before the end of your time limit: `--signal=B:USR1@30`
* Have the batch script trap the requested signal: `trap _requeue USR1`
* **Send your MAIN process to the background and `wait` for it; otherwise your `_requeue` function will NEVER get a chance to run.** Bash runs a trap handler only after the current foreground command finishes, whereas `wait` is interrupted as soon as the signal arrives.


```
#!/bin/bash
#SBATCH --job-name=WMT21.training
#SBATCH --comment="Thanks Samuel Larkin for showing me how to work with Slurm"
#SBATCH --partition=TrixieMain
#SBATCH --account=dt-mtp
#SBATCH --gres=gpu:4
#SBATCH --time=12:00:00
#SBATCH --exclude=cn125
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=24
#SBATCH --mem=40G
# To reserve a whole node for yourself
####SBATCH --exclusive
#SBATCH --open-mode=append
#SBATCH --requeue
#SBATCH --signal=B:USR1@30
#SBATCH --output=%x-%j.out

# Requeueing on Trixie
# [source](https://www.sherlock.stanford.edu/docs/user-guide/running-jobs/)
# [source](https://hpc-uit.readthedocs.io/en/latest/jobs/examples.html#how-to-recover-files-before-a-job-times-out)
function _requeue {
    echo "BASH - trapping signal 10 - requeueing $SLURM_JOBID"
    date
    # This would allow to generically requeue any job but since we are using XLM
    # which is slurm aware, XLM could save its model before requeueing.
    scontrol requeue "$SLURM_JOBID"
}

# Install the handler only when running under Slurm.
if [[ -n "$SLURM_JOBID" ]]; then
    trap _requeue USR1
fi

# Run the main process in the background and wait for it so the trap can fire.
time python -m sockeye.train …. &
wait
```

where `time python -m sockeye.train ….` is the process you want to run.
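
The need for the background-and-`wait` pattern can be seen without Slurm. The following stand-alone sketch is an illustration only and is not part of the job script: `sleep 5` stands in for the training command, and a helper subshell that sends `USR1` after one second stands in for Slurm's `--signal=B:USR1@30`. The names `_on_usr1`, `handled`, and `wait_status` are made up for this example.

```shell
#!/bin/bash
# Stand-alone demo (no Slurm needed) of trap + background + wait.
handled=no

_on_usr1() {
    handled=yes
    echo "caught USR1; a real job would call: scontrol requeue \$SLURM_JOBID"
}
trap _on_usr1 USR1

sleep 5 &                      # stand-in for the real training command
work_pid=$!

# Stand-in for Slurm's signal: send USR1 to this shell after 1 second.
( sleep 1; kill -USR1 $$ ) &

wait "$work_pid"               # interrupted as soon as USR1 arrives
wait_status=$?                 # greater than 128 when a trapped signal ended the wait

# Clean up the stand-in workload, which is still sleeping.
kill "$work_pid" 2>/dev/null
wait "$work_pid" 2>/dev/null

echo "handled=$handled wait_status=$wait_status"
```

Because the workload runs in the background, `wait` returns as soon as the signal arrives and the handler runs after about one second. If `sleep 5` ran in the foreground instead, the handler would be delayed until the sleep had finished, which in a real job means after Slurm has already killed it.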
33 changes: 33 additions & 0 deletions docs/Available-Software.md
@@ -0,0 +1,33 @@
The most up-to-date way to see which software has been preinstalled on Trixie is by using the ``module`` command. When in doubt, it is the definitive list.

Software on Trixie is organized using the ``module`` service. Users can load, unload, and swap libraries within their environment and job submission scripts via ``modules``.
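
A typical session with the `module` command might look like the sketch below. The module name `gcc` is taken from the compiler list on this page, but the exact names and versions installed on Trixie are an assumption, so check the output of `module avail` first. The guard and the `module_status` variable are hypothetical additions that keep the snippet harmless on a machine without the module system.

```shell
# Typical module workflow, guarded so it does nothing harmful elsewhere.
if command -v module >/dev/null 2>&1; then
    module avail          # list all software that can be loaded
    module load gcc       # add a package to the environment (name is an assumption)
    module list           # show what is currently loaded
    module unload gcc     # remove it again
    module_status=available
else
    module_status=unavailable
    echo "module command not found: run this on a Trixie login node"
fi
```

The same `module load` lines can be placed in a job submission script, after the `#SBATCH` directives and before the command that needs the software.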

If there is a piece of software you would like to use but it is not available from the list (and you can't figure out how to build it yourself in your home directory), you may create a request using the issues tab (https://github.com/ai4d-iasc/trixie/issues). Please do not create duplicate requests for software (but feel free to comment on an existing thread to "upvote" it so that the priority list is clear).

# Compilers

* gcc
* intel (not currently available, in procurement)

# Numerical libraries

* intel-mkl (not currently available, in procurement)

# Deep learning frameworks

* tensorflow
* pytorch

# Python

* system python
* CC stack python (now default):
    * [virtualenvs](jobs-python-virtualenv.md)
* Anaconda, miniconda:
    * System-wide Anaconda
    * User miniconda

# Scientific simulation software

* [abinit](jobs-abinit.md)
* lumerical
67 changes: 67 additions & 0 deletions docs/External-Access-Advanced-Configuration.md
@@ -0,0 +1,67 @@
# Overview

A few configuration steps make connecting to Trixie through the bastion host easier, in particular the SSH **ProxyJump** and **ControlMaster** parameters. The idea is to configure SSH to connect to the Trixie server automatically, using the bastion host as a relay between your local computer and the Trixie server.

**Important Note:** Before proceeding with this configuration, please ensure that you have performed the [External Access Setup](External-Access-Setup.md) procedure.

# Mac OSX / Linux

To configure SSH to automatically connect to the Trixie server, open the ``.ssh/config`` file on your local machine (not on the servers) with your preferred text editor and add the following lines, substituting your own usernames in the **User** directives. You will also need to create the folder ``.ssh/sockets`` to complete the configuration.

```
Host trixie-bastion
    HostName trixie.nrc-cnrc.gc.ca
    User <firstname>.<lastname>@pub.nrc-cnrc.gc.ca
    ControlMaster auto
    ControlPath ~/.ssh/sockets/%r@%h-%p

Host trixie
    HostName trixie.res.nrc.gc.ca
    User admin.<firstname>.<lastname>
    ProxyJump trixie-bastion
```
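
The ``.ssh/sockets`` folder mentioned above can be created from a terminal on your local machine. This is a minimal sketch: the `SSH_DIR` variable is a convenience introduced here (it defaults to `~/.ssh`), and the restrictive permissions are a common OpenSSH practice rather than something this guide mandates.

```shell
# Run on your local machine, not on the bastion host or Trixie.
SSH_DIR="${SSH_DIR:-$HOME/.ssh}"

# Create the directory that the ControlPath setting points to.
mkdir -p "$SSH_DIR/sockets"

# Keep both directories private: control sockets give access to open sessions.
chmod 700 "$SSH_DIR" "$SSH_DIR/sockets"
```

With **ControlMaster** set to `auto`, the first connection to the bastion host creates a socket in this folder and later connections reuse it.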

Once your settings are configured, you will be able to log in to the Trixie server with the following command:

``ssh trixie``

Please note that you will be prompted as follows:

1. *LoginTC* prompt – enter 1
2. Prompt for your **PUB** password
3. Prompt for your **RES** admin password

# Windows – Putty

To configure SSH to automatically connect to the Trixie server, please set the following settings in your Putty application, substituting your username where applicable.

1. Under **Connection -> SSH**

    1. Set **Remote command**: ``ssh -A -Y admin.<firstname>.<lastname>@trixie.res.nrc.gc.ca``
    2. Select the option **Share SSH connections if possible** – this will enable you to establish multiple connections to Trixie

    ![trixie putty](images/trixie-putty-1.png)

2. Under **Connection -> SSH -> X11**

    1. Select the option **Enable X11 forwarding**

    ![putty](images/trixie-putty-2.png)

3. Under **Session**

    1. Set **Host Name (or IP address)**: ``<firstname.lastname>@pub.nrc-cnrc.gc.ca@trixie.nrc-cnrc.gc.ca``
    2. Set **Port**: *22*
    3. Add a name for **Saved Sessions** – perhaps *Trixie*

    ![putty](images/trixie-putty-3.png)

4. Click **Save**

Once the settings have been saved, you can double-click the name in the list of **Saved Sessions** to open a session to the Trixie server. Please note that you will be prompted as follows:

1. *LoginTC* prompt – enter 1
2. Prompt for your **PUB** password
3. Prompt for your **RES** admin password

# Related Topics

[External Access Setup](External-Access-Setup.md)

[File Transfers](File-Transfers.md)
122 changes: 122 additions & 0 deletions docs/External-Access-Setup.md
@@ -0,0 +1,122 @@
# Overview

As an external NRC collaborator, you can access the AI for Design (Trixie) Cluster using the Bastion Host. External collaborators include non-NRC researchers, industrial partners, and vendors.

You can access only those folders on Trixie that are required for your project. Requests for access to Trixie and specific projects must be made by your NRC research contact; you cannot request access to a system yourself.

Once granted access, you will have two sets of credentials issued to access the cluster:

| Account | Purpose | User name format (example: John Doe) |
| ----------------- | -------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------- |
| **PUB** | Provides access to the external bastion host and is used for the *LoginTC* second-factor authentication | A combination of your first and last name, e.g. john.doe@pub.nrc-cnrc.gc.ca |
| **Trixie System** | Provides access to Trixie | admin.firstname.lastname, e.g. admin.john.doe |

Your NRC contact, or an NRC system administrator, will provide you with the PUB and Admin user names and passwords that you require to access the NRC systems. Note that on first login, you will be required to change your password. Please note: during the password change, the first prompt asks for a confirmation of your existing password prior to requesting a new one.

# LoginTC Application Setup

Before you attempt your first login, *LoginTC* must be installed and configured as follows.

* Upon user creation, you will receive an email to set up and initialize the *LoginTC* application (for iOS, Android, or the Chrome web browser), which is used as the second authentication factor for Trixie
* Set up *LoginTC* using the directions provided to you by email

# Accessing Trixie with LoginTC 2-Factor Authentication

In order to access Trixie, you will need to use an SSH client. Please note that you cannot access Trixie using a web browser. On Mac OSX and Linux, SSH is installed by default. On Windows you will need to install Putty if it is not installed already. You can download Putty from the following website:

https://www.putty.org/

## Initialize SSH Connection with Mac OSX / Linux

For Mac OSX and Linux, open a new terminal and connect to ``trixie.nrc-cnrc.gc.ca`` via SSH using your **PUB** account and the following command. The user name itself contains an ``@``, so the host name follows a second ``@``:

``ssh <firstname.lastname>@pub.nrc-cnrc.gc.ca@trixie.nrc-cnrc.gc.ca``

## Initialize SSH Connection with Windows

For Windows, you can create a Putty profile to SSH into the bastion server.

Under **Session**:

1. Set **Host Name (or IP address)**: ``<firstname.lastname>@pub.nrc-cnrc.gc.ca@trixie.nrc-cnrc.gc.ca``
2. Set **Port**: *22*
3. Add a name for **Saved Sessions** – perhaps *Bastion*

    ![img](images/bastion-putty-1.png)

4. Click **Save**

Once the settings have been saved, you can double click on the name in the list of **Saved Sessions** to open a session to the bastion server.

## Logging in for the First Time

When you log in for the first time, you will be forced to change your password for both your **PUB** account and your Trixie **admin** account. Please note that when you do this, you will **be prompted for your original (or current) password first** and then you will be prompted to enter your new password twice.

In the following procedure, the information printed in the images may not be the same as what you will see when you log in. However, the steps will be the same.

Please perform the following steps to access Trixie.

1. When you log in using one of the methods above, you will be prompted to authenticate with your *LoginTC* application. The message should appear as follows:

    ![putty](images/trixie-putty-loginTC.png)

2. Press **1** followed by the **Enter** key and then check your *LoginTC* device, as set up above, to approve the login request
3. If a message similar to the one below appears, type **yes** at the prompt

    ![login](images/login2.png)

4. After you complete the two-factor authentication process in *LoginTC*, you will be prompted to enter your **PUB** account password and then you will be forced to change your password. You should see a message similar to the one below – remember to enter your original password first and then enter your new password twice.

    ![login3](images/login3.png)

5. The system will automatically log you out, so you will need to log in again using your new password
6. Once you have successfully logged in, you will be on the bastion server – your screen should look similar to the following

    ![login4](images/login4.png)

7. If you have your credentials for the ``trixie.res.nrc.gc.ca`` server, you can skip this step. Otherwise, you will now need to contact the administrator who provided you with your credentials for the bastion server to obtain your credentials for the Trixie server
8. You will need to log in to Trixie next. From the bash prompt, use SSH to log into ``trixie.res.nrc.gc.ca`` with your Trixie ``admin.<firstname.lastname>`` account and password, using a command similar to the following:

    ``ssh admin.<firstname.lastname>@trixie.res.nrc.gc.ca``

9. If a message similar to the one below appears, type **yes** at the prompt

    ![login2](images/login2.png)

10. You will be prompted to enter your Trixie **admin** account password and then you will be forced to change your password. You should see a message similar to the one below – remember to enter your original password first and then enter your new password twice.

    ![images/login3.png](images/login3.png)

11. The system will automatically log you out, so you will need to log in again using your new password
12. Once you have successfully logged in, you will be on Trixie – your screen should look similar to the following

    ![login5](images/login5.png)

After successful authentication, you should see the Trixie cluster login banner with terms and be placed in a shell in your home directory on the cluster, similar to the image above.

Only you have access to your home directory. For more information on the cluster and its usage, please see the [Home](index.md) page.

# Changing passwords

Passwords on the **PUB** and **RES** accounts expire after 90 days and must be changed. If you do not change your password, you will be locked out of the system.

Watch for the pop-up message notifying you to change your password, or set yourself a reminder to change your password before the 90-day expiry.

If an expired password locks you out of either account, notify your NRC contact, who can have the password reset.

## Change Your PUB Password

You can change your PUB password on the following website, which lets you manage your PUB account. Please use the following format for your username: ``john.doe@pub``

[PUB Account Management](https://login-connexion.nrc-cnrc.gc.ca)

Please note that the **Reset Password** feature will not work if you do not fill in the security questions on the website. Therefore it is **strongly recommended** that you fill in the security questions so that you can reset your password if necessary.

## Change Your Admin Password via Linux Terminal

1. Ensure you are logged into the Trixie server (``trixie.res.nrc.gc.ca``)
2. Type **passwd** then press **Enter**
3. You will be prompted for your original (or current) password first, and then to enter your new password twice. You should see a message similar to the one below.

    ![login6](images/login6.png)

4. The system will automatically log you out, so you will need to log in again using your new password

# Related Topics

[External Access Advanced Configuration](External-Access-Advanced-Configuration.md)

[File Transfers](File-Transfers.md)
20 changes: 20 additions & 0 deletions docs/External-HPC-Systems.md
@@ -0,0 +1,20 @@
# Overview

There may be instances where researchers require connectivity to external HPC systems from Trixie. However, network access to and from Trixie is restricted to maintain a high level of security. Therefore, connections to external systems need to be approved before the connection can be opened.

This page provides instructions for requesting a connection to an external system, as well as a list of approved systems that already have an open connection.

# Request a Connection to an External System

In order to submit a request to open a network flow between Trixie and an external HPC system, please post your request in [the issues section](https://github.com/ai4d-iasc/trixie/issues) of this site.

# Approved External Systems

| Institution | System URL |
| ----------------------- | ----------------------- |
| Compute Canada - Cedar | cedar.computecanada.ca |
| Compute Canada - Beluga | beluga.computecanada.ca |
| Compute Canada - Niagara | niagara.computecanada.ca |
| Compute Canada - Graham | graham.computecanada.ca |
| Vector Institute | v.vectorinstitute.ca |
| NERSC.gov - Cori | cori.nersc.gov |