
Commit

richelbilderbeek committed May 6, 2024
2 parents f333faa + 1a1674d commit 5cf9bc8
Showing 5 changed files with 120 additions and 35 deletions.
79 changes: 79 additions & 0 deletions docs/cluster_guides/screen.md
@@ -0,0 +1,79 @@
# Running a detachable screen process in a job

When you run the `interactive` command, you get a command prompt in the _screen_ program.
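For reference, a typical invocation looks something like this (the project name is a placeholder):
```bash
interactive -A project_ID -n 1 -t 1:00:00
```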

!!! warning
When running the screen program in other environments, you can detach from your screen and later reattach to it. Within the environment of the `interactive` command, you lose this ability: Your job is terminated when you detach. (This is a design decision and not a bug.)

If you want the best of both worlds, i.e. to be able to detach from and reattach to your screen session within a job, you need to start the job in some other way and start your screen session from a separate ssh login. Here is an example of how to do this:
```bash
$ salloc -A project_ID -t 15:00 -n 1 --qos=short --bell --no-shell
salloc: Pending job allocation 46964140
salloc: job 46964140 queued and waiting for resources
salloc: job 46964140 has been allocated resources
salloc: Granted job allocation 46964140
salloc: Waiting for resource configuration
salloc: Nodes r174 are ready for job
```
Check the queue manager for the allocated node. In the example below, one core was allocated on the `r174` compute node.
```bash
$ squeue -j 46964140
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
46964140 core no-shell user R 0:44 1 r174
```
You can start an `xterm` terminal on the allocated node like this:
```bash
$ xterm -e ssh -AX r174 &
```
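If you do not need a separate terminal window, you can also log in to the allocated node directly from your current shell:
```bash
ssh -AX r174
```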

The `salloc` command gives you a job allocation, in this example one core for 15 minutes (the `--no-shell` option is important here). Alternatively, you can log in to any node of any of your running jobs, e.g. one started with the `sbatch` command.
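For example, a minimal `sbatch` job that does nothing but keep its allocation alive, so that you can ssh to its node and run screen there, could look something like this (project name, time and file name are placeholders):
```bash
#!/bin/bash -l
#SBATCH -A project_ID
#SBATCH -n 1
#SBATCH -t 15:00
# Keep the allocation alive so you can ssh to the node and run screen there.
sleep infinity
```
Submit it with `sbatch keep_alive.sh` and proceed as below.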

In both cases you get a job number, and from that you can find out the node name, in this example `r174`.

When you log in to the node with the `ssh` command, start the screen program:

```bash
$ screen
```
When you detach from the screen program, e.g. with the detach command (press `Ctrl-a` followed by `d`), you can later reattach to your screen session, either in the same ssh session or in another one:
```bash
$ screen -r
```
When your job has terminated, you can neither reattach to your screen session nor log in to the node.

The screen session of the `interactive` command is integrated into your job, so e.g. all environment variables for the job are correctly set. For a separate ssh session, as in this example, that is not the case.
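If something you run in such a separate ssh session needs the job's Slurm environment variables, you can set the most important ones by hand; a minimal sketch, using the job ID and node from the example above:
```bash
# These are set automatically inside the job, but not in a plain ssh session on the node.
export SLURM_JOB_ID=46964140
export SLURM_JOB_NODELIST=r174
```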

Please note that it is the job allocation that determines your core hour usage and not your ssh or screen sessions.

## Tips

- Start a new screen session with a command:
```
screen -dm your_command
```
This starts a new, already detached screen session and runs the command inside it.

- If you want to run multiple commands, you can do so like this:
```
screen -dm bash -c "command1; command2"
```
This will run `command1` and `command2` in order.

- To reattach to the screen session, use:
```
screen -r
```
If you have multiple sessions, you'll need to specify the session ID, as shown in the example after this list.

- To list your current screen sessions, use:
```
screen -ls
```
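If `screen -ls` lists more than one session, pass the session ID from that listing to `screen -r` (the ID below is just an example):
```bash
# The listing from screen -ls shows IDs like 12345.pts-0.r174
screen -r 12345.pts-0.r174
```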

Please note that when a program terminates, `screen` (by default) kills the window that contained it. If you don't want your session to get killed after the script is finished, add `exec sh` at the end. For example:
```
screen -dm bash -c 'your_command; exec sh'
```
This will keep the screen session alive after `your_command` has finished executing.

YouTube: [How to use GNU SCREEN - the Terminal Multiplexer](https://www.youtube.com/watch?v=I4xVn6Io5Nw)
@@ -1,44 +1,46 @@
# GAMESS-US user guide

<https://www.uppmax.uu.se/support/user-guides/gamess-us-user-guide/>

GAMESS-US version 20170930 is installed on Rackham. Newer versions can be installed on request to UPPMAX support. Snowy currently lacks GAMESS-US.

## Running GAMESS

Load the module using
```bash
module load gamess/20170930
```
Below is an example submit script for Rackham, running on 40 cores (2 nodes with 20 cores each). It is essential to specify the project name:

```slurm
#!/bin/bash -l
#SBATCH -J jobname
#SBATCH -p node -n 40
#SBATCH -A PROJECT
#SBATCH -t 03:00:00
module load gamess/20170930
rungms gms >gms.out
```
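Submit the script with `sbatch`; the file name below is just an example:
```bash
sbatch gamess_job.sh
```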

## Memory specification
GAMESS uses two kinds of memory: replicated memory and distributed memory. Both kinds of memory should be given in the $SYSTEM specification. Replicated memory is specified using the MWORDS keyword and distributed memory with the MEMDDI keyword. It is very important that you understand the uses of these keywords. Check the GAMESS documentation for further information.

If your job requires 16 MW (mega-words) of replicated memory and 800 MW of distributed memory, as in the worked example below, the memory requirement per CPU core varies as 16 + 800/N, where N is the number of cores. Each word is 8 bytes of memory, so the amount of memory per core, in megabytes, is (16 + 800/N) * 8. The amount of memory per node depends on the number of cores per node: Rackham has 20 cores per node; most nodes have 128 GB of memory, but 30 nodes have 512 GB and 4 nodes have 1 TB.
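As a rough worked example with the numbers above on 40 cores: in the `$SYSTEM` group this would be requested with something like `MWORDS=16 MEMDDI=800` (check the GAMESS documentation for the exact input format), and the per-core and per-node memory come out as:
```bash
# (MWORDS + MEMDDI/N) mega-words * 8 bytes per word, for N = 40 cores
echo "$(( (16 + 800/40) * 8 )) MB per core"               # 288 MB per core
echo "$(( 20 * (16 + 800/40) * 8 )) MB per 20-core node"  # 5760 MB per node
```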


## Communication
For intra-node communication, shared memory is used. For inter-node communication, MPI is used over the InfiniBand interconnect.

## Citing GAMESS papers
It is essential that you read the GAMESS manual thoroughly to properly reference the papers specified in the instructions. All publications using GAMESS should cite at least the following paper:

```bibtex
@Article{GAMESS,
author={M.W.Schmidt and K.K.Baldridge and J.A.Boatz and S.T.Elbert and
M.S.Gordon and J.J.Jensen and S.Koseki and N.Matsunaga and
K.A.Nguyen and S.Su and T.L.Windus and M.Dupuis and J.A.Montgomery},
journal={J.~Comput.~Chem.},
volume=14,
pages={1347},
year=1993,
comment={The GAMESS program}}
```
If you need to obtain GAMESS yourself, please visit the GAMESS website for further instructions.
File renamed without changes
@@ -1,16 +1,15 @@
# NVIDIA Deep Learning Frameworks
https://www.uppmax.uu.se/support/user-guides/nvidia-deep-learning-frameworks/

Here is how easily one can use an NVIDIA [environment](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel_22-03.html) for deep learning, with all the following tools preset. A screenshot of that page is shown below.

![web screenshot](./img/pytorch-nvidia.png)

First, pull the container (6.5 GB).
```bash
singularity pull docker://nvcr.io/nvidia/pytorch:22.03-py3
```
Get an interactive shell.

```bash
singularity shell --nv ~/external_1TB/tmp/pytorch_22.03-py3.sif

Singularity> python3
...
True
# test torch
>>> torch.zeros(1).to('cuda')
tensor([0.], device='cuda:0')

```
From the container shell, check what else is available...

```bash
Singularity> nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
...
Singularity> jupyter-lab
[I 13:35:46.616 LabApp] http://hostname:8888/?token=d6e865a937e527ff5bbccfb3f150480b76566f47eb3808b1
[I 13:35:46.616 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
...

```
You can use this container as a base to add more packages.

```singularity
Bootstrap: docker
From: nvcr.io/nvidia/pytorch:22.03-py3
...
```
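Then build an image from that definition file, for example like this (file names are placeholders; depending on your setup you may need `--fakeroot`, a remote build, or a machine where you have root):
```bash
singularity build --fakeroot my_pytorch.sif my_pytorch.def
```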

Just keep in mind that "upgrading" the built-in torch package might install a version that is compatible with fewer GPU architectures, and it might no longer work on your hardware.

```bash
Singularity> python3 -c "import torch; print(torch.__version__); print(torch.cuda.is_available()); print(torch.cuda.get_arch_list()); torch.zeros(1).to('cuda')"

1.10.0+cu102
True
['sm_37', 'sm_50', 'sm_60', 'sm_70']
NVIDIA A100-PCIE-40GB with CUDA capability sm_80 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
```
3 changes: 3 additions & 0 deletions mkdocs.yml
@@ -33,6 +33,7 @@ nav:
- Checking and optimizing jobs: cluster_guides/optimizing_jobs.md
- Runtime tips: cluster_guides/running_jobs/runtime_tips.md
- Storage and compression: cluster_guides/running_jobs/storage_compression.md
- Screen: cluster_guides/screen.md
- Data transfer:
- Transfer to/from Bianca: cluster_guides/transfer_bianca.md
- Migrate to Dardel: cluster_guides/dardel_migration.md
@@ -49,6 +50,7 @@
- Software-specific documentation:
- Whisper: software/whisper.md
- Chemistry/physics:
- GAMESS_US: software/games_us.md
- Gaussian: software/gaussian.md
- GROMACS: software/gromacs.md
- Molcas: software/openmolcas.md
@@ -65,6 +67,7 @@
- MetONTIIME: software/metontiime.md
- Tracer: software/tracer.md
- Machine Learning:
- NVIDIA DLF: software/nvidia-deep-learning-frameworks.md
- TensorFlow: software/tensorflow.md
- Programming languages:
- Julia: software/julia.md
