diff --git a/README.md b/README.md
index 266e7c572..c75b23c47 100644
--- a/README.md
+++ b/README.md
@@ -56,7 +56,7 @@ python3 scripts/sw_table_md_creator.py -i https://export.uppmax.uu.se/staff/soft
 Dependencies:
 
 ```bash
-pip install beautifulsoup4\>=4.11.1 gTTS\>=2.2.4 
+pip install "beautifulsoup4>=4.11.1" "gTTS>=2.2.4"
 ```
 
 The script `md_to_speech.py` takes an `.md` file, parses the text and generates
@@ -79,5 +79,5 @@ Filename |Descriptions
 The website is created using [mkdocs-material](https://squidfunk.github.io/mkdocs-material).
-The landing page and layout was inspired by the documentation of the HPC cluster
-[LUMI](https://docs.lumi-supercomputer.eu/). 
+The landing page and layout were inspired by the documentation of the HPC cluster
+[LUMI](https://docs.lumi-supercomputer.eu/).
diff --git a/docs/cluster_guides/backup.md b/docs/cluster_guides/backup.md
index c0566f289..4b8112cd0 100644
--- a/docs/cluster_guides/backup.md
+++ b/docs/cluster_guides/backup.md
@@ -7,13 +7,13 @@ As PI, you and your academic institution are ultimately responsible for your dat
 While UPPMAX systems may have backup, these are not designed to act as the sole repository of primary data, e.g. raw data or originals.
 
 ## What does "backup" mean for my data?
-The type of backup that is generally available for project storage at UPPMAX is incremental backup with 30 day retention. This means that any file that was deleted more than 30 days ago is irretrievably gone. Changes in a file are kept for 30 days, so we can potentially retrieve an old version up to a month after you edited it. 
+The type of backup that is generally available for project storage at UPPMAX is incremental backup with 30-day retention. This means that any file that was deleted more than 30 days ago is irretrievably gone. Changes in a file are kept for 30 days, so we can potentially retrieve an old version up to a month after you edited it.
 
-The backup service tries to backup all changes as often as they occur, but rapid changes will not register. Due to the large amounts of files in the file systems, a single backup session may take upwards of a week or more. This means that if you create a file and delete it the next day, it will probably not be backed up.
+The backup service tries to back up all changes as often as they occur, but rapid changes will not register. Due to the large number of files in the file systems, a single backup session may take a week or more. This means that if you create a file and delete it the next day, it will probably not be backed up.
 
 Backups are sent off-site to either KTH or LiU, depending on the storage system.
 
-To ensure timely backups, it is very important to reduce the workload of the backup system as much as possible. Create directories with "nobackup" in their name or use the pre-existing nobackup directory in /proj/XYZ to store data that does not need backup. 
+To ensure timely backups, it is very important to reduce the workload of the backup system as much as possible. Create directories with "nobackup" in their name or use the pre-existing nobackup directory in /proj/XYZ to store data that does not need backup.
 
 - It is especially important that temporary files and files that are changed often are placed in nobackup directories.
 
@@ -32,11 +32,11 @@ Backup is done on:
 ## What should I not put in directories with backup?
 
 - Directories where you are actively working, especially if you are creating or modifying many files.
-  The backup mechanisms cannot keep up with large amounts of files changing on a rapid basis. 
+  The backup mechanisms cannot keep up with large numbers of files changing rapidly.
 
-## How robust is uppmax storage?
+## How robust is UPPMAX storage?
 
-- All UPPMAX storage systems use RAID technology to make storage more robust through redundancy. 
-- This means that two or more disks must fail in the same "RAID volume" before there is a risk of data loss. 
+- All UPPMAX storage systems use RAID technology to make storage more robust through redundancy.
+- This means that two or more disks must fail in the same "RAID volume" before there is a risk of data loss.
-- However, this technology does not protect against user error (e.g. "rm -rf * in your project directory) or in case of a significant disaster (e.g. fire in computer hall).
+- However, this technology does not protect against user error (e.g. `rm -rf *` in your project directory) or against a significant disaster (e.g. a fire in the computer hall).
 - Off-site backup is crucial.
diff --git a/docs/cluster_guides/bianca_file_transfer_using_filezilla.md b/docs/cluster_guides/bianca_file_transfer_using_filezilla.md
index c51992d11..68592f48c 100644
--- a/docs/cluster_guides/bianca_file_transfer_using_filezilla.md
+++ b/docs/cluster_guides/bianca_file_transfer_using_filezilla.md
@@ -53,7 +53,7 @@ In the 'Site Manager' dialog, click 'New site'
 
 In the 'New Site' dialog, create a name for the site, e.g. `bianca-sens123456`.
 
-## 6. Configure site 
+## 6. Configure site
 
 In the 'New Site' dialog, use all standards, except:
diff --git a/docs/cluster_guides/bianca_file_transfer_using_gui.md b/docs/cluster_guides/bianca_file_transfer_using_gui.md
index 26fbe2ce9..6f8a25cdd 100644
--- a/docs/cluster_guides/bianca_file_transfer_using_gui.md
+++ b/docs/cluster_guides/bianca_file_transfer_using_gui.md
@@ -31,7 +31,7 @@ one needs [to be inside of SUNET](../getting_started/get_inside_sunet.md).
 See the 'get inside the university networks' page
 [here](../getting_started/get_inside_sunet.md)
 
-When a tool is setup, one can only transfer files 
-between you local computer and [your Bianca `wharf` folder](wharf.md).
+When a tool is set up, one can only transfer files
+between your local computer and [your Bianca `wharf` folder](wharf.md).
 
 ## Bianca's constraints
diff --git a/docs/cluster_guides/lftp_with_bianca.md b/docs/cluster_guides/lftp_with_bianca.md
index 33f75474c..09929630d 100644
--- a/docs/cluster_guides/lftp_with_bianca.md
+++ b/docs/cluster_guides/lftp_with_bianca.md
@@ -3,9 +3,9 @@
 `lftp` is a command-line program
 to [transfer files to/from Bianca](transfer_bianca.md).
 
-With the command line SFTP client `lftp`, 
-you need to "set net:connection_limit 1". 
-`lftp` may also defer the actual connection 
+With the command-line SFTP client `lftp`,
+you need to run `set net:connection_limit 1`.
+`lftp` may also defer the actual connection
 until it's really required unless you end your connect URL with a path.
 
 [When inside of SUNET](../getting_started/get_inside_sunet.md)
@@ -15,7 +15,7 @@ until it's really required unless you end your connect URL with a path.
 lftp sftp://[user_name]-[project_id]@bianca-sftp.uppmax.uu.se/[user_name]-[project_id]/
 ```
 
-where 
+where
 
 * `[project_id]` is the ID of your [NAISS project](../getting_started/project.md)
 * `[user_name]` is the name of your [UPPMAX user account](../getting_started/user_account.md)
diff --git a/docs/cluster_guides/project_management.md b/docs/cluster_guides/project_management.md
index 4078607bc..1b2cc08a3 100644
--- a/docs/cluster_guides/project_management.md
+++ b/docs/cluster_guides/project_management.md
@@ -11,13 +11,13 @@
 ???- question "What is this 'glob' folder in my home folder?"
 
-    - The glob directory found in your home has been deprecated since early 2017. 
-    - It is now a normal directory and shared your default 32GByte sized home. 
+    - The glob directory found in your home has been deprecated since early 2017.
+    - It is now a normal directory and shares your default 32 GB home quota.
-    - The glob directory remains to not interfere with scripts who might reference ~/glob in the source code.
+    - The glob directory remains so as not to interfere with scripts that might reference ~/glob in the source code.
-    - Historically, the glob directory was the main storage area for storage of user data. 
-    - It was shared by all nodes. 
-    - The directory was used for files needed by all job instances and could house files exceeding the quota of the home directory. 
+    - Historically, the glob directory was the main storage area for user data.
+    - It was shared by all nodes.
+    - The directory was used for files needed by all job instances and could house files exceeding the quota of the home directory.
-    - Job input and output files was (and can still be) stored here.
+    - Job input and output files were (and can still be) stored here.
 
 ## Members
diff --git a/docs/cluster_guides/rsync_on_bianca.md b/docs/cluster_guides/rsync_on_bianca.md
index c292a0f40..d19581563 100644
--- a/docs/cluster_guides/rsync_on_bianca.md
+++ b/docs/cluster_guides/rsync_on_bianca.md
@@ -1,6 +1,6 @@
 # `rsync` on Bianca
 
-[`rsync`](../software/rsync.md) is a command-line tool 
+[`rsync`](../software/rsync.md) is a command-line tool
 for [file transfer](../cluster_guides/file_transfer.md).
 This page describes how to use [`rsync`](../software/rsync.md) on [Bianca](bianca.md).
@@ -12,18 +12,18 @@ One cannot `rsync` directly to `wharf`.
 One cannot `rsync` directly to `wharf`.
-However, this is how it looks like:
+However, this is how it looks:
-    
+
 ```
 richel@richel-N141CU:~$ rsync my_local_file.txt richel-sens2016001@bianca-sftp.uppmax.uu.se:/richel-sens2016001
 Hi!
 
-    You are connected to the bianca wharf (sftp service) at 
+    You are connected to the bianca wharf (sftp service) at
     bianca-sftp.uppmax.uu.se.
 
     Note that we only support SFTP, which is not exactly the
-    same as SSH (rsync and scp will not work). 
+    same as SSH (rsync and scp will not work).
 
     Please see our homepage and the Bianca User Guide
     for more information:
@@ -36,7 +36,7 @@ One cannot `rsync` directly to `wharf`.
 
     Best regards,
     UPPMAX
 
-    richel-sens2016001@bianca-sftp.uppmax.uu.se's password: 
+    richel-sens2016001@bianca-sftp.uppmax.uu.se's password:
     protocol version mismatch -- is your shell clean?
     (see the rsync manpage for an explanation)
     rsync error: protocol incompatibility (code 2) at compat.c(622) [sender=3.2.7]
diff --git a/docs/cluster_guides/running_jobs/storage_compression.md b/docs/cluster_guides/running_jobs/storage_compression.md
index daadc916e..3c6cef8ac 100644
--- a/docs/cluster_guides/running_jobs/storage_compression.md
+++ b/docs/cluster_guides/running_jobs/storage_compression.md
@@ -7,16 +7,16 @@
 ???- question "How does automatic backup of project areas work at UPPMAX?"
 
     [Backup](../backup.md)
-    
+
 ???- question "What is this 'glob' folder in my home folder?"
 
-    - The glob directory found in your home has been deprecated since early 2017. 
-    - It is now a normal directory and shared your default 32GByte sized home. 
+    - The glob directory found in your home has been deprecated since early 2017.
+    - It is now a normal directory and shares your default 32 GB home quota.
-    - The glob directory remains to not interfere with scripts who might reference ~/glob in the source code.
+    - The glob directory remains so as not to interfere with scripts that might reference ~/glob in the source code.
-    - Historically, the glob directory was the main storage area for storage of user data. 
-    - It was shared by all nodes. 
-    - The directory was used for files needed by all job instances and could house files exceeding the quota of the home directory. 
+    - Historically, the glob directory was the main storage area for user data.
+    - It was shared by all nodes.
+    - The directory was used for files needed by all job instances and could house files exceeding the quota of the home directory.
-    - Job input and output files was (and can still be) stored here.
+    - Job input and output files were (and can still be) stored here.
 
 - You might also be interested in our [disk storage guide](../storage/disk_storage_guide.md).
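+
+To check whether an old glob directory still holds data that counts
+towards your home quota, a minimal sketch (assuming `uquota`, the UPPMAX
+quota overview command, is available; its output format may differ):
+
+```bash
+# summarise how much space ~/glob still uses
+du -sh ~/glob
+# show your current quota usage
+uquota
+```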
diff --git a/docs/cluster_guides/sftp_with_bianca.md b/docs/cluster_guides/sftp_with_bianca.md
index af51b51bb..71379ace1 100644
--- a/docs/cluster_guides/sftp_with_bianca.md
+++ b/docs/cluster_guides/sftp_with_bianca.md
@@ -16,7 +16,7 @@ to [transfer files to/from Bianca](transfer_bianca.md).
 sftp [user_name]-[project_id]@bianca-sftp.uppmax.uu.se:/[user_name]-[project_id]
 ```
 
-where 
+where
 
 * `[project_id]` is the ID of your [NAISS project](../getting_started/project.md)
 * `[user_name]` is the name of your [UPPMAX user account](../getting_started/user_account.md)
@@ -30,12 +30,12 @@ sftp sven-sens2016001@bianca-sftp.uppmax.uu.se:/sven-sens2016001
 ```
 
 `sftp` will ask for a password:
 
 ```
-sven-sens2016001@bianca-sftp.uppmax.uu.se's password: 
+sven-sens2016001@bianca-sftp.uppmax.uu.se's password:
 ```
 
 The password is your normal UPPMAX password directly followed
-by the six digits from the [the `UPPMAX` 2-factor authentication](https://www.uu.se/en/centre/uppmax/get-started/2-factor).
-For example, if your password is `VerySecret` and the second factor code is `123456` 
+by the six digits from [the `UPPMAX` 2-factor authentication](https://www.uu.se/en/centre/uppmax/get-started/2-factor).
+For example, if your password is `VerySecret` and the second factor code is `123456`,
 you would type `VerySecret123456` as the password in this step.
 
 After typing in the password and 2FA one sees a welcome message
@@ -48,11 +48,11 @@ and the `sftp` prompt.
 ```
 Hi!
 
-    You are connected to the bianca wharf (sftp service) at 
+    You are connected to the bianca wharf (sftp service) at
     bianca-sftp.uppmax.uu.se.
 
     Note that we only support SFTP, which is not exactly the
-    same as SSH (rsync and scp will not work). 
+    same as SSH (rsync and scp will not work).
 
     Please see our homepage and the Bianca User Guide
     for more information:
@@ -65,9 +65,9 @@ and the `sftp` prompt.
     Best regards,
     UPPMAX
 
-    richel-sens2016001@bianca-sftp.uppmax.uu.se's password: 
+    richel-sens2016001@bianca-sftp.uppmax.uu.se's password:
     Connected to bianca-sftp.uppmax.uu.se.
-    sftp> 
+    sftp>
 
 ???- question "How do I get rid of the welcome message?"
@@ -81,7 +81,7 @@ and the `sftp` prompt.
 
-The last line, `sftp> ` is the `sftp` prompt.
+The last line, `sftp> `, is the `sftp` prompt.
 
-Once connected you will have to type the `sftp` commands to upload/download files. 
-See [the UPPMAX page on `sftp`](../software/sftp.md) how to do so.
+Once connected, you will have to type `sftp` commands to upload/download files.
+See [the UPPMAX page on `sftp`](../software/sftp.md) for how to do so.
 With `sftp` you only have access to [your wharf folder](wharf.md).
diff --git a/docs/cluster_guides/software_on_transit.md b/docs/cluster_guides/software_on_transit.md
index 868e0c0d8..3fc3c2d26 100644
--- a/docs/cluster_guides/software_on_transit.md
+++ b/docs/cluster_guides/software_on_transit.md
@@ -1,15 +1,15 @@
 # Software on Transit
 
-[Transit](../cluster_guides/transit.md) 
+[Transit](../cluster_guides/transit.md)
 is an UPPMAX service that can be used to securely transfer files.
 This page describes the software on [Transit](../cluster_guides/transit.md).
 
 After [logging in to Transit](../cluster_guides/login_transit.md),
-you cannot make lasting changes to anything, 
-except for mounted [wharf](../cluster_guides/wharf.md) directories. 
-However, anything you have added to your [Rackham](../cluster_guides/rackham.md) home directory 
-is available on [Transit](../cluster_guides/transit.md). 
+you cannot make lasting changes to anything,
+except for mounted [wharf](../cluster_guides/wharf.md) directories.
+However, anything you have added to your [Rackham](../cluster_guides/rackham.md) home directory
+is available on [Transit](../cluster_guides/transit.md).
 In addition, some modules are available.
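+
+A minimal sketch of inspecting them (the module names are examples, not a
+guaranteed list):
+
+```bash
+# list the modules available on Transit
+module avail
+# load one for the current session, e.g. the Data Delivery System client
+module load dds-cli
+```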
@@ -24,18 +24,18 @@ In addition, some modules are available.
 ```
 
 ![dds-cli](../img/dds-cli.png)
 
-To download data from TCGA, 
-[log in to Rackham](../getting_started/login_rackham.md) 
-and install the GDC client to your home directory. 
-Then [log in to Transit](../cluster_guides/login_transit.md), 
-mount the [wharf](../cluster_guides/wharf.md), 
+To download data from TCGA,
+[log in to Rackham](../getting_started/login_rackham.md)
+and install the GDC client to your home directory.
+Then [log in to Transit](../cluster_guides/login_transit.md),
+mount the [wharf](../cluster_guides/wharf.md),
 and run `./gdc-client`.
 
 !!! warning "2FA on transit"
 
-    If you connect from abroad and 
-    you are asked for the **2FA** (_two factor authentication_), 
-    there is a grace period (_about 5 minutes_) in which you can 
-    `ssh`/`scp`/`rsync`/`sftp` to **transit** without the need for **2FA**. 
-    This allows you to use these and other tools 
+    If you connect from abroad and
+    you are asked for the **2FA** (_two-factor authentication_),
+    there is a grace period (_about 5 minutes_) in which you can
+    `ssh`/`scp`/`rsync`/`sftp` to **transit** without the need for **2FA**.
+    This allows you to use these and other tools
     that might experience problems with the **2FA**.
 
diff --git a/docs/cluster_guides/storage/compress_guide.md b/docs/cluster_guides/storage/compress_guide.md
index 07a53b80b..def76ca62 100644
--- a/docs/cluster_guides/storage/compress_guide.md
+++ b/docs/cluster_guides/storage/compress_guide.md
@@ -10,7 +10,7 @@ We have several compression programs installed and you are free to chose whichev
-gzip also has a parallel version (pigz) that will let the program use multiple cores, making it much faster. If you want to run multithreaded you should make a reservation in the queue system, as the login nodes will throttle your programs if they use too much resources.
+gzip also has a parallel version (pigz) that will let the program use multiple cores, making it much faster. If you want to run multithreaded, you should make a reservation in the queue system, as the login nodes will throttle your programs if they use too many resources.
 
 ```
-# compress a file 
+# compress a file
 $ gzip file.txt # single threaded
 $ pigz -p 4 file.txt # using 4 threads
 # decompress a file
@@ -23,7 +23,7 @@ $ unpigz -p 4 file.txt # using 4 threads (4 is max)
-bzip2 also has a parallel version (pbzip2) that will let the program use multiple cores, making it much faster. If you want to run multithreaded you should make a reservation in the queue system, as the login nodes will throttle your programs if they use too much resources.
+bzip2 also has a parallel version (pbzip2) that will let the program use multiple cores, making it much faster. If you want to run multithreaded, you should make a reservation in the queue system, as the login nodes will throttle your programs if they use too many resources.
 
 ```
-# compress a file 
+# compress a file
 $ bzip2 file.txt # single threaded
 $ pbzip2 -p4 file.txt # using 4 threads
 # decompress a file
@@ -36,7 +36,7 @@ $ pbunzip2 -p4 file.txt.gz # using 4 threads
-zstd has built in support for using multiple threads when compressing data only, making it much faster. If you want to run multithreaded you should make a reservation in the queue system, as the login nodes will throttle your programs if they use too much resources. 
+zstd has built-in support for multiple threads when compressing (decompression is single-threaded), making it much faster. If you want to run multithreaded, you should make a reservation in the queue system, as the login nodes will throttle your programs if they use too many resources.
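+
+A minimal sketch of such a reservation (assuming the `interactive` command,
+as on other UPPMAX clusters; `[project_id]` is a placeholder):
+
+```bash
+# book 4 cores for one hour, then run e.g. pigz/pbzip2/zstd with 4 threads
+interactive -A [project_id] -n 4 -t 01:00:00
+```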
 ```
-# compress a file 
+# compress a file
 $ zstd --rm file.txt # single threaded
 $ zstd --rm -T4 file.txt # using 4 threads
 # decompress a file, only single threaded
@@ -48,7 +48,7 @@ The commands above work on a single file at a time, and if you have 1000s of fil
 
 ```
 # to compress a folder (folder/)
-# and all files/folder inside it, 
-# creating a archive file named files.tar.gz
+# and all files/folders inside it,
+# creating an archive file named files.tar.gz
 $ tar -czvf files.tar.gz folder/
 # to decompress the archive later
@@ -76,7 +76,7 @@ There are some compression algorithms that have become standard practice to use
 
 ### fastq files
 
 ```
-# compress sample.fq 
+# compress sample.fq
 $ gzip sample.fq # single threaded
 $ pigz -p 4 sample.fq # using 4 threads
 ```
@@ -103,7 +103,7 @@ $ module load bioinfo-tools htslib
 # compress sample.vcf / sample.g.vcf
 $ bgzip sample.vcf # single threaded
 $ bgzip -@ 4 sample.vcf # using 4 threads
-# index sample.vcf.gz / sample.g.vcf.gz 
+# index sample.vcf.gz / sample.g.vcf.gz
 $ tabix sample.vcf.gz
 ```
 
diff --git a/docs/cluster_guides/transfer_bianca.md b/docs/cluster_guides/transfer_bianca.md
index ce8f45729..3ced220f8 100644
--- a/docs/cluster_guides/transfer_bianca.md
+++ b/docs/cluster_guides/transfer_bianca.md
@@ -16,7 +16,7 @@ flowchart LR
   end
 ```
 
-[File transfer](file_transfer.md) is the process of getting files 
+[File transfer](file_transfer.md) is the process of getting files
 from one place to the other.
 This page shows how to do [file transfer](file_transfer.md)
 to/from the [Bianca](bianca.md) UPPMAX cluster.
@@ -61,8 +61,8 @@ See [using `lftp` with Bianca](lftp_with_bianca.md).
 
 ## Transit server
 
-To facilitate secure data transfers to, from, 
-and within the system for computing on sensitive data a special service is available 
+To facilitate secure data transfers to, from,
+and within the system for computing on sensitive data, a special service is available
 via SSH at `transit.uppmax.uu.se`.
 
 ![A user that is logged in to Transit](./img/logged_in_transit.png)
@@ -103,10 +103,10 @@ rsync -avh my_user@rackham.uppmax.uu.se:path/my_files ~/sens2023531/
 
 ### Moving data between projects
 
-- You can use transit to transfer data between projects 
-  by mounting the wharfs for the different projects 
+- You can use Transit to transfer data between projects
+  by mounting the wharfs for the different projects
   and transferring data with `rsync`.
-- Note that you may of course only do this if this is allowed 
+- Note that you may of course only do this if this is allowed
   (agreements, permissions, etc.)
 
 ## Mounting `wharf` on your local computer
diff --git a/docs/cluster_guides/transit.md b/docs/cluster_guides/transit.md
index 68aab947e..55025f85f 100644
--- a/docs/cluster_guides/transit.md
+++ b/docs/cluster_guides/transit.md
@@ -1,6 +1,6 @@
 # Transit
 
-[Transit](../cluster_guides/transit.md) 
+[Transit](../cluster_guides/transit.md)
 is an UPPMAX service that can be used to securely transfer files.
 
 ???- question "Is Transit a file server?"
diff --git a/docs/cluster_guides/wharf.md b/docs/cluster_guides/wharf.md
index b41f66e4f..74c2a682d 100644
--- a/docs/cluster_guides/wharf.md
+++ b/docs/cluster_guides/wharf.md
@@ -1,6 +1,6 @@
 # `wharf`
 
-`wharf` is a folder on [Bianca](bianca.md) used 
+`wharf` is a folder on [Bianca](bianca.md) used
 for [file transfer on Bianca](transfer_bianca.md).
 
-He it is described:
+Here it is described:
@@ -12,9 +12,9 @@ He it is described:
 
 ## What is `wharf`?
-The `wharf` is like a "postbox" :postbox: for data/file exchange 
-between the Internet restricted Bianca cluster 
-and the remaining of the World Wide Internet. 
+The `wharf` is like a "postbox" :postbox: for data/file exchange
+between the Internet-restricted Bianca cluster
+and the rest of the Internet.
 This "postbox" is reachable to transfer data from two internal servers -
 `bianca-sftp.uppmax.uu.se` and `transit.uppmax.uu.se`.
 
@@ -26,7 +26,7 @@ The path to this special folder is:
 /proj/[project_id]/nobackup/wharf/[user_name]/[user_name]-[project_id]
 ```
 
-where 
+where
 
 * `[project_id]` is the ID of your [NAISS project](../getting_started/project.md)
 * `[user_name]` is the name of your [UPPMAX user account](../getting_started/user_account.md)
@@ -39,13 +39,13 @@ For example:
 
 ## `wharf` use
 
-To [transfer data from/to Bianca](transfer_bianca.md), 
-`wharf` is to folder where files are sent to/from.
+To [transfer data from/to Bianca](transfer_bianca.md),
+`wharf` is the folder where files are sent to/from.
 Do not keep files in `wharf`, as this folder is connected
 to the outside world and hence is a security risk.
 Instead, move your data to your project folder.
 
-You have full access to your `wharf` and read-only access 
+You have full access to your `wharf` and read-only access
 to other users' `wharf` folders in that same project.
 
 `wharf` is only accessible when [inside the university networks](../getting_started/get_inside_sunet.md).
diff --git a/docs/getting_started/project_apply.md b/docs/getting_started/project_apply.md
index 3e195de4c..d33f2e0b1 100644
--- a/docs/getting_started/project_apply.md
+++ b/docs/getting_started/project_apply.md
@@ -5,7 +5,7 @@
 !!! note "Apply for a project here"
 
-    See [here](https://www.uu.se/en/centre/uppmax/get-started/create-account-and-apply-for-project/apply-for-projects) 
+    See [here](https://www.uu.se/en/centre/uppmax/get-started/create-account-and-apply-for-project/apply-for-projects)
     to apply for a project.
 
 To use [UPPMAX](../cluster_guides/uppmax.md) resources, one needs:
diff --git a/docs/software/jobstats.md b/docs/software/jobstats.md
index a36c76f65..c2cc89418 100644
--- a/docs/software/jobstats.md
+++ b/docs/software/jobstats.md
@@ -14,10 +14,10 @@ At this page, it is described:
-* [Examples](#example): Examples of ineffective resource use plots
+* [Examples](#examples): Examples of inefficient resource use plots
 * Other `jobstats` functionality
 * Using `jobstats --help`
-    
+
 ## `jobstats --plot`
 
-With the `--plot` (or `-p`) option, 
+With the `--plot` (or `-p`) option,
 a plot is produced showing the resource use per node
 for a job that completed successfully and took longer than 5 minutes.
@@ -93,7 +93,7 @@ flowchart TD
   done(Done)
   blue_line_close_to_top --> |yes| done
   blue_line_close_to_top --> |no| black_line_close_to_top
-  black_line_close_to_top --> |yes| done 
+  black_line_close_to_top --> |yes| done
   black_line_close_to_top --> |no| can_decrease_number_of_cores
   can_decrease_number_of_cores --> |yes| decrease_number_of_cores
   can_decrease_number_of_cores --> |no| done
@@ -126,11 +126,11 @@ flowchart TD
 
     Because CPU is more flexible.
 
-    For example, imagine a job with a short CPU spike, 
+    For example, imagine a job with a short CPU spike
     that can be processed by 16 CPUs.
-    If 1 core of memory is enough, use 1 core or memory:
-    the spike will be turned into a 100% CPU use (of that one core)
-    for a longer duration. 
+    If 1 core of memory is enough, use 1 core of memory:
+    the spike will be turned into 100% CPU use (of that one core)
+    for a longer duration.
 
 ???- question "Need a worked-out example?"
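+
+    A minimal sketch (the numbers are made up): suppose `jobstats --plot`
+    shows a job that booked 8 cores, yet the black CPU line stays around
+    100% (one core) and memory use is far below one core's share.
+    Rebooking with a single core gives the same behaviour at a fraction
+    of the cost:
+
+    ```bash
+    # before: 8 cores booked, 7 of them idle
+    # sbatch -A [project_id] -p core -n 8 my_job.sh
+    # after: book 1 core instead ([project_id] and my_job.sh are placeholders)
+    sbatch -A [project_id] -p core -n 1 my_job.sh
+    ```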
@@ -188,7 +188,7 @@ inefficiently, see [the examples below](#examples)
 
 ## Examples
 
-Here are some examples of how inefficient jobs can look 
+Here are some examples of how inefficient jobs can look
 and what you can do to make them more efficient.
 
-### Inefficient job example 1: booking too much cores
+### Inefficient job example 1: booking too many cores
@@ -217,7 +217,7 @@ This means booking 5 cores is recommended.
 
 ![jobstats showing a single-node job](./img/jobstats_c_555912-l_1-k_bad_job_05_with_border.png)
 
-This is one of the grayer areas: 
+This is one of the grayer areas:
 booking 2-9 cores is all considered reasonable.
 
 > Pick the number of cores to have enough memory
@@ -268,7 +268,7 @@ This is around 6 cores (600%), as with that amount of cores:
   that is 0 too much
 - only rarely, there is a little spike up or a bigger spike down
 
-There are no signs of anything slowing them down, as the line is very even. 
+There are no signs of anything slowing them down, as the line is very even.
 
-This jobs should either have been booked with 6 cores,
-or the program running should be told to use all 8 cores.
+This job should either have been booked with 6 cores,
+or the running program should be told to use all 8 cores.
@@ -276,44 +276,44 @@ This jobs should either have been booked with 6 cores, or the program running sh
 
 ![jobstats showing a single-node job](./img/jobstats_c_555912-l_1-k_bad_job_02_with_border.png)
 
-This job is using almost all of the cores it has booked, 
-but there seems to be something holding them back. 
-The uneven blue curve tells us that something is slowing down the analysis, 
-and it's not by a constant amount. 
+This job is using almost all of the cores it has booked,
+but there seems to be something holding them back.
+The uneven blue curve tells us that something is slowing down the analysis,
+and it's not by a constant amount.
 
-Usually this is how it looks when the filesystem is the cause of a slowdown. 
-Since the load of the filesystem is constantly changing, 
-so will the speed by which a job can read data from it also change. 
+Usually this is how it looks when the filesystem is the cause of a slowdown.
+Since the load on the filesystem is constantly changing,
+the speed at which a job can read data from it will also change.
 
-This job should try to copy all the files it will be working 
-with to the nodes local harddrive before running the analysis, 
-and by doing so not be affected by the speed of the filesystem. 
+This job should try to copy all the files it will be working
+with to the node's local hard drive before running the analysis,
+and by doing so not be affected by the speed of the filesystem.
 
-Please see the guide How to use the nodes own hard drive 
-for analysis for more information. 
+Please see the guide on how to use the node's own hard drive
+for analysis for more information.
 
-You basically just add 2 more commands to your script file 
+You basically just add 2 more commands to your script file
 and the problem should be solved.
 
 ### Inefficient job example 5
 
 ![jobstats showing a single-node job](./img/jobstats_c_555912-l_1-k_bad_job_04_with_border.png)
 
-This job has the same problem as the example above, 
-but in a more extreme way. 
-
-It's not uncommon that people book whole nodes out of habit 
-and only run single threaded programs that use almost no memory. 
-This job is a bit special in the way that it's being run on a high memory node, 
-as you can see on the left Y-axis, that it goes up to 256 GB RAM. 
-A normal node on Milou only have 128GB. 
-These high memory nodes are only bookable of you book the whole node, 
-so you can't book just a few cores on them. 
-That means that if you need 130GB RAM and the program is only single threaded, 
-your only option is to book a whole high memory node. 
-The job will look really inefficient, 
-but it's the only way to do it on our system. 
-The example in the plot does not fall into this category though, 
+This job has the same problem as the example above,
+but in a more extreme way.
+
+It's not uncommon that people book whole nodes out of habit
+and only run single-threaded programs that use almost no memory.
+This job is a bit special in that it's being run on a high-memory node,
+as you can see on the left Y-axis: it goes up to 256 GB RAM.
+A normal node on Milou only has 128 GB.
+These high-memory nodes are only bookable if you book the whole node,
+so you can't book just a few cores on them.
+That means that if you need 130 GB RAM and the program is only single-threaded,
+your only option is to book a whole high-memory node.
+The job will look really inefficient,
+but it's the only way to do it on our system.
+The example in the plot does not fall into this category though,
 as it uses only ~15GB of RAM, which you could get by booking 2-3 normal cores.
 
 ## `jobstats --help`
 
@@ -419,25 +419,25 @@ jobstats --help
            produced will reflect the scheduled end time of the job.
 
 -A project    Project valid on the cluster. [finishedjobinfo](finishedjobinfo.md) is used to
-           discover jobs for the project. See further comments 
+           discover jobs for the project. See further comments
            under 'Mode 4' above.
 
 -M cluster    Cluster on which jobs were run [default current cluster]
 
 -n node[,node...]  Cluster node(s) on which the job was run. If specified,
            then the [finishedjobinfo](finishedjobinfo.md) script is not run and discovery
-           is restricted to only the specified nodes. Nodes can be 
-           specified as a comma-separated list of complete node 
+           is restricted to only the specified nodes. Nodes can be
+           specified as a comma-separated list of complete node
            names, or using the [finishedjobinfo](finishedjobinfo.md) syntax:
                 m78,m90,m91,m92,m100 or m[78,90-92,100]
            Nonsensical results will occur if the syntaxes are mixed.
 
- - | --stdin  Accept input on stdin formatted like [finishedjobinfo](finishedjobinfo.md)
-           output. The short form of this option is a single dash 
+ - | --stdin  Accept input on stdin formatted like [finishedjobinfo](finishedjobinfo.md)
+           output. The short form of this option is a single dash
            '-'.
-
- -m | --memory Always include memory usage flags in output. Default 
-           behaviour is to include memory usage flags only if CPU 
+
+ -m | --memory Always include memory usage flags in output. Default
+           behaviour is to include memory usage flags only if CPU
            usage flags are also present.
 
 -v | --verbose Be wordy when describing flag values.
@@ -477,8 +477,8 @@ jobstats --help
        //extended_uppmax_jobstats//
        //uppmax_jobstats//
 
- -X directory  Hard directory prefix to use for jobstats files. 
-           Jobstats files are assumed available directly: 
+ -X directory  Hard directory prefix to use for jobstats files.
+           Jobstats files are assumed available directly:
                '/'
 --no-multijobs Run [finishedjobinfo](finishedjobinfo.md) separately for each jobid, rather than
            all jobids bundled into one -j option (for debugging)
@@ -498,7 +498,7 @@ jobstats --help
-   Unless the -q/--quiet option is provided, a table is also produces containing
+   Unless the -q/--quiet option is provided, a table is also produced containing
    lines with the following tab-separated fields:
 
-      jobid cluster jobstate user project endtime runtime flags booked cores node[,node...] jobstats[,jobstats...] 
+      jobid cluster jobstate user project endtime runtime flags booked cores node[,node...] jobstats[,jobstats...]
 
    Field contents:
 
@@ -514,9 +514,9 @@ jobstats --help
    maxmem   : Maximum memory used as reported by SLURM (if unavailable, this is '.')
    cores    : Number of cores represented in the discovered jobstats files.
    node     : Node(s) booked for the job, expanded into individual node names,
-           separated by commas; if no nodes were found, this is '.'. 
+           separated by commas; if no nodes were found, this is '.'.
            The nodes for which jobstats files are available are listed first.
-   jobstats : jobstats files for the nodes, in the same order the nodes are 
+   jobstats : jobstats files for the nodes, in the same order the nodes are
            listed, separated by commas; if no jobstats files were discovered,
            this is '.'
 
@@ -557,7 +557,7 @@ jobstats --help
    overbooked : % used (if < 80%)
           The maximum percentage of booked cores and/or memory that was used
    !!half_overbooked
-          No more than 1/2 of both cores and memory of a node was used; consider booking 
+          No more than 1/2 of both cores and memory of a node was used; consider booking
           half a node instead.
    !!severely_overbooked
           No more than 1/4 of both cores and memory of a node was used, examine your job
@@ -609,14 +609,14 @@ Discovery by job number for a completed job:
 
 ```
 jobstats --plot jobid1 jobid2 jobid3
 ```
-The job numbers valid on the cluster. 
-[finishedjobinfo](finishedjobinfo.md) is used 
-to determine further information for each job. 
-This can be rather slow, 
-and a message asking for your patience is printed for each job. 
-
-If multiple queries are expected it would be quicker 
-to run [finishedjobinfo](finishedjobinfo.md) yourself separately, 
+The job numbers must be valid on the cluster.
+[finishedjobinfo](finishedjobinfo.md) is used
+to determine further information for each job.
+This can be rather slow,
+and a message asking for your patience is printed for each job.
+
+If multiple queries are expected, it would be quicker
+to run [finishedjobinfo](finishedjobinfo.md) yourself separately,
 see Mode 4 below. See Mode 2 for a currently running job.
 
 ### `jobstats` discovery mode 2: discovery by job number for a currently running job
 
 Discovery by job number for a currently running job.
 
 ```
 jobstats --plot -r jobid1 jobid2 jobid3
 ```
 
-Job numbers of jobs currently running on the cluster. 
-[The Slurm schedule](../cluster_guides/slurm.md) is used to determine 
+The job numbers must belong to jobs currently running on the cluster.
+[The Slurm schedule](../cluster_guides/slurm.md) is used to determine
 further information for each running job.
 
 ### `jobstats` discovery mode 3: discovery by node and job number, for a completed or running job
 
@@ -652,16 +652,16 @@ Discovery by project.
 
 ```
 jobstats --plot -A project
 ```
 
-When providing a project name that is valid for the cluster, 
-[finishedjobinfo](finishedjobinfo.md) is used 
-to determine further information on jobs run within the project. 
-As for Mode 1, this can be rather slow, 
-and a message asking for your patience is printed. 
+When providing a project name that is valid for the cluster,
+[finishedjobinfo](finishedjobinfo.md) is used
+to determine further information on jobs run within the project.
+As for Mode 1, this can be rather slow,
+and a message asking for your patience is printed.
 
-Furthermore only [finishedjobinfo](finishedjobinfo.md) defaults 
-for time span etc. are used for job discovery. 
-If multiple queries are expected -or additional [finishedjobinfo](finishedjobinfo.md) options are desired, +Furthermore only [finishedjobinfo](finishedjobinfo.md) defaults +for time span etc. are used for job discovery. +If multiple queries are expected +or additional [finishedjobinfo](finishedjobinfo.md) options are desired, see Mode 5 below. ### `jobstats` discovery mode 5: discovery via information provided on stdin