- Overview
- Login via ssh
- Interactive session
- Access folders
- Basic unix commands
- Batch job
- More information for the curious
Locus website (must be on the NIH VPN) - if you don't have a Locus server account, the first time you log in to the website (using your regular NIH credentials) you will get an email telling you how to request an account. You can also email [email protected]. Locus is intended for use by NIAID staff.
Locus is a High Performance Computing cluster. Why do you need it?
- You need resources (software, processing power, memory/RAM) not available on your local laptop
- You need to run a job that can make use of parallel computing or CPU or GPU or other hardware not available locally
- You need to run a program (job) that will run for a very long time
- You need to run an analysis in a consistent environment that will be available in the future* or that you can share with others
- Need a command line shell program:
- Connect to NIH network/VPN.
- From the command line:
ssh [email protected]
## enter password
- To do work on Locus, open an interactive session:
qrsh -l h_vmem=16G
`h_vmem` is the maximum amount of memory you will be allowed to use. Set this parameter based on how big the files you will be working with are; the example here requests 16 gigabytes.
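One way to pick a value is to look at the total size of your input files first. The helper below is only a rough sketch: the function name `suggest_vmem`, the 4x multiplier and the 4G floor are illustrative assumptions, not Locus policy, and real memory needs depend on the program you run.

```shell
# suggest_vmem: print a qrsh command sized from the files given as arguments.
# ASSUMPTION: the 4x multiplier and the 4G minimum are arbitrary illustrations.
suggest_vmem() {
  # du -ck reports sizes in kilobytes and ends with a grand-total line
  total_kb="$(du -ck "$@" | tail -n 1 | cut -f 1)"
  # convert kilobytes to whole gigabytes, rounding up
  total_gb=$(( (total_kb + 1048575) / 1048576 ))
  want_gb=$(( total_gb * 4 ))
  # never suggest less than the assumed 4G floor
  if [ "$want_gb" -lt 4 ]; then
    want_gb=4
  fi
  echo "qrsh -l h_vmem=${want_gb}G"
}
```

Calling `suggest_vmem` with the paths of the files you plan to open prints a `qrsh` line that you can copy, adjust, and run on the login node.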
Home directory
- Mac: in Finder, Cmd+K smb://locusdata.niaid.nih.gov/username
- Windows: \\locusdata.niaid.nih.gov\username
Other folders on Locus: `/hpcdata/rest/of/path`. When you mount, replace `/hpcdata/` with the file server address:
- Mac: smb://locusfileserver.niaid.nih.gov/rest/of/path
- Windows: \\locusfileserver.niaid.nih.gov\rest\of\path
- Alternate way to access folders if mounting doesn't work:
- Cyberduck -> Open Connection (icon top left of the window)
## list files in directory
ls
## long (more info) format with human readable file sizes
ls -lh
## total size of the current working directory (use `du -sh *` to list each item inside it)
du -sh .
## get the path to your current directory
pwd
## change to another directory (replace directoryname with name of directory)
cd directoryname
## change to the directory above
cd ..
## change to your home directory
cd
## copy a file
cp file path/to/new/location
## copy a directory
cp -r directory newdirectory
## copy file to the directory you are currently in. '.' is shortcut for current directory
cp file .
## move a file
mv file path/to/new/location
## make a new directory
mkdir newdirectory
## look at a file (replace filename)
less filename
## to quit less - type `q`
## get help - replace command with the command/program you need help with
man command
command -h | less
## print the content of a file to the screen
cat file
## delete files - CAUTION! There is no recycling bin. Files removed are gone forever (well, technically, Locus makes backups, but only once a day)
rm file
rm -r directory
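To practice the commands above safely, you can try them in a throwaway directory. The sketch below uses made-up file and folder names (`project`, `notes.txt`) and assumes `mktemp` is available, which is the case on most Linux and macOS systems.

```shell
# work inside a temporary directory so nothing real is touched
practice_dir="$(mktemp -d)"
cd "$practice_dir"

# make a folder and a small text file
mkdir project
echo "sample data" > notes.txt

# copy the file into the folder, then move (rename) the original there too
cp notes.txt project/
mv notes.txt project/notes_original.txt

# copy the whole folder, then inspect what we have
cp -r project project_backup
ls -lh project
du -sh project

# remove the backup copy again (remember: there is no recycling bin)
rm -r project_backup
pwd
```

When you are done, `cd` takes you back to your home directory and `rm -r` on the printed path removes the practice folder.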
- also see batch job section below and more on modules.
## load module - replace modulename with the program (e.g. fastqc)
module load modulename
## unload one module
module unload modulename
## unload all modules (in case you get an error when you load one)
module purge; module load uge
# uge allows qsub and qrsh to run - always load after purging
## list all loaded modules
module list
## search for module versions
module avail modulename
## get info on module
module info modulename
## close your interactive session or log out of locus
exit
- Alternative to an interactive session: run a script of commands as a batch job. After you submit the job, you can close your laptop and walk away while it runs on Locus. Locus emails you when the job is done.
- job_submit.sh - Create/edit the script on your laptop using a plain text editor like TextEdit or Notepad, or a programming editor like Atom or VSCode, and transfer it by mounting your Locus folder. (Don't use a document editor like Word to edit scripts.)
- Submit job
qsub job_submit.sh
- Check on job while running
qstat -u username
qstat -j jobid
## filter for just usage info
qstat -j jobid | grep usage
- Get info about job after it's done running
qacct -j jobid
## all jobs run by you in the last 2 days
qacct -u username -d 2 -j | less
- Delete a running job that you no longer need
qdel jobid
- Check the running processes on the current node
htop
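The contents of job_submit.sh are not reproduced in these notes, so the following is only a minimal sketch of what such a batch script might look like. It assumes standard Grid Engine `#$` option lines (`-N`, `-cwd`, `-j`, `-l`, `-m`, `-M`); the job name, email address and placeholder command are hypothetical, and the real example script may use different options.

```shell
#!/bin/bash
# job_submit.sh - hypothetical minimal batch script; adjust every value for your job.
# Lines starting with "#$" are read by qsub as options; bash treats them as comments.

# job name shown by qstat
#$ -N my_job
# run the job from the directory where qsub was called
#$ -cwd
# merge error messages into the normal output file
#$ -j y
# memory limit, same meaning as for qrsh
#$ -l h_vmem=16G
# send an email when the job ends (placeholder address)
#$ -m e
#$ -M username@example.com

# load the software the job needs (uncomment and replace modulename)
# module load modulename

# the actual work goes here; this placeholder only records where the job ran
job_host="$(uname -n)"
echo "Job started on ${job_host} at $(date)"
```

Because the option lines are ordinary comments to bash, you can test the script body in an interactive session with `bash job_submit.sh` before submitting it with `qsub`.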
- Locus website has more detailed information on commands and usage.
- Under the hood: Operating system is RedHatEnterpriseServer (related to CentOS and Fedora)
- For job scheduling system, Locus uses Univa Grid Engine (UGE) - older man pages - formerly SGE/Sun Grid Engine.
- Biowulf, NIH's agency-wide cluster, uses a system called Slurm, so the commands may not be the same
- advanced_job_submit.sh - example script with more options for qsub, allowing finer control of the job and output
- For more on interactive qrsh and qsub: qrsh tutorial. Other qrsh/qsub options - halfway down page; especially for parallel processing.
- For Mac, to run GUI applications on Locus, you need XQuartz. It needs admin permissions to install; email NIAID IT, and they can install it for you.
- Make a new text file and edit it directly in the command line shell on Locus with nano
nano file
## example command: replace username with your username!
scp -r [email protected]:/classhome/username ~/Desktop
## basic command structure
## `-r` means "recursive" so we copy the folder and everything in it.
scp -r fromfoldername tofoldername
## fromfolder is on remote server and tofolder is on local laptop
scp -r username@servername:serverfolderpath laptopfolderpath
## You could also use rsync
rsync -r [email protected]:/classhome/username ~/Desktop/myfolder
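The examples above copy from the server to the laptop. Swapping the two paths copies in the other direction. The sketch below only builds and prints both commands so you can check them before running anything; `username`, `servername` and `myfolder` are placeholders, not real values.

```shell
# placeholders: replace with your own username, server name, and paths
user="username"
server="servername"
remote_dir="/classhome/${user}"
local_dir="${HOME}/Desktop"

# download: remote folder -> laptop (same shape as the example above)
download_cmd="scp -r ${user}@${server}:${remote_dir} ${local_dir}"

# upload: laptop folder -> remote folder (source and destination swapped)
upload_cmd="scp -r ${local_dir}/myfolder ${user}@${server}:${remote_dir}"

# print the commands for review instead of running them
echo "$download_cmd"
echo "$upload_cmd"
```

Run these from a terminal on your laptop, not from a Locus session, since the laptop paths only exist locally.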
Locus is a shared resource. While it is very large and powerful, there are still only a finite number of cpus/job slots and a finite amount of memory.
- You cannot run jobs on the login/submit/head nodes. Do NOT run computational or memory-intensive tasks on the head nodes! It slows them down for your colleagues.
- Be conscious of the number of job slots and amount of memory you are using at any one time. Locus has some limits, but they are pretty generous. If you think you will be using resources close to the limit for days at a time or you need more, contact Locus staff.
- Learn how to submit array jobs and how to limit them (`-tc` is the array job limit flag).
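As a rough illustration of a limited array job, the sketch below assumes the standard Grid Engine `-t` option for the task range and the `SGE_TASK_ID` variable that the scheduler sets for each task. The script name `array_job.sh`, the job name and the `sample_N.txt` naming scheme are hypothetical.

```shell
#!/bin/bash
# array_job.sh - hypothetical sketch of an array job with a concurrency limit

#$ -N array_demo
#$ -cwd
#$ -l h_vmem=4G
# run ten tasks, numbered 1 through 10
#$ -t 1-10
# allow at most 2 of those tasks to run at the same time
#$ -tc 2

# the scheduler sets SGE_TASK_ID for each task;
# default to 1 so the script can also be tried outside the scheduler
task_id="${SGE_TASK_ID:-1}"

# pick this task's input file (assumed naming scheme: sample_1.txt, sample_2.txt, ...)
input="sample_${task_id}.txt"
echo "task ${task_id} would process ${input}"
```

Submitting it once with `qsub array_job.sh` queues all ten tasks, and the `-tc` line keeps the job from occupying more than two slots at a time.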
Other limitations
- If you want to run RStudio on Locus, the best way is to use the NoMachine Virtual Machine. To do that, you need to get an LDAP account and set up a connection following the instructions here.
- Interactive 3D plotting using the "rgl" package is not possible because an OpenGL library is lacking on Locus.
- JupyterHub is a way to interact with Locus through a browser. It is based on the popular Jupyter notebooks.
- Some tips:
To install Python packages, you need to log in to Locus and qrsh to a node.
module load Anaconda3/5.3.0
source activate /sysapps/cluster/software/Anaconda3/5.3.0/envs/jupyterenv
pip install --user myPackage
Then the package will be available in JupyterHub.
To install packages to the R kernel (3.6.1) on JupyterHub, however, you have to load the Anaconda2/5.3.0 module and then install from R:
module load Anaconda2/5.3.0
install.packages("package_name")
## or for bioconductor
BiocManager::install("package_name")