This repository holds several bioinformatics pipelines developed at McGill University and Génome Québec Innovation Centre (MUGQIC), as part of the GenAP project.
MUGQIC pipelines consist of Python scripts which create a list of jobs running Bash commands. Those scripts support dependencies between jobs and smart restart mechanism if some jobs fail during pipeline execution. Jobs can be submitted in different ways: by being sent to a PBS scheduler like Torque or by being run as a series of commands in batch through a Bash script. Job commands and parameters can be modified through several configuration files.
On this page:
[TOC]
MUGQIC pipelines have been tested with Python 2.7 and R 3.2.
Genomes and modules used by the pipelines are already installed on a CVMFS partition mounted on all those clusters in /cvmfs/soft.mugqic/CentOS6
.
To access them, add the following lines to your $HOME/.bash_profile:
#!bash
umask 0002
## MUGQIC genomes and modules
export MUGQIC_INSTALL_HOME=/cvmfs/soft.mugqic/CentOS6
module use $MUGQIC_INSTALL_HOME/modulefiles
For MUGQIC analysts, add the following lines to your $HOME/.bash_profile:
#!bash
umask 0002
## MUGQIC genomes and modules for MUGQIC analysts
HOST=`hostname`;
DNSDOMAIN=`dnsdomainname`;
export MUGQIC_INSTALL_HOME=/cvmfs/soft.mugqic/CentOS6
if [[ $HOST == abacus* || $DNSDOMAIN == ferrier.genome.mcgill.ca ]]; then
export MUGQIC_INSTALL_HOME_DEV=/lb/project/mugqic/analyste_dev
elif [[ $HOST == lg-* || $DNSDOMAIN == guillimin.clumeq.ca ]]; then
export MUGQIC_INSTALL_HOME_DEV=/gs/project/mugqic/analyste_dev/phase2
elif [[ $BQMAMMOUTH == "mp2" ]]; then
export MUGQIC_INSTALL_HOME_DEV=/nfs3_ib/bourque-mp2.nfs/tank/nfs/bourque/nobackup/share/mugqic_dev
fi
module use $MUGQIC_INSTALL_HOME/modulefiles $MUGQIC_INSTALL_HOME_DEV/modulefiles
Also, set JOB_MAIL
in your $HOME/.bash_profile to receive PBS job logs:
#!bash
export JOB_MAIL=<[email protected]>
MUGQIC pipelines and compatible Python version are already installed as modules on those clusters. To use them by default, add in your $HOME/.bash_profile:
#!bash
module load mugqic/python/2.7.8
module load mugqic/mugqic_pipelines/<latest_version>
(find out the latest version with: "module avail 2>&1 | grep mugqic/mugqic_pipelines
").
Set your RAP_ID
(Resource Allocation Project ID from Compute Canada) in your $HOME/.bash_profile:
#!bash
export RAP_ID=<my-rap-id>
Visit our Download page to get the latest stable release.
If you want to use the most recent development version:
#!bash
git clone [email protected]:mugqic/mugqic_pipelines.git
Set MUGQIC_PIPELINES_HOME
to your local copy path, in your $HOME/.bash_profile:
#!bash
export MUGQIC_PIPELINES_HOME=/path/to/your/local/mugqic_pipelines
MUGQIC Pipelines require genomes and modules resources to run properly.
First, set MUGQIC_INSTALL_HOME
to the directory where you want to install those resources, in your $HOME/.bash_profile:
#!bash
## MUGQIC genomes and modules
export MUGQIC_INSTALL_HOME=/path/to/your/local/mugqic_resources
module use $MUGQIC_INSTALL_HOME/modulefiles
Reference genomes and annotations must be installed in $MUGQIC_INSTALL_HOME/genomes/
.
Default genome installation scripts are already available in $MUGQIC_PIPELINES_HOME/resources/genomes/
.
To install all of them at once, use the script $MUGQIC_PIPELINES_HOME/resources/genomes/install_all_genomes.sh
.
All species-related files are in:
$MUGQIC_INSTALL_HOME/genomes/species/<species_scientific_name>.<assembly>/
e.g. for Homo sapiens assembly GRCh37, the directory has the following (incomplete) hierarchy:
#!text
$MUGQIC_INSTALL_HOME/genomes/species/Homo_sapiens.GRCh37/
├── annotations/
│ ├── gtf_tophat_index/
│ ├── Homo_sapiens.GRCh37.dbSNP142.vcf.gz
│ ├── Homo_sapiens.GRCh37.dbSNP142.vcf.gz.tbi
│ ├── Homo_sapiens.GRCh37.Ensembl75.geneid2Symbol.tsv
│ ├── Homo_sapiens.GRCh37.Ensembl75.genes.length.tsv
│ ├── Homo_sapiens.GRCh37.Ensembl75.genes.tsv
│ ├── Homo_sapiens.GRCh37.Ensembl75.GO.tsv
│ ├── Homo_sapiens.GRCh37.Ensembl75.gtf
│ ├── Homo_sapiens.GRCh37.Ensembl75.ncrna.fa
│ ├── Homo_sapiens.GRCh37.Ensembl75.rrna.fa
│ ├── Homo_sapiens.GRCh37.Ensembl75.transcript_id.gtf
│ ├── Homo_sapiens.GRCh37.Ensembl75.vcf.gz
│ ├── ncrna_bwa_index/
│ └── rrna_bwa_index/
├── downloads/
│ ├── ftp.1000genomes.ebi.ac.uk/
│ ├── ftp.ensembl.org/
│ └── ftp.ncbi.nih.gov/
├── genome/
│ ├── bowtie2_index/
│ ├── bwa_index/
│ ├── Homo_sapiens.GRCh37.dict
│ ├── Homo_sapiens.GRCh37.fa
│ ├── Homo_sapiens.GRCh37.fa.fai
│ └── star_index/
├── Homo_sapiens.GRCh37.ini
└── log/
The assembly name is the one used by the download source e.g. "GRCh37" for Ensembl.
Each species directory contains a <scientific_name>.<assembly>.ini
file
which lists among other things, the assembly synonyms e.g. "hg19":
Homo_sapiens.GRCh37.ini
#!ini
[DEFAULT]
scientific_name=Homo_sapiens
common_name=Human
assembly=GRCh37
assembly_synonyms=hg19
source=Ensembl
version=75
dbsnp_version=142
New genomes and annotations can be installed semi-automatically from Ensembl (vertebrate species), EnsemblGenomes (other species) or UCSC (genome and indexes only; no annotations).
Example for Chimpanzee:
-
Retrieve the species scientific name on Ensembl or UCSC: "Pan troglodytes"
-
Retrieve the assembly name:
- Ensembl: "CHIMP2.1.4"
- UCSC: "panTro4"
-
Retrieve the source version:
- Ensembl: "78"
- UCSC: unfortunately, UCSC does not have version numbers. Use panTro4.2bit date formatted as "YYYY-MM-DD": "2012-01-09"
-
cp $MUGQIC_PIPELINES_HOME/resources/genomes/GENOME_INSTALL_TEMPLATE.sh $MUGQIC_PIPELINES_HOME/resources/genomes/<scientific_name>.<assembly>.sh
e.g.:-
Ensembl:
cp $MUGQIC_PIPELINES_HOME/resources/genomes/GENOME_INSTALL_TEMPLATE.sh $MUGQIC_PIPELINES_HOME/resources/genomes/Pan_troglodytes.CHIMP2.1.4.sh
-
UCSC:
cp $MUGQIC_PIPELINES_HOME/resources/genomes/GENOME_INSTALL_TEMPLATE.sh $MUGQIC_PIPELINES_HOME/resources/genomes/Pan_troglodytes.panTro4.sh
-
-
Modify
$MUGQIC_PIPELINES_HOME/resources/genomes/<scientific_name>.<assembly>.sh
(ASSEMBLY_SYNONYMS
can be left empty but if you know that 2 assemblies are identical apart fromchr
sequence prefixes, document it):-
Ensembl:
SPECIES=Pan_troglodytes # With "_"; no space! COMMON_NAME=Chimpanzee ASSEMBLY=CHIMP2.1.4 ASSEMBLY_SYNONYMS=panTro4 SOURCE=Ensembl VERSION=78
-
UCSC:
SPECIES=Pan_troglodytes # With "_"; no space! COMMON_NAME=Chimpanzee ASSEMBLY=panTro4 ASSEMBLY_SYNONYMS=CHIMP2.1.4 SOURCE=UCSC VERSION=2012-01-09
-
-
Running
bash $MUGQIC_PIPELINES_HOME/resources/genomes/<scientific_name>.<assembly>.sh
will install the genome in $MUGQIC_INSTALL_HOME_DEV (by default). This will download and install genomes, indexes and, for Ensembl only, annotations (GTF, VCF, etc.). [ADMINS ONLY] To install it in $MUGQIC_INSTALL_HOME runbash $MUGQIC_PIPELINES_HOME/resources/genomes/<scientific_name>.<assembly>.sh MUGQIC_INSTALL_HOME
.If the genome is big, separate batch jobs will be submitted to the cluster for bwa, bowtie/tophat, star indexing. Check that jobs are completed OK.
-
[ADMINS ONLY] If the new genome has been installed in
$MUGQIC_INSTALL_HOME_DEV
, to deploy in$MUGQIC_INSTALL_HOME
:rsync -va --no-o --no-g --no-p --size-only -I -O --ignore-times $MUGQIC_INSTALL_HOME_DEV/genomes/species/<scientific_name>.<assembly> $MUGQIC_INSTALL_HOME/genomes/species/
-
Add the newly created INI file to the genome config files for further usage in pipeline command:
cp $MUGQIC_INSTALL_HOME/genomes/species/<scientific_name>.<assembly>/<scientific_name>.<assembly>.ini $MUGQIC_PIPELINES_HOME/resources/genomes/config/
Software tools and associated modules must be installed in $MUGQIC_INSTALL_HOME/software/
and $MUGQIC_INSTALL_HOME/modulefiles/
.
Default software/module installation scripts are already available in $MUGQIC_PIPELINES_HOME/resources/modules/
.
New software tools and associated modules can be installed semi-automatically:
-
cp $MUGQIC_PIPELINES_HOME/resources/modules/MODULE_INSTALL_TEMPLATE.sh $MUGQIC_PIPELINES_HOME/resources/modules/<my_software>.sh
-
Modify
$MUGQIC_PIPELINES_HOME/resources/modules/<my_software>.sh
following the instructions inside. -
Run
$MUGQIC_PIPELINES_HOME/resources/modules/<my_software>.sh
with no arguments. By default, it will download and extract the remote software archive, build the software and create the associated module, all in$MUGQIC_INSTALL_HOME_DEV
if it is set. -
If everything is OK, to install it in production, run:
$MUGQIC_PIPELINES_HOME/resources/modules/<my_software>.sh MUGQIC_INSTALL_HOME
(no
$
beforeMUGQIC_INSTALL_HOME
!). -
Check if the module is available with:
module avail 2>&1 | grep mugqic/<my_software>/<version>
For each pipeline, get help about usage, arguments and steps with:
- if you use a
mugqic/mugqic_pipelines/<version>
module on our clusters, simply:
#!bash
<pipeline_name>.py --help
- if you use your own local install:
#!bash
$MUGQIC_PIPELINES_HOME/pipelines/<pipeline_name>/<pipeline_name>.py --help
Pipelines require as input one Readset File, one or more Configuration File(s) and possibly one Design File, all described below.
For more information about a specific pipeline, visit:
The Readset File is a TAB-separated values plain text file with one line per readset and the following columns in any order:
- Sample: must contain letters A-Z, numbers 0-9, hyphens (-) or underscores (_) only; BAM files will be merged into a file named after this value; mandatory;
- Readset: a unique readset name with the same allowed characters as above; mandatory;
- Library: optional;
- RunType:
PAIRED_END
orSINGLE_END
; mandatory; - Run: optional;
- Lane: optional;
- QualityOffset: quality score offset integer used for trimming; optional;
- BED: relative or absolute path to BED file; optional;
- FASTQ1: relative or absolute path to first FASTQ file for paired-end readset or single FASTQ file for single-end readset; mandatory if BAM value is missing;
- FASTQ2: relative or absolute path to second FASTQ file for paired-end readset; mandatory if RunType value is "
PAIRED_END
"; - BAM: relative or absolute path to BAM file which will be converted into FASTQ files if they are not available; mandatory if FASTQ1 value is missing, ignored otherwise.
Example:
Sample Readset Library RunType Run Lane QualityOffset BED FASTQ1 FASTQ2 BAM
sampleA readset1 lib0001 PAIRED_END run100 1 33 path/to/file.bed path/to/readset1.paired1.fastq.gz path/to/readset1.paired2.fastq.gz path/to/readset1.bam
sampleA readset2 lib0001 PAIRED_END run100 2 33 path/to/file.bed path/to/readset2.paired1.fastq.gz path/to/readset2.paired2.fastq.gz path/to/readset2.bam
sampleB readset3 lib0002 PAIRED_END run200 5 33 path/to/file.bed path/to/readset3.paired1.fastq.gz path/to/readset3.paired2.fastq.gz path/to/readset3.bam
sampleB readset4 lib0002 PAIRED_END run200 6 33 path/to/file.bed path/to/readset4.paired1.fastq.gz path/to/readset4.paired2.fastq.gz path/to/readset4.bam
- Sample: must contain letters A-Z, numbers 0-9, hyphens (-) or underscores (_) only; mandatory;
- Readset: a unique readset name with the same allowed characters as above; mandatory;
- Smartcell: mandatory;
- NbBasePairs: total number of base pairs for this readset; mandatory;
- EstimatedGenomeSize: estimated genome size in number of base pairs used to compute seeding read length cutoff; mandatory;
- BAS: comma-separated list of relative or absolute paths to BAS files (old PacBio format); mandatory if BAX value is missing, ignored otherwise;
- BAX: comma-separated list of relative or absolute paths to BAX files; BAX file list is used first if both BAX/BAS lists are present; mandatory if BAS value is missing.
Example:
Sample Readset Smartcell NbBasePairs EstimatedGenomeSize BAS BAX
sampleA readset1 F_01_1 122169744 150000 path/to/readset1.bas.h5 path/to/readset1.1.bax.h5,path/to/readset1.2.bax.h5,path/to/readset1.3.bax.h5
sampleA readset2 F_01_2 105503472 150000 path/to/readset2.bas.h5 path/to/readset2.1.bax.h5,path/to/readset2.2.bax.h5,path/to/readset2.3.bax.h5
sampleB readset3 G_01_1 118603200 150000 path/to/readset3.bas.h5 path/to/readset3.1.bax.h5,path/to/readset3.2.bax.h5,path/to/readset3.3.bax.h5
sampleB readset4 G_01_2 104239488 150000 path/to/readset4.bas.h5 path/to/readset4.1.bax.h5,path/to/readset4.2.bax.h5,path/to/readset4.3.bax.h5
If your readsets belong to a Nanuq project, use $MUGQIC_PIPELINES_HOME/utils/nanuq2mugqic_pipelines.py
script to automatically create a Readset File and symlinks to your readsets on abacus.
Pipeline command parameters and cluster settings can be customized using Configuration Files (.ini
extension).
Those files have a structure similar to Microsoft Windows INI files e.g.:
#!ini
[DEFAULT]
module_trimmomatic=mugqic/trimmomatic/0.32
[trimmomatic]
min_length=50
A parameter value is first searched in its specific section, then, if not found, in the special DEFAULT
section.
The example above would resolve parameter module_trimmomatic
value from section trimmomatic
to mugqic/trimmomatic/0.32
.
Configuration files support interpolation. For example:
#!ini
scientific_name=Homo_sapiens
assembly=GRCh37
assembly_dir=$MUGQIC_INSTALL_HOME/genomes/species/%(scientific_name)s.%(assembly)s
genome_fasta=%(assembly_dir)s/genome/%(scientific_name)s.%(assembly)s.fa
would resolve genome_fasta
value to $MUGQIC_INSTALL_HOME/genomes/species/Homo_sapiens.GRCh37/genome/Homo_sapiens.GRCh37.fa
.
Each pipeline has several configuration files in:
#!bash
$MUGQIC_PIPELINES_HOME/pipelines/<pipeline_name>/<pipeline_name>.*.ini
A default configuration file (.base.ini
extension) is set for running on abacus cluster using Homo sapiens reference genome
and must always be passed first to the --config
option.
You can also add a list of other configuration files to --config
.
Files are read in the list order and each parameter value is overwritten if redefined in the next file.
This is useful to customize settings for a specific cluster or genome.
Each pipeline has a special configuration file for guillimin and mammouth clusters (.guillimin.ini
and .mammouth.ini
extensions respectively) in the same directory.
And various genome settings are available in $MUGQIC_PIPELINES_HOME/resources/genomes/config/
.
For example, to run the DNA-Seq pipeline on guillimin cluster with Mus musculus reference genome:
#!bash
$MUGQIC_PIPELINES_HOME/pipelines/dnaseq/dnaseq.py --config $MUGQIC_PIPELINES_HOME/pipelines/dnaseq/dnaseq.base.ini $MUGQIC_PIPELINES_HOME/pipelines/dnaseq/dnaseq.guillimin.ini $MUGQIC_PIPELINES_HOME/resources/genomes/config/Mus_musculus.GRCm38.ini ...
RNA-Seq, RNA-Seq De Novo Assembly and ChIP-Seq pipelines can perform differential expression analysis if they are provided with an input Design File.
The Design File is a TAB-separated values plain text file with one line per sample and the following columns:
- Sample: first column; must contain letters A-Z, numbers 0-9, hyphens (-) or underscores (_) only; the sample name must match a sample name in the readset file; mandatory;
- : each of the following columns defines an experimental design contrast; the column name defines the contrast name, and the following values represent the sample group membership for this contrast:
- '0' or '': the sample does not belong to any group;
- '1': the sample belongs to the control group;
- '2': the sample belongs to the treatment test case group.
Example:
Sample Contrast1 Contrast2 Contrast3
sampleA 1 1 1
sampleB 2 0 1
sampleC 0 2 0
sampleD 0 0 2
Peak calling type must be specified by adding to the contrast name either ,N
for Narrow peak calling, or ,B
for Broad peak calling.
Example:
Sample Contrast1,N Contrast2,B
sampleA 1 1
sampleB 2 0
sampleC 0 2
Warning for ChIP-Seq pipeline users: the values '1' for control and '2' for treatment are reversed compared to the old Perl version.
While pipelines are run, some jobs create a partial analysis report in Markdown format in
<output_dir>/report/<pipeline_name>.<step_name>.md
e.g. <output_dir>/report/DnaSeq.bwa_mem_picard_sort_sam.md
.
At any time during the pipeline processing, you can run the same pipeline command and add the option --report
.
This will create a bash script calling the Pandoc converter to aggregate all partial Markdown reports already created into one single HTML document, which you can view in <output_dir>/report/index.html
.
Thus, if the last pipeline steps fail, you will still get an HTML report containing sections for the first steps only.
The report title value can be overwritten in your copy of $MUGQIC_PIPELINES_HOME/pipelines/<pipeline_name>/<pipeline_name>.base.ini
in section [report]
.
You can also edit the partial Markdown reports before running the pandoc script, to add custom comments in your HTML report.
For developers: if you want to modify the Markdown report templates, they are all located in $MUGQIC_PIPELINES_HOME/bfx/report/
.
When pipelines are run in PBS (Portable Batch System) job scheduler mode (default), a job list file is created in <output_dir>/job_output/<PipelineName>_job_list_<timestamp>
and subsequent job log files are placed in <output_dir>/job_output/<step_name>/<job_name>_<timestamp>.o
e.g.:
#!text
my_output_dir/job_output/
├── RnaSeqDeNovoAssembly_job_list_2014-09-30T19.52.29
├── trimmomatic
│ ├── trimmomatic.readset1_2014-09-30T19.52.29.o
│ └── trimmomatic.readset2_2014-09-30T19.52.29.o
├── trinity
│ └── trinity_2014-10-01T14.17.02.o
└── trinotate
└── trinotate_2014-10-22T14.05.58.o
To view a TAB-separated values log report, use $MUGQIC_PIPELINES_HOME/utils/log_report.pl
script by typing:
#!bash
$MUGQIC_PIPELINES_HOME/utils/log_report.pl <output_dir>/job_output/<PipelineName>_job_list_<timestamp>
which will output e.g.:
#!text
# Number of jobs: 41
#
# Number of successful jobs: 4
# Number of active jobs: 0
# Number of inactive jobs: 36
# Number of failed jobs: 1
#
# Execution time: 2014-09-30T19:52:58 - 2014-09-30T22:38:04 (2 h 45 min 6 s)
#
# Shortest job: merge_trimmomatic_stats (1 s)
# Longest job: insilico_read_normalization_readsets.readset2 (1 h 33 min 53 s)
#
# Lowest memory job: merge_trimmomatic_stats (0.00 GiB)
# Highest memory job: insilico_read_normalization_readsets.readset2 (31.32 GiB)
#
#JOB_ID JOB_FULL_ID JOB_NAME JOB_DEPENDENCIES STATUS JOB_EXIT_CODE CMD_EXIT_CODE REAL_TIME START_DATE END_DATE CPU_TIME CPU_REAL_TIME_RATIO PHYSICAL_MEM VIRTUAL_MEM EXTRA_VIRTUAL_MEM_PCT LIMITS QUEUE USERNAME GROUP SESSION ACCOUNT NODES PATH
2100213.abacus2.ferrier.genome.mcgill.ca 2100213.abacus2.ferrier.genome.mcgill.ca trimmomatic.readset1 SUCCESS N/A 0 01:08:45 (1 h 8 min 45 s) 2014-09-30T19:52:58 2014-09-30T21:01:48 02:39:34 (2 h 39 min 34 s) 2.32 1.71 GiB 3.73 GiB 118.2 % neednodes=1:ppn=6,nodes=1:ppn=6,walltime=24:00:00 sw jfillon analyste 2465764 N/A f3c10 /path/to/output_dir/job_output/trimmomatic/trimmomatic.readset1_2014-09-30T19.52.29.o
2100214.abacus2.ferrier.genome.mcgill.ca 2100214.abacus2.ferrier.genome.mcgill.ca trimmomatic.readset2 SUCCESS N/A 0 01:08:59 (1 h 8 min 59 s) 2014-09-30T19:52:58 2014-09-30T21:02:01 02:40:05 (2 h 40 min 5 s) 2.32 1.41 GiB 3.73 GiB 164.0 % neednodes=1:ppn=6,nodes=1:ppn=6,walltime=24:00:00 sw jfillon analyste 2465669 N/A f3c10 /path/to/output_dir/job_output/trimmomatic/trimmomatic.readset2_2014-09-30T19.52.29.o
2100215.abacus2.ferrier.genome.mcgill.ca 2100215.abacus2.ferrier.genome.mcgill.ca merge_trimmomatic_stats 2100213.abacus2.ferrier.genome.mcgill.ca:2100214.abacus2.ferrier.genome.mcgill.ca SUCCESS N/A 0 00:00:01 (1 s) 2014-09-30T21:04:06 2014-09-30T21:04:12 00:00:00 (0 s) 0.00 0.00 GiB 0.00 GiB N/A neednodes=1:ppn=1,nodes=1:ppn=1,walltime=120:00:00 sw jfillon analyste 3343994 N/A f3c11 /path/to/output_dir/job_output/merge_trimmomatic_stats/merge_trimmomatic_stats_2014-09-30T19.52.29.o
2100216.abacus2.ferrier.genome.mcgill.ca 2100216.abacus2.ferrier.genome.mcgill.ca insilico_read_normalization_readsets.readset1 2100213.abacus2.ferrier.genome.mcgill.ca FAILED N/A N/A 00:38:16 (38 min 16 s) 2014-09-30T21:02:02 2014-09-30T21:40:23 04:50:10 (4 h 50 min 10 s) 7.58 30.71 GiB 32.32 GiB 5.3 % neednodes=1:ppn=6,nodes=1:ppn=6,walltime=120:00:00 sw jfillon analyste 3343745 N/A f3c11 /path/to/output_dir/job_output/insilico_read_normalization_readsets/insilico_read_normalization_readsets.readset1_2014-09-30T19.52.29.o
...
When pipeline jobs are submitted, a call home feature is invoked to collect some usage data. Those data are used to compute statistics and justify grant applications for funding support.
Data collected:
- Date and time
- Host and IP address
- Pipeline name
- Number of samples
- Pipeline steps
Please visit our mailing list to find questions and answers about MUGQIC Pipelines.
To subscribe to the mailing list and receive other people's messages, send an e-mail at [email protected]. You will receive an invitation which you must accept.
To use it, send us an e-mail at [email protected].
You can also report bugs at [email protected].
- Messages should not be sent directly to our team members. The generic e-mail addresses above are viewable by all of us and facilitate the follow-up of your request.
- Choose a meaningful subject for your message.
- Include the pipeline version number in your message (and the commit number if applicable).
- Provide the following information relevant to the problem encountered: the python command, the bash submission script, the output (job_outputs//.o) file,
- An error message or code snippet illustrating your request is normally very useful.