Update README and input data (#4)
### Updated
Clarified installation and setup instructions
Data input and README
Default time configurations
Turned off default hostStats step that is not used

### Added
ErrorStrategy retry
ErrorStrategy ignore for the assembly process
talnor authored May 25, 2022
1 parent a25af58 commit 6b6d0e9
Showing 1,358 changed files with 322,259 additions and 137,659 deletions.
54 changes: 38 additions & 16 deletions README.md
@@ -11,17 +11,22 @@ PLoS Comput Biol 13(10): e1005775. https://doi.org/10.1371/journal.pcbi.1005775

## Installation and set up

### Install nextflow
`conda create -n time_analysis nextflow`
### Install required software

### Set up container image
A container image is available from Docker at `talnor/hiv_time_analysis`.
By default, this image will be used when using the docker or singularity profiles.
Make sure the following are installed:
- Singularity or Docker
- Nextflow or conda (install nextflow using conda as described below)

Alternatively, the container can be manually downloaded, or rebuilt from the Dockerfile in this
repo. If so update the settings in the `nextflow.config`.
* manually pull singularity image: `singularity pull path/to/hiv_time_analysis.sif docker://talnor/hiv_time_analysis:<version>`
* manually pull docker image: `docker pull talnor/hiv_time_analysis:<version>`
```
conda create -n time_analysis nextflow
conda activate time_analysis
```

### Install TIME_pipeline

```
git clone https://github.com/talnor/TIME_pipeline.git
```

### Configure pipeline options in the nextflow config file
Default parameters and settings for running the pipeline are specified in `nextflow.config`.
@@ -38,26 +43,34 @@ In addition, default settings for primers, adapters and similar configurations a
detail [here](data/README.md).

### Download host reference genome
Set `nextflow.config` parameter `hostFasta` to the path of the host reference genome of your choice.

For example, the following command can be run to download the human reference genome:
```
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.39_GRCh38.p13/GCF_000001405.39_GRCh38.p13_genomic.fna.gz
```

Then set up the host database with the following command. The database will be placed in the `hostGenome` directory
and will be named `hostGenomeBase`.
```
nextflow run main.nf --setup -profile slurm,singularity --outdir <outdir>
nextflow run main.nf --setup -profile slurm,singularity --hostFasta <path_to_genome> --outdir <outdir>
```

### Run Shiver initialisation
Shiver initilisation directories are included in this repository. Information on these are
available [here](data/README.md). To create your own initilisation directory, run the following command:
The Shiver initialization directory includes the set of primers used during the amplification of the samples as well as a
reference dataset to be used in the analysis. Several options are included in this repository. Information on these is
available [here](data/README.md). To create your own initialization directory, run the following command:
```
nextflow run main.nf --init -profile slurm,singularity --primers <primers.fasta> --adapters <adapters.fasta> --config <shiver_config.sh> --references <references.fasta> --outdir <outdir>
```

## Usage
`conda activate time_analysis`
- Ensure the settings in the `nextflow.config` are correct for your samples. Importantly, the primer set and the
initialization directory need to match the primers used during amplification of the samples. Alternatively, override the
default values by supplying them as parameters in the command below.

Basic usage:

```
conda activate time_analysis
nextflow run main.nf -profile slurm,singularity --input 'path/to/*_R{1,2}.fastq.gz' --outdir path/to/results/ --ticket <batch_name>
```
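
The `--input` glob above expects paired-end files named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz`. Nextflow resolves the pattern itself; the Python sketch below only illustrates which files such a pattern is meant to pair up. The file names and the `pair_reads` helper are hypothetical and not part of the pipeline.

```python
import re
from collections import defaultdict

# Hypothetical file names, chosen to fit the '*_R{1,2}.fastq.gz' pattern.
files = [
    "sampleA_R1.fastq.gz",
    "sampleA_R2.fastq.gz",
    "sampleB_R1.fastq.gz",
    "sampleB_R2.fastq.gz",
    "notes.txt",
]


def pair_reads(names):
    """Group file names by the prefix in front of '_R1' / '_R2'."""
    pattern = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])\.fastq\.gz$")
    pairs = defaultdict(dict)
    for name in names:
        match = pattern.match(name)
        if match:  # names that do not fit the pattern are skipped
            pairs[match["sample"]]["R" + match["read"]] = name
    # Keep only samples for which both mates were found.
    return {s: (p["R1"], p["R2"]) for s, p in pairs.items() if len(p) == 2}


print(pair_reads(files))
```

A sample with only one mate present is dropped by this sketch, so checking the file names against the pattern before starting a run is a cheap way to catch naming mistakes.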

@@ -68,4 +81,13 @@ The pipeline can be executed on your **local** computer or with a **slurm** reso
Check the command help for more info and options.
```
nextflow run main.nf --help
```
```

## Optional installation steps

### Set up container image
A container image is available from Docker at `talnor/hiv_time_analysis`.
By default, this image will be used when using the docker or singularity profiles. If this doesn't work, the container can be manually downloaded, or rebuilt from the Dockerfile in this
repo. If so, update the settings in the `nextflow.config`.
* manually pull singularity image: `singularity pull path/to/hiv_time_analysis.sif docker://talnor/hiv_time_analysis:<version>`
* manually pull docker image: `docker pull talnor/hiv_time_analysis:<version>`
3 changes: 2 additions & 1 deletion bin/calculate_eti.py
@@ -171,7 +171,8 @@ def calculate_ETI(pairwise_distance, eti_m, eti_c):
)
for sample_info in samples:
sample_info = sample_info.strip().lstrip("[").strip("]").strip(",")
sample = sample_info.split("_")[1]
#sample = sample_info.split("_")[1]
sample = sample_info
base_frequency_file = glob.glob(
os.path.join(
inputdir, "{}_remap_BaseFreqs_WithHXB2.csv".format(sample_info)
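
The effect of the change in this hunk can be seen in a small standalone sketch. The string cleaning and the file name template are copied from the lines above; the input list `raw` and the helper name `clean_sample_name` are hypothetical, chosen only for illustration.

```python
def clean_sample_name(sample_info):
    """Strip whitespace, list brackets and commas, as in the loop above."""
    return sample_info.strip().lstrip("[").strip("]").strip(",")


# Hypothetical input: a list of sample names passed in as one string.
raw = "[P1_S1, P2_S2]"

for item in raw.split(","):
    cleaned = clean_sample_name(item)
    old_sample = cleaned.split("_")[1]  # before the commit: second '_' field only
    new_sample = cleaned                # after the commit: the full name is kept
    base_freq_name = "{}_remap_BaseFreqs_WithHXB2.csv".format(cleaned)
    print(old_sample, new_sample, base_freq_name)
```

With the old behaviour, any sample name without an underscore would raise an `IndexError` at `split("_")[1]`; keeping the full name avoids that and keeps the reported sample consistent with the file name prefix.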
2 changes: 1 addition & 1 deletion configs/executor_options.config
@@ -12,7 +12,7 @@ process {
withLabel: buildDatabase {
cpus = 1
memory = '8 GB'
time = '30m'
time = '2h'
}
withLabel: trimming {
cpus = 6
10 changes: 5 additions & 5 deletions configs/executor_options_large.config
@@ -12,7 +12,7 @@ process {
withLabel: buildDatabase {
cpus = 1
memory = '8 GB'
time = '30m'
time = '2h'
}
withLabel: trimming {
cpus = 6
@@ -31,13 +31,13 @@ process {
}
withLabel: assembly {
cpus = 6
memory = '20 GB'
time = '3h'
memory = '60 GB'
time = '8h'
}
withLabel: shiver {
cpus = 1
memory = '10 GB'
time = '1h'
time = '4h'
}
withLabel: infectionEstimation {
cpus = 1
@@ -49,4 +49,4 @@ process {
memory = '3 GB'
time = '10m'
}
}
}
87 changes: 17 additions & 70 deletions data/README.md
@@ -18,18 +18,15 @@ run command:
Below follows a description of the files that are made available in this directory.

### Primers
| Version | Primers | Description |
| ----------- | ----------- | ----------- |
| 1 | Primers_A_elife-11282-supp2-v2_PCR1-2_primers_A_primers_RC.fasta | |
| 2 | Primers_A_elife-11282-supp2-v2_PCR1_primers_A_primers_RC.fasta | |
| 3 | Primers_A_elife-11282-supp2-v2_PCR1_primers_A_primers_RC_Bprimers_fragment4.fasta | |
| 4 | Primers_A_elife-11282-supp2-v2_PCR2_primers_A_primers.fasta | |
| 5 | primers_1_amplicon_PCR1-2_190620.fasta | |
| 6 | primers_1_amplicon_PCR1_190620.fasta | Full genome amplified with 1 primer pair |
| 7 | primers_1_amplicon_PCR2_190620.fasta | |
| 8 | primers_B1_180119.fasta | |
| 9 | primers_B1_201203.fasta | |
| 10 | primers_B1_201203_PCR1.fasta | |
| Version | File | Primer type | Description |
| ----------- | ----------- | ----------- | ----------- |
| 1 | A_primers_6AMP_PCR1-2.fasta | A | 6 primer pairs, PCR1+PCR2 |
| 2 | A_primers_6AMP_PCR1.fasta | A | 6 primer pairs, PCR1 |
| 3 | A_primers_6AMP_PCR1_F4-Bprimers.fasta | A+B | 6 primer pairs, PCR1, B primers used for fragment 4 |
| 4 | B_primers_1AMP_PCR1-2.fasta | B | 1 primer pair, PCR1+PCR2 |
| 5 | AB_primers_1AMP_PCR1.fasta | A, B | 1 primer pair, PCR1, A and B primers are identical |
| 6 | B_primers_6AMP_PCR1-2.fasta | B | 6 primer pairs, PCR1+PCR2 |
| 7 | B_primers_6AMP_PCR1.fasta | B | 6 primer pairs, PCR1 |

### Adapters
| Version | Adapters | Description |
@@ -40,8 +37,7 @@ Below follows a description of the files that are made available in this directo
| Version | Configurations | Description |
| ----------- | ----------- | ----------- |
| 1 | original_config.sh | Default settings used in Shiver |
| 2 | shiver_config_BQ30_notrimming.sh | TIME-study settings |
| 3 | config_BQ30.sh | Older settings |
| 2 | shiver_config_BQ20_notrimming.sh | TIME-study settings |

#### Configuration file 2

@@ -51,67 +47,18 @@ The following options in Shiver are altered. For the full list of options see th
| ----------- | ----------- | ----------- | ----------- |
| TrimReadsForAdaptersAndQual | false | true | Trim adapters and low quality bases from reads using trimmomatic? |
| TrimReadsForPrimers | false | true | Trim exact matches to PCR primers from the end of reads using fastaq? |
| mpileupOptions | --min-BQ 30 | --min-BQ 5 | Higher quality threshold for individual bases |
| deduplicate | true | false | Remove read pairs marked as duplicates? This can cause loss of diversity in the reads due to true biological variation as well as sequencing error. |

#### Configuration file 3

The following options in Shiver are altered. For the full list of options see the default config.

| Parameter | Value | Default | Description |
| ----------- | ----------- | ----------- | ----------- |
| mpileupOptions | --min-BQ 30 | --min-BQ 5 | Higher quality threshold for individual bases |
| mpileupOptions | --min-BQ 20 | --min-BQ 5 | Higher quality threshold for individual bases |
| deduplicate | true | false | Remove read pairs marked as duplicates? This can cause loss of diversity in the reads due to true biological variation as well as sequencing error. |
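
To put the `--min-BQ` values in the table into perspective: base qualities are Phred-scaled, so a quality Q corresponds to an estimated error probability of 10^(-Q/10). The short sketch below, with the hypothetical helper `phred_to_error_probability`, compares the thresholds 5, 20 and 30 that appear in this diff.

```python
def phred_to_error_probability(q):
    """Convert a Phred-scaled base quality to an error probability."""
    return 10 ** (-q / 10)


# Thresholds mentioned in the configuration tables.
for q in (5, 20, 30):
    print("min-BQ {:>2}: error probability {:.4f}".format(q, phred_to_error_probability(q)))
```

Lowering the threshold from 30 to 20 therefore admits bases with an estimated error rate of up to 1 in 100 instead of 1 in 1000, which keeps more of the pileup at the cost of somewhat noisier base calls.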


### Shiver init directory
| Version | InitDir | Description |
| ----------- | ----------- | ----------- |
| 1 | InitDirShiver220223_BQ30_1amp | 1 amplicon primers, 2020 references, no UTRs |
| 2 | InitDirShiver220128_BQ30_1amp | 1 amplicon primers, 2020 references |
| 3 | InitDirShiver190405_BQ30 | |
| 4 | InitDirShiver191022_BQ30_PANHIV | 1 amplicon primers, 2018 references |

#### Shiver init directory 1

**Name**: InitDirShiver220223_BQ30_1amp
**Created**: 2022-02-23

| Content | Description |
| ----------- | ----------- |
| Primer | primers_1_amplicon_PCR1_190620.fasta |
| Adapter | NexteraPE-PE.fa |
| Configurations | shiver_config_BQ30_notrimming.sh |
| References | HIV1_COM_2020_547-9592_DNA.fasta |

#### Shiver init directory 2

**Name**: InitDirShiver220128_BQ30_1amp
**Created**: 2022-01-28

| Content | Description |
| ----------- | ----------- |
| Primer | primers_1_amplicon_PCR1_190620.fasta |
| Adapter | NexteraPE-PE.fa |
| Configurations | shiver_config_BQ30_notrimming.sh |
| References | HIV1_COM_2020_genome_DNA.fasta |

#### Shiver init directory 3

**Name**: InitDirShiver190405_BQ30
**Created**: 2019-04-05

#### Shiver init directory 4

**Name**: InitDirShiver191022_BQ30_PANHIV
**Created**: 2019-10-22

| Content | Description |
| ----------- | ----------- |
| Primer | primers_1_amplicon_PCR1-2_190620.fasta |
| Adapter | NexteraPE-PE.fa |
| Configurations | config_BQ30.sh |
| References | HIV1_COM_2017_547-9592_DNA_2018Compendium.fasta |
| 1 | InitDirShiver220516_BQ20_1AMP_Bprimers_PCR1 | 1 amplicon primers, 2020 references (no UTRs), primers: B, PCR1 |
| 2 | InitDirShiver220516_BQ20_1AMP_Bprimers_PCR2 | 1 amplicon primers, 2020 references (no UTRs), primers: B, PCR1+PCR2 |
| 3 | InitDirShiver220516_BQ20_6AMP_ABprimers_PCR1 | 6 amplicon primers, 2020 references (no UTRs), primers: A+B(F4), PCR1 |
| 4 | InitDirShiver220516_BQ20_6AMP_Aprimers_PCR1-2 | 6 amplicon primers, 2020 references (no UTRs), primers: A, PCR1+PCR2 |
| 5 | InitDirShiver220516_BQ20_6AMP_Aprimers_PCR1 | 6 amplicon primers, 2020 references (no UTRs), primers: A, PCR1 |
| 6 | InitDirShiver220525_BQ20_6AMP_Bprimers_PCR1 | 6 amplicon primers, 2020 references (no UTRs), primers: B, PCR1 |

### References to use in Shiver alignments
Reference compendiums with representative genomes can be downloaded from the