Add profile for ARM compatibility #1425

Merged
merged 49 commits into from
Oct 23, 2024
Changes from 40 commits
d59ecf7
appy changes for ARM
pabloaledo May 16, 2024
991637c
Merge branch 'arm_3.16.1' into dev
pinin4fjords Oct 17, 2024
4bdae48
Fix up config
pinin4fjords Oct 17, 2024
c9d616b
Reset environment ymls
pinin4fjords Oct 17, 2024
9f640ab
Reset main.nfs with software changes only
pinin4fjords Oct 17, 2024
00cff4c
Link sw configs to arm profile
pinin4fjords Oct 17, 2024
f281e0b
Add process block
pinin4fjords Oct 17, 2024
f5208dd
note pins
pinin4fjords Oct 17, 2024
ea8b9e8
Remove conda overrides that were having no impact
pinin4fjords Oct 17, 2024
db02d99
Use different RSEQC pin
pinin4fjords Oct 18, 2024
4a8ed2b
update BEDGRAPHTOBIGWIG pin
pinin4fjords Oct 18, 2024
b86a6f9
Fix annotation
pinin4fjords Oct 18, 2024
1b96eb8
Don't think we need the RSEQC pin
pinin4fjords Oct 18, 2024
4e5d7a3
correct ucsc
pinin4fjords Oct 18, 2024
bbd3452
Merge pull request #1 from pinin4fjords/sw_to_conf
pabloaledo Oct 18, 2024
902367e
Triage arm deps
pinin4fjords Oct 18, 2024
328d02d
Non-igenomes STAR can use the latest star
pinin4fjords Oct 18, 2024
2f92bfc
remove rogue comment
pinin4fjords Oct 18, 2024
0a563dc
update arm conf
pinin4fjords Oct 21, 2024
0be2d6a
Cut out module redundant software overrides
pinin4fjords Oct 22, 2024
bba6b92
Merge pull request #2 from pinin4fjords/triage_arm
pabloaledo Oct 22, 2024
56f1a47
Merge pull request #1414 from pabloaledo/dev
pinin4fjords Oct 22, 2024
a5e831f
Merge branch 'dev' into arm_3.16.1
pinin4fjords Oct 22, 2024
ed026be
Add frozen ARM builds
pinin4fjords Oct 22, 2024
a8d514f
Container directives for conda overrides
pinin4fjords Oct 22, 2024
8e14711
Temporary Trimgalore override, singularity fixes for rsem processes
pinin4fjords Oct 22, 2024
161cf95
update docs
pinin4fjords Oct 22, 2024
caad7fa
Fix argument conflict between STAR versions
pinin4fjords Oct 22, 2024
cdfb98b
Fix config error
pinin4fjords Oct 22, 2024
524e19e
Correction to star rsem
pinin4fjords Oct 22, 2024
7448aed
Merge branch 'arm_3.16.1' of github.com:nf-core/rnaseq into arm_3.16.1
pinin4fjords Oct 22, 2024
4f8de94
Refine config setup
pinin4fjords Oct 22, 2024
a1d558e
Bump trimgalore module
pinin4fjords Oct 22, 2024
dae2899
Fix up ARM profile for updated trim-galore
pinin4fjords Oct 22, 2024
d3580a4
Apply suggestions from code review
pinin4fjords Oct 23, 2024
32f7d32
Apply suggestions from code review
pinin4fjords Oct 23, 2024
7df162f
Add notes for kraken2/ braken
pinin4fjords Oct 23, 2024
642d2a8
Update docs
pinin4fjords Oct 23, 2024
d9e534d
prettier
pinin4fjords Oct 23, 2024
d4fcb56
lint fix
pinin4fjords Oct 23, 2024
896dd74
[skip ci] Reorg ARM bits
pinin4fjords Oct 23, 2024
314ea85
Update arm.config
drpatelh Oct 23, 2024
251481f
Fix syntax error
pinin4fjords Oct 23, 2024
cde5fef
Merge branch 'arm_3.16.1' of github.com:nf-core/rnaseq into arm_3.16.1
pinin4fjords Oct 23, 2024
f59132a
revert trimgalore tweak to appease linter
pinin4fjords Oct 23, 2024
40f05c5
Update pipeline-level tests
pinin4fjords Oct 23, 2024
2156f5d
Merge branch 'arm_3.16.1' of https://github.com/nf-core/rnaseq into a…
pinin4fjords Oct 23, 2024
79370d3
Update CHANGELOG
pinin4fjords Oct 23, 2024
424137a
Merge branch 'arm_3.16.1' of github.com:nf-core/rnaseq into arm_3.16.1
pinin4fjords Oct 23, 2024
271 changes: 271 additions & 0 deletions conf/arm.config

Large diffs are not rendered by default.

34 changes: 34 additions & 0 deletions docs/usage.md
@@ -116,6 +116,10 @@ If you would like to reduce the number of reads used in the analysis, for exampl

## Alignment options

:::note
The `--aligner hisat2` option is not currently supported on ARM architectures (`-profile arm`).
:::

By default, the pipeline uses [STAR](https://github.com/alexdobin/STAR) (i.e. `--aligner star_salmon`) to map the raw FastQ reads to the reference genome, project the alignments onto the transcriptome and to perform the downstream BAM-level quantification with [Salmon](https://salmon.readthedocs.io/en/latest/salmon.html). STAR is fast but requires a lot of memory to run, typically around 38GB for the Human GRCh37 reference genome. Since the [RSEM](https://github.com/deweylab/RSEM) (i.e. `--aligner star_rsem`) workflow in the pipeline also uses STAR you should use the [HISAT2](https://ccb.jhu.edu/software/hisat2/index.shtml) aligner (i.e. `--aligner hisat2`) if you have memory limitations.

You also have the option to pseudoalign and quantify your data directly with [Salmon](https://salmon.readthedocs.io/en/latest/salmon.html) or [Kallisto](https://pachterlab.github.io/kallisto/) by specifying `salmon` or `kallisto` to the `--pseudo_aligner` parameter. The selected pseudoaligner will then be run in addition to the standard alignment workflow defined by `--aligner`, mainly because it allows you to obtain QC metrics with respect to the genomic alignments. However, you can provide the `--skip_alignment` parameter if you would like to run Salmon or Kallisto in isolation. By default, the pipeline will use the genome fasta and gtf file to generate the transcripts fasta file, and then to build the Salmon index. You can override these parameters using the `--transcript_fasta` and `--salmon_index` parameters, respectively.
@@ -298,6 +302,10 @@ By default, the input GTF file will be filtered to ensure that sequence names co

## Contamination screening options

:::note
The `--contaminant_screening` option is not currently available on ARM architectures (`-profile arm`).
:::

The pipeline provides the option to scan unaligned reads for contamination from other species using [Kraken2](https://ccb.jhu.edu/software/kraken2/), with the possibility of applying corrections from [Bracken](https://ccb.jhu.edu/software/bracken/). Since running Bracken is not computationally expensive, we recommend always using it to refine the abundance estimates generated by Kraken2.

It is important to note that the accuracy of Kraken2 is [highly dependent on the database](https://doi.org/10.1099/mgen.0.000949) used. Specifically, it is [crucial](https://doi.org/10.1128/mbio.01607-23) to ensure that the host genome is included in the database. If you are particularly concerned about certain contaminants, it may be beneficial to use a smaller, more focused database containing primarily those contaminants instead of the full standard database. Various pre-built databases [are available for download](https://benlangmead.github.io/aws-indexes/k2), and instructions for building a custom database can be found in the [Kraken2 documentation](https://github.com/DerrickWood/kraken2/blob/master/docs/MANUAL.markdown). Additionally, genomes of contaminants detected in previous sequencing experiments are available on the [OpenContami website](https://openlooper.hgc.jp/opencontami/help/help_oct.php).
@@ -356,6 +364,26 @@ genome: 'GRCh37'

You can also generate such `YAML`/`JSON` files via [nf-core/launch](https://nf-co.re/launch).

### Running on Linux ARM architectures

The pipeline can be executed in an ARM compatible mode by specifying the ARM profile, for example:

```bash
nextflow run \
nf-core/rnaseq \
--input <SAMPLESHEET> \
--outdir <OUTDIR> \
--gtf <GTF> \
--fasta <GENOME FASTA> \
-profile docker,arm
```

This will use ARM-compatible containers, and apply a small number of overrides to Conda definitions to support ARM operation.
Review comment (Member), suggested change:
- This will use ARM-compatible containers, and apply a small number of overrides to Conda definitions to support ARM operation.
+ This will use ARM-compatible containers, and apply a small number of overrides to Conda definitions to support ARM operations.


:::warning
Please note that the ARM profile is experimental. It is expected to function correctly in all cases unless explicitly indicated otherwise—currently, exceptions include the use of the hisat2 aligner and contaminant screening via kraken2. However, because testing is presently conducted manually, we cannot guarantee its reliability.
:::
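
As a rough illustration of the section above, the profile could be chosen from the machine architecture before launching the pipeline. This is a minimal sketch, not part of the pull request: `choose_profile` and `check_arm_options` are hypothetical helper names, and mapping the `uname -m` values `aarch64` and `arm64` to the `arm` profile is an assumption about a typical setup. The guard encodes only the two exceptions named in the warning (the hisat2 aligner and contaminant screening), and treating any non-empty third argument as "contaminant screening enabled" is likewise an assumption.

```bash
#!/usr/bin/env bash
# Sketch only: these helpers are hypothetical and not part of nf-core/rnaseq.

# Print a -profile value based on the architecture (defaults to `uname -m`).
choose_profile() {
    local arch="${1:-$(uname -m)}"
    case "$arch" in
        aarch64|arm64) echo "docker,arm" ;;  # assumed names for 64-bit ARM
        *)             echo "docker" ;;
    esac
}

# Return non-zero if the ARM profile is combined with an option that the
# documentation lists as unsupported on ARM.
check_arm_options() {
    local profile="$1" aligner="$2" contaminant_screening="${3:-}"

    # Nothing to check unless "arm" is one of the comma-separated profiles.
    [[ ",$profile," == *",arm,"* ]] || return 0

    if [[ "$aligner" == "hisat2" ]]; then
        echo "The hisat2 aligner is not supported with -profile arm" >&2
        return 1
    fi
    if [[ -n "$contaminant_screening" ]]; then
        echo "Contaminant screening is not supported with -profile arm" >&2
        return 1
    fi
}
```

For example, `check_arm_options "$(choose_profile)" star_salmon` would succeed on any host, whereas passing `hisat2` as the aligner would fail on an ARM host.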

pinin4fjords marked this conversation as resolved.
### Updating the pipeline

When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline:
@@ -420,6 +448,12 @@ If `-profile` is not specified, the pipeline will run locally and expect all sof
- A generic configuration profile to enable [Wave](https://seqera.io/wave/) containers. Use together with one of the above (requires Nextflow `24.03.0-edge` or later).
- `conda`
- A generic configuration profile to be used with [Conda](https://conda.io/docs/). Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker, Singularity, Podman, Shifter, Charliecloud, or Apptainer.
- `arm`
- A configuration profile that will set `docker.runOptions` appropriately for ARM architectures, and apply overrides supplying ARM-compatible containers and Conda environments.
pinin4fjords marked this conversation as resolved.

:::warning
Please note that the ARM profile is experimental. It is expected to function correctly in all cases unless explicitly indicated otherwise—currently, exceptions include the use of the hisat2 aligner and contaminant screening via kraken2. However, because testing is presently conducted manually, we cannot guarantee its reliability.
:::

Review comment (Member), suggested change:
:::warning
Please note that the ARM profile is experimental. It is expected to function correctly in all cases unless explicitly indicated otherwise—currently, exceptions include the use of the hisat2 aligner and contaminant screening via kraken2. However, because testing is presently conducted manually, we cannot guarantee its reliability.
:::

Review comment (Member):

Maybe we just link to the new section we created in the arm profile above? Means we don't need to repeat the text.

### `-resume`

Expand Down
4 changes: 2 additions & 2 deletions modules.json
@@ -245,7 +245,7 @@
},
"trimgalore": {
"branch": "master",
- "git_sha": "49f4e50534fe4b64101e62ea41d5dc43b1324358",
+ "git_sha": "8c5eeedd45e295fc9a4f164631da6a8b37e6b9c6",
"installed_by": ["fastq_fastqc_umitools_trimgalore"]
},
"tximeta/tximport": {
@@ -333,7 +333,7 @@
},
"fastq_fastqc_umitools_trimgalore": {
"branch": "master",
- "git_sha": "49f4e50534fe4b64101e62ea41d5dc43b1324358",
+ "git_sha": "8c5eeedd45e295fc9a4f164631da6a8b37e6b9c6",
"installed_by": ["fastq_qc_trim_filter_setstrandedness", "subworkflows"]
},
"fastq_qc_trim_filter_setstrandedness": {
Expand Down
2 changes: 1 addition & 1 deletion modules/nf-core/trimgalore/environment.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 2 additions & 2 deletions modules/nf-core/trimgalore/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

20 changes: 10 additions & 10 deletions modules/nf-core/trimgalore/tests/main.nf.test.snap

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions nextflow.config
@@ -186,6 +186,7 @@ profiles {
}
arm {
docker.runOptions = '-u $(id -u):$(id -g) --platform=linux/amd64'
includeConfig 'conf/arm.config'
}
singularity {
singularity.enabled = true
Expand Down
75 changes: 47 additions & 28 deletions subworkflows/local/align_star/nextflow.config
@@ -1,36 +1,55 @@
def generateStarAlignArgs(save_unaligned, contaminant_screening, extra_star_align_args) {
    def argsToMap = { String args ->
        args.split(/\s(?=--)/).collectEntries {
            def parts = it.trim().split(/\s+/, 2)
            [(parts[0]): parts.size() > 1 ? parts[1] : '']
        }
    }

    def base_args = """
        --quantMode TranscriptomeSAM
        --twopassMode Basic
        --outSAMtype BAM Unsorted
        --readFilesCommand zcat
        --runRNGseed 0
        --outFilterMultimapNmax 20
        --alignSJDBoverhangMin 1
        --outSAMattributes NH HI AS NM MD
        --outSAMstrandField intronMotif
    """.trim()

    if (save_unaligned || contaminant_screening) {
        base_args += "\n--outReadsUnmapped Fastx"
    }

    def final_args_map = argsToMap(base_args) + (extra_star_align_args ? argsToMap(extra_star_align_args) : [:])
    final_args_map.collect { key, value -> "${key} ${value}".trim() }.join(' ')
}

if (!params.skip_alignment && params.aligner == 'star_salmon') {
process {
withName: '.*:ALIGN_STAR:STAR_ALIGN|.*:ALIGN_STAR:STAR_ALIGN_IGENOMES' {
ext.args = {
// Function to convert argument strings into a map
def argsToMap = { String args ->
args.split("\\s(?=--)").collectEntries {
def parts = it.trim().split(/\s+/, 2)
[(parts.first()): parts.last()]
}
}

// Initialize the map with preconfigured values
def preset_args_map = argsToMap("""
--quantMode TranscriptomeSAM
--twopassMode Basic
--outSAMtype BAM Unsorted
--readFilesCommand zcat
--runRNGseed 0
--outFilterMultimapNmax 20
--alignSJDBoverhangMin 1
--outSAMattributes NH HI AS NM MD
--quantTranscriptomeSAMoutput BanSingleEnd
--outSAMstrandField intronMotif
${params.save_unaligned || params.contaminant_screening ? '--outReadsUnmapped Fastx' : ''}
""".trim())

// Consolidate the extra arguments
def final_args_map = preset_args_map + (params.extra_star_align_args ? argsToMap(params.extra_star_align_args) : [:])
// We have to condition this, because the args are slightly different between the latest STAR and the one compatible with iGenomes

// Convert the map back to a list and then to a single string
final_args_map.collect { key, value -> "${key} ${value}" }.join(' ').trim()
withName: '.*:ALIGN_STAR:STAR_ALIGN' {
ext.args = {
generateStarAlignArgs(
params.save_unaligned,
params.contaminant_screening,
(params.extra_star_align_args ?: '') + ' --quantTranscriptomeSAMoutput BanSingleEnd'
)
}
}
withName: '.*:ALIGN_STAR:STAR_ALIGN_IGENOMES' {
ext.args = {
generateStarAlignArgs(
params.save_unaligned,
params.contaminant_screening,
(params.extra_star_align_args ?: '') + ' --quantTranscriptomeBan Singleend'
)
}
}
withName: '.*:ALIGN_STAR:STAR_ALIGN|.*:ALIGN_STAR:STAR_ALIGN_IGENOMES' {

publishDir = [
[
Expand Down
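
The merge behaviour of `generateStarAlignArgs` in the diff above (a user-supplied argument replaces the preset value for the same flag, while the other presets are kept) can be illustrated outside Nextflow. The following is a Bash sketch of that idea only, not the pipeline's code: `merge_star_args` is a hypothetical name, it needs Bash 4 or later for associative arrays, and it assumes single-line argument strings.

```bash
#!/usr/bin/env bash
# Sketch only: merge two STAR-style argument strings, later values winning.
# merge_star_args is a hypothetical helper, not part of nf-core/rnaseq.
merge_star_args() {
    local -A seen=()
    local -A values=()
    local -a order=() tokens=()
    local key="" token out=""

    # Split both strings on whitespace (assumes single-line input).
    read -r -a tokens <<< "$1 ${2:-}"

    for token in "${tokens[@]}"; do
        if [[ $token == --* ]]; then
            # A flag: remember its first-seen position, reset any earlier value.
            key=$token
            if [[ -z ${seen[$key]:-} ]]; then
                seen[$key]=1
                order+=("$key")
            fi
            values[$key]=""
        elif [[ -n $key ]]; then
            # A value token: append it to the current flag.
            values[$key]+="${values[$key]:+ }$token"
        fi
    done

    # Re-assemble the flags in first-seen order.
    for key in "${order[@]}"; do
        out+="${out:+ }$key${values[$key]:+ ${values[$key]}}"
    done
    printf '%s\n' "$out"
}

merge_star_args "--outFilterMultimapNmax 20 --outSAMtype BAM Unsorted" "--outFilterMultimapNmax 10"
# prints: --outFilterMultimapNmax 10 --outSAMtype BAM Unsorted
```

Keeping the first-seen order mirrors how a Groovy map retains the position of a key whose value is later replaced; whether argument order matters to STAR is not something this sketch assumes.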