Draft: Add feature for sample demultiplexing followed by immune profiling #365

herpov · 2024-08-16T07:12:04Z

Description of changes

The current nf-core/scrnaseq version (v2.7.0) does not handle the use case where data are both sample-multiplexed and require immune profiling. It can, however, handle either situation separately.

The cellranger software provided by 10x does not handle this situation directly either, but this tutorial guides the user through three steps:

1. Run cellranger multi to demultiplex the samples
2. Convert the output .bam files to .fq files
3. Run cellranger multi with immune profiling for each sample, using the generated .fq files

This PR serves to enable that for the workflow. This has required an additional tool (nf-core/bamtofastq10x), some new code, and a bit of rearrangement of existing code.
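
As a rough illustration of the three steps listed above, the following Python sketch only assembles the command lines and does not run anything. The function name `plan_commands`, the sample IDs, the CSV names, and the `<sample>_fastq` output folders are hypothetical. The per-sample BAM location (`outs/per_sample_outs/<sample>/count/sample_alignments.bam`) is an assumption about the Cell Ranger output layout and should be checked against the Cell Ranger version in use.

```python
from pathlib import Path


def plan_commands(pool_id, demux_csv, samples, vdj_csvs, workdir="."):
    """Return the command lines for the three steps as lists of arguments.

    Nothing is executed; the function only assembles the commands.
    """
    workdir = Path(workdir)
    commands = []

    # Step 1: demultiplex the pooled library with cellranger multi.
    commands.append(["cellranger", "multi", f"--id={pool_id}", f"--csv={demux_csv}"])

    for sample in samples:
        # Assumed location of the per-sample BAM inside the step 1 output.
        bam = (workdir / pool_id / "outs" / "per_sample_outs" / sample
               / "count" / "sample_alignments.bam")
        fastq_dir = workdir / f"{sample}_fastq"  # hypothetical output folder

        # Step 2: convert the per-sample BAM back to FASTQ files.
        commands.append(["bamtofastq", str(bam), str(fastq_dir)])

        # Step 3: rerun cellranger multi per sample, with a config that points
        # at the regenerated FASTQs and adds the immune profiling libraries.
        commands.append(["cellranger", "multi", f"--id={sample}_vdj",
                         f"--csv={vdj_csvs[sample]}"])
    return commands
```

In the pipeline itself these steps are Nextflow processes wired together by channels; the sketch only shows the order of operations and which output feeds which step.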

Rearrangement of code

Since cellranger multi is run multiple times during a single Nextflow run, I moved the code for preparing the reference genome out of subworkflows/local/align_cellrangermulti.nf and into a new subworkflow, cellrangermulti_ref.nf.

Added tool

I added the tool nf-core/bamtofastq10x.

New code

I added the subworkflow align_cellrangermulti_vdj.nf, based on align_cellrangermulti.nf, which implements steps 1, 2, and 3 described above. For steps 1 and 3, the nf-core cellranger multi module is invoked as in align_cellrangermulti.nf.

The main changes lie in channel operations in scrnaseq.nf and align_cellrangermulti_vdj.nf.

herpov · 2024-08-16T07:28:06Z

modules/nf-core/bamtofastq10x/main.nf

I have made several changes to this script, but idk if I should make a PR for the module itself?

The changes you made seem pretty specific to your use-case, so it doesn't make sense to update the central module.
It is also not allowed to change modules in a pipeline (-> linting error).

The preferred way would be to make it somehow work with the nf-core module unchanged. If this is not possible, you can make a copy of the module in the "local" folder and adapt it as needed.

@grst now the changes are minimal to this module. The rest I could fix by manipulating the output channel. Should I make a PR for bamtofastq or should I still just move this to "local" dir?

Should I make a PR for bamtofastq

Yes please, the changes look like everyone will benefit from them.

herpov · 2024-08-16T07:33:26Z

modules/nf-core/bamtofastq10x/main.nf

+    bamtofastq \\
+        $args \\
+        $bam \\
+        $prefix


Changed the output path from ${prefix}.fastq.gz.

bamtofastq generates a directory containing two folders: one for GEX and one for CMO .fastq files.
The two folders are prefixed with the .bam prefix.
All files are automatically prefixed with bamtofastq.
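
To make the directory layout described above concrete, here is a minimal Python sketch that groups the FASTQ files of one bamtofastq output directory by subfolder. The function name `group_fastqs` is hypothetical, and the sketch assumes exactly one level of subfolders below the output directory, as described in the comment above.

```python
from collections import defaultdict
from pathlib import Path


def group_fastqs(outdir):
    """Map each subfolder of a bamtofastq output directory to its FASTQ files."""
    groups = defaultdict(list)
    # Only look one level down: <outdir>/<subfolder>/<file>.fastq.gz
    for fastq in sorted(Path(outdir).glob("*/*.fastq.gz")):
        groups[fastq.parent.name].append(fastq.name)
    return dict(groups)
```

Because every file carries the same bamtofastq prefix, the subfolder name is the only thing that tells GEX reads from CMO reads, which is why the grouping key is the parent folder rather than the file name.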

grst · 2024-08-20T07:21:06Z

modules/nf-core/bamtofastq10x/main.nf

The changes you made seem pretty specific to your use-case, so it doesn't make sense to update the central module.
It is also not allowed to change modules in a pipeline (-> linting error).

The preferred way would be to make it somehow work with the nf-core module unchanged. If this is not possible, you can make a copy of the module in the "local" folder and adapt it as needed.

grst · 2024-08-20T07:25:33Z

subworkflows/local/align_cellrangermulti_idx.nf

+include { CELLRANGER_MKVDJREF               } from "../../modules/nf-core/cellranger/mkvdjref/main.nf"
+
+// Define workflow to subset and index a genome region fasta file
+workflow CELLRANGER_MULTI_REF {


Please make sure the workflow name and filename match.

Do you accept the new name of the file?
cellrangermulti_ref.nf

github-actions · 2024-08-20T07:48:52Z

`nf-core lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit 0132d9b

✅ 205 tests passed
❔ 4 tests were ignored
❗ 3 tests had warnings

❗ Test warnings:

pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!

❔ Tests ignored:

files_exist - File is ignored: lib/Utils.groovy
files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
template_strings - template_strings
schema_params - schema_params

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: assets/nf-core-scrnaseq_logo_light.png
files_exist - File found: conf/modules.config
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/images/nf-core-scrnaseq_logo_light.png
files_exist - File found: docs/images/nf-core-scrnaseq_logo_dark.png
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: .github/workflows/awstest.yml
files_exist - File found: .github/workflows/awsfulltest.yml
files_exist - File found: modules.json
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: docs/images/nf-core-scrnaseq_logo.png
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/NfcoreTemplate.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: lib/WorkflowMain.groovy
files_exist - File not found check: lib/WorkflowScrnaseq.groovy
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: Singularity
files_exist - File not found check: lib/nfcore_external_java_deps.jar
files_exist - File not found check: .travis.yml
nextflow_config - Config variable found: manifest.name
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: manifest.homePage
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: params.validationShowHiddenParams
nextflow_config - Config variable found: params.validationSchemaIgnoreParams
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config manifest.name began with nf-core/
nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config manifest.version ends in dev: 2.8.0dev
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
nextflow_config - nextflow.config contains configuration profile test
nextflow_config - Config default value correct: params.aligner= alevin
nextflow_config - Config default value correct: params.protocol= auto
nextflow_config - Config default value correct: params.igenomes_base= s3://ngi-igenomes/igenomes/
nextflow_config - Config default value correct: params.simpleaf_rlen= 91
nextflow_config - Config default value correct: params.star_feature= Gene
nextflow_config - Config default value correct: params.kb_workflow= standard
nextflow_config - Config default value correct: params.custom_config_version= master
nextflow_config - Config default value correct: params.custom_config_base= https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Config default value correct: params.max_cpus= 16
nextflow_config - Config default value correct: params.max_memory= 128.GB
nextflow_config - Config default value correct: params.max_time= 240.h
nextflow_config - Config default value correct: params.publish_dir_mode= copy
nextflow_config - Config default value correct: params.max_multiqc_email_size= 25.MB
nextflow_config - Config default value correct: params.validate_params= true
nextflow_config - Config default value correct: params.pipelines_testdata_base_path= https://raw.githubusercontent.com/nf-core/test-datasets/
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - CODE_OF_CONDUCT.md matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/CONTRIBUTING.md matches the template
files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/PULL_REQUEST_TEMPLATE.md matches the template
files_unchanged - .github/workflows/branch.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
files_unchanged - assets/email_template.html matches the template
files_unchanged - assets/email_template.txt matches the template
files_unchanged - assets/sendmail_template.txt matches the template
files_unchanged - assets/nf-core-scrnaseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-scrnaseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-scrnaseq_logo_dark.png matches the template
files_unchanged - docs/README.md matches the template
files_unchanged - .gitignore matches the template
files_unchanged - .prettierignore matches the template
actions_ci - '.github/workflows/ci.yml' is triggered on expected events
actions_ci - '.github/workflows/ci.yml' checks minimum NF version
actions_awstest - '.github/workflows/awstest.yml' is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml does not use -profile test
readme - README Nextflow minimum version badge matched config. Badge: 23.04.0, Config: 23.04.0
readme - README Zenodo placeholder was replaced with DOI.
pipeline_name_conventions - Name adheres to nf-core convention
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: awstest.yml
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: fix-linting.yml
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
actions_schema_validation - Workflow validation passed: awsfulltest.yml
actions_schema_validation - Workflow validation passed: download_pipeline.yml
actions_schema_validation - Workflow validation passed: release-announcements.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
multiqc_config - assets/multiqc_config.yml found and not ignored.
multiqc_config - assets/multiqc_config.yml contains report_section_order
multiqc_config - assets/multiqc_config.yml contains export_plots
multiqc_config - assets/multiqc_config.yml contains report_comment
multiqc_config - assets/multiqc_config.yml follows the ordering scheme of the minimally required plugins.
multiqc_config - assets/multiqc_config.yml contains a matching 'report_comment'.
multiqc_config - assets/multiqc_config.yml contains 'export_plots: true'.
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
base_config - conf/base.config found and not ignored.
modules_config - conf/modules.config found and not ignored.
modules_config - FASTQC found in conf/modules.config and Nextflow scripts.
modules_config - MULTIQC found in conf/modules.config and Nextflow scripts.
modules_config - EMPTYDROPS_CELL_CALLING found in conf/modules.config and Nextflow scripts.
modules_config - MTX_TO_H5AD found in conf/modules.config and Nextflow scripts.
modules_config - GTF_GENE_FILTER found in conf/modules.config and Nextflow scripts.
modules_config - CELLRANGER_MKGTF found in conf/modules.config and Nextflow scripts.
modules_config - CELLRANGER_MKREF found in conf/modules.config and Nextflow scripts.
modules_config - CELLRANGER_COUNT found in conf/modules.config and Nextflow scripts.
modules_config - CELLRANGERARC_MKGTF found in conf/modules.config and Nextflow scripts.
modules_config - CELLRANGERARC_MKREF found in conf/modules.config and Nextflow scripts.
modules_config - CELLRANGERARC_COUNT found in conf/modules.config and Nextflow scripts.
modules_config - CELLRANGER_MKGTF found in conf/modules.config and Nextflow scripts.
modules_config - CELLRANGER_MKREF found in conf/modules.config and Nextflow scripts.
modules_config - UNIVERSC found in conf/modules.config and Nextflow scripts.
modules_config - GFFREAD_TXP2GENE found in conf/modules.config and Nextflow scripts.
modules_config - SIMPLEAF_INDEX found in conf/modules.config and Nextflow scripts.
modules_config - SIMPLEAF_QUANT found in conf/modules.config and Nextflow scripts.
modules_config - ALEVINQC found in conf/modules.config and Nextflow scripts.
modules_config - STAR_ALIGN found in conf/modules.config and Nextflow scripts.
modules_config - STAR_GENOMEGENERATE found in conf/modules.config and Nextflow scripts.
modules_config - STAR_ALIGN found in conf/modules.config and Nextflow scripts.
modules_config - KALLISTOBUSTOOLS_REF found in conf/modules.config and Nextflow scripts.
modules_config - KALLISTOBUSTOOLS_COUNT found in conf/modules.config and Nextflow scripts.
modules_config - FASTQC found in conf/modules.config and Nextflow scripts.
modules_config - NFCORE_SCRNASEQ found in conf/modules.config and Nextflow scripts.
modules_config - GUNZIP found in conf/modules.config and Nextflow scripts.
modules_config - CELLRANGER_MKGTF found in conf/modules.config and Nextflow scripts.
modules_config - CELLRANGER_MKREF found in conf/modules.config and Nextflow scripts.
modules_config - CELLRANGER_MKVDJREF found in conf/modules.config and Nextflow scripts.
modules_config - BAMTOFASTQ10X found in conf/modules.config and Nextflow scripts.
nfcore_yml - Repository type in .nf-core.yml is valid: pipeline
nfcore_yml - nf-core version in .nf-core.yml is set to the latest version: 2.14.1

Run details

nf-core/tools version 2.14.1
Run at 2024-08-20 07:48:34

grst · 2024-08-20T07:49:50Z

@grst is this a 10x thing or is it an issue of how nextflow stages the files? Cellranger does not require all files to have the same prefix, ie fastq_id. I'd like some guidance to how I can debug this.

You can check how Nextflow stages the files by inspecting the process work directory. I haven't worked much with cellranger multi, but cellranger is pretty strict about the filenames. They need to follow the {sample_name}_S{i}_L00{j}_{R1,R2}_001.fastq.gz convention or they won't be found.
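
A quick way to check staged files against the naming convention mentioned above is a regular expression. This is a minimal Python sketch, not part of the pipeline. The name `parse_fastq_name` is hypothetical, and the pattern covers only the R1/R2 form quoted in the comment (index reads and multi-digit lanes are deliberately left out).

```python
import re

# {sample_name}_S{i}_L00{j}_{R1,R2}_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<index>\d+)_L00(?P<lane>\d)_(?P<read>R[12])_001\.fastq\.gz$"
)


def parse_fastq_name(filename):
    """Return the name components, or None if the name breaks the convention."""
    match = FASTQ_NAME.match(filename)
    return match.groupdict() if match else None
```

Running this over the file names found in a process work directory shows which files cellranger would fail to pick up and which sample name each remaining file resolves to.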

herpov · 2024-08-27T08:47:34Z

@grst is this a 10x thing or is it an issue of how nextflow stages the files? Cellranger does not require all files to have the same prefix, ie fastq_id. I'd like some guidance to how I can debug this.

You can check how Nextflow stages the files by inspecting the process work directory. I haven't worked much with cellranger multi, but cellranger is pretty strict about the filenames. They need to follow the {sample_name}_S{i}_L00{j}_{R1,R2}_001.fastq.gz convention or they won't be found.

I realized the workflow renamed the files according to the GEX sample name, so I had to ensure that the IDs of the VDJ and AB channels were consistent with the demultiplexed GEX IDs.
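
The ID matching described above can be illustrated in plain Python. The pipeline does this with Nextflow channel operations that are not shown in this thread, so the sketch below is only an analogy: the function name `expand_feature_entries`, the `(meta, files)` tuple shape, the `pool` key, and the ID scheme are all assumptions made for illustration.

```python
def expand_feature_entries(feature_entries, demux_ids_by_pool):
    """Emit one copy of each pooled feature entry per demultiplexed GEX sample.

    feature_entries:   list of (meta, files) tuples keyed by the pooled ID.
    demux_ids_by_pool: maps a pooled ID to the GEX sample IDs obtained
                       after demultiplexing that pool.
    """
    expanded = []
    for meta, files in feature_entries:
        for sample_id in demux_ids_by_pool.get(meta["id"], []):
            # Re-key the entry so it can be joined with the GEX entry
            # carrying the same ID; keep the original pool for reference.
            new_meta = dict(meta, id=sample_id, pool=meta["id"])
            expanded.append((new_meta, files))
    return expanded
```

With a VDJ entry keyed `pool1` and demultiplexed samples `pool1_A` and `pool1_B`, the function returns two VDJ entries keyed by the sample IDs, each pointing at the same pooled FASTQ files.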

herpov · 2024-09-04T07:54:56Z

I have not tested the pipeline with frna data. Further, I had to exclude the sample containing probe barcodes from my test set, ie: 4PLEX_HUMAN from assets/cellranger_barcodes_samplesheet.csv. I have not further investigated the reason for failure.
I have been testing the workflow with the linked metadata.

When I tried including another dataset which had been hashed with the same cmo as one of the others I ran into this error:

When running the pipeline separately on the dataset which failed, I had no issues. I have not spent more time trying to work around it, because I don't expect it to be an issue in our use case and, probably, it is a rare event - but thought you'd like to know.

samplesheet.csv
fb_reference.csv
cmo.csv
barcodes_samplesheet.csv

Helle Rus Povlsen added 11 commits August 16, 2024 09:08

5812f72 · cp cellrangermulti to cellrangermulti_vdj in SCRNASEQ workflow
a7a5cd3 · cp align_cellrangermulti to align_cellrangermulti_vdj in local subworkflows
06e9932 · separate processes that generate reference files from cellranger multi
9bafa8d · installed nf-core/module bamtofastq10x
2e32c2e · add null/ to the gitignore list
6072a4c · add description of demultiplexing combined with immuneprofiling
27a8c16 · add bamtofastq10x module with amendments
a1d0170 · move reference creation outside the cellranger multi to avoid rerunning. additionally changing input channels
170ec07 · add subworkflow specific for handling sample demultiplexing followed by immune profiling
3b71ff8 · implement cellranger multi ref and vdj. branch channels to either run cellranger multi or cellranger multi+vdj
0132d9b · update publishDir for the two cellranger multi outputs. add publishDir to bamtofastq process

Helle Rus Povlsen added 2 commits August 21, 2024 08:52

1434e45 · add func to expand feature channels to match demultiplexed gex. modified extract_gex_fq
18b9eae · remove renaming of files

Helle Rus Povlsen added 4 commits September 3, 2024 16:59

ecf453f · removed arg for BAMTOFASTQ and updated CELLRANGER_MULTI with regex
8359254 · changed faux channels to value channels to be consumed infinitely and updated extract_gex_fq()
c4356b0 · remove unused code
ced5ddc · renamed file
8c6e1c0 · update output path for fastq

Helle Rus Povlsen added 3 commits September 4, 2024 10:13

e22f8fb · update output dir for emptydrops analysis
eff6158 · update filename for generating reference files for cellranger multi
998f967 · remove frna option for immune-profiling

herpov mentioned this pull request Oct 2, 2024

[FEATURE] Update BAMTOFASTQ10X for multiplexed samples nf-core/modules#6725

Closed

updated bamtofastq10x module

cef8759
