Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft: Add feature for sample demultiplexing followed by immune profiling #365

Draft
wants to merge 22 commits into
base: dev
Choose a base branch
from

Conversation

herpov
Copy link

@herpov herpov commented Aug 16, 2024

Description of changes

The current nf-core/scrnaseq version (v1.7.0) does not handle the use case where data is both sample multiplexed and requires immune profiling. The current version can however handle either situation separately.

The cellranger software provided by 10x does not handle this situation either, but this tutorial guides the user to follow 3 steps:

  1. Run cellranger multi to demultiplex the samples
  2. Convert the output .bam files to .fq files
  3. Run cellranger multi with immune profiling for each sample using the generated .fq files

This PR serves to enable that for the workflow. This has required an additional tool (nf-core/bamtofastq10x), some new code, and a bit of rearrangement of existing code.

Rearrangement of code

Since cellranger multi is to be run multiple times during a nextflow run, I moved the code for preparing the reference genome from subworkflows/local/align_cellrangermulti.nf and generated a new subworkflow, cellrangermulti_ref.nf.

Added tool

I added the tool nf-core/bamtofastq10x.

New code

I added the subworkflow align_cellrangermulti_vdj.nf based on align_cellrangermulti.nf which contains the above described steps: 1,2, and 3. For step 1 and 3 the nf-core cellranger multi module is invoked as in align_cellrangermulti.nf.

The main changes lie in channel operations in scrnaseq.nf and align_cellrangermulti_vdj.nf.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/scrnaseq branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@herpov

This comment was marked as resolved.

conf/modules.config Outdated Show resolved Hide resolved
conf/modules.config Outdated Show resolved Hide resolved
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made several changes to this script, but idk if I should make a PR for the module itself?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes you made seem pretty specific to your use-case, so it doesn't make sense to update the central module.
It is also not allows to change modules in a pipeline (-> linting error).

The preferred way would be to make it somehow work with the nf-core module unchanged. If this is not possible, you can make a copy of the module in the "local" folder and adapt it as needed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@grst now the changes are minimal to this module. The rest I could fix by manipulating the output channel. Should I make a PR for bamtofastq or should I still just move this to "local" dir?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I make a PR for bamtofastq

Yes please, the changes look like everyone will benefit from them.

bamtofastq \\
$args \\
$bam \\
$prefix
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the output path from ${prefix}.fastq.gz.

bamtofastq generates a directory containing two folders: one for GEX and one for CMO .fastq files.
The two folders are prefixed with the .bam prefix.
All files are automatically prefixed with bamtofastq.

conf/modules.config Outdated Show resolved Hide resolved
conf/modules.config Outdated Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes you made seem pretty specific to your use-case, so it doesn't make sense to update the central module.
It is also not allows to change modules in a pipeline (-> linting error).

The preferred way would be to make it somehow work with the nf-core module unchanged. If this is not possible, you can make a copy of the module in the "local" folder and adapt it as needed.

include { CELLRANGER_MKVDJREF } from "../../modules/nf-core/cellranger/mkvdjref/main.nf"

// Define workflow to subset and index a genome region fasta file
workflow CELLRANGER_MULTI_REF {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure the workflow name and filename match.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you accept the new name of the file?
cellrangermulti_ref.nf

subworkflows/local/align_cellrangermulti_idx.nf Outdated Show resolved Hide resolved
modules/nf-core/bamtofastq10x/main.nf Outdated Show resolved Hide resolved
modules/nf-core/bamtofastq10x/main.nf Outdated Show resolved Hide resolved
Copy link

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 0132d9b

+| ✅ 205 tests passed       |+
#| ❔   4 tests were ignored |#
!| ❗   3 tests had warnings |!

❗ Test warnings:

  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!

❔ Tests ignored:

✅ Tests passed:

Run details

  • nf-core/tools version 2.14.1
  • Run at 2024-08-20 07:48:34

@grst
Copy link
Member

grst commented Aug 20, 2024

@grst is this a 10x thing or is it an issue of how nextflow stages the files? Cellranger does not require all files to have the same prefix, ie fastq_id. I'd like some guidance to how I can debug this.

How nextflow stages the files you can check by investigating the process work directory. I haven't worked much wich cellranger multi, but cellranger is pretty strict about the filenames. They need to follow the {sample_name}_S{i}_L00{j}_{R1,R2}_001.fastq.gz convention or they won't be found.

@herpov
Copy link
Author

herpov commented Aug 27, 2024

@grst is this a 10x thing or is it an issue of how nextflow stages the files? Cellranger does not require all files to have the same prefix, ie fastq_id. I'd like some guidance to how I can debug this.

How nextflow stages the files you can check by investigating the process work directory. I haven't worked much wich cellranger multi, but cellranger is pretty strict about the filenames. They need to follow the {sample_name}_S{i}_L00{j}_{R1,R2}_001.fastq.gz convention or they won't be found.

I realized the workflow renamed the files according to the GEX sample name, so I had to ensure that the IDs of the VDJ and AB channels were consistent with the demultiplexed GEX IDs.

@herpov
Copy link
Author

herpov commented Sep 4, 2024

I have not tested the pipeline with frna data. Further, I had to exclude the sample containing probe barcodes from my test set, ie: 4PLEX_HUMAN from assets/cellranger_barcodes_samplesheet.csv. I have not further investigated the reason for failure.
I have been testing the workflow with the linked metadata.

When I tried including another dataset which had been hashed with the same cmo as one of the others I ran into this error:

Screenshot 2024-08-30 at 15 26 45

When running the pipeline separately on the dataset which failed, I had no issues. I have not spent more time trying to work around it, because I don't expect it to be an issue in our use case and, probably, it is a rare event - but thought you'd like to know.

samplesheet.csv
fb_reference.csv
cmo.csv
barcodes_samplesheet.csv

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants