unselected RNA-seq based workflow #334

Merged
merged 49 commits into from
Jul 17, 2024
Changes from 23 commits
11c4195
first steps & sc run
mapo9 Mar 21, 2024
6639f0f
input change
mapo9 Mar 25, 2024
2a42c32
stage before pull
mapo9 Apr 10, 2024
9d2c18a
Merge remote-tracking branch 'origin/master' into dev
mapo9 Apr 10, 2024
47a7642
merged most recent airrflow version
mapo9 Apr 11, 2024
a5af202
barcode airr test
mapo9 Apr 11, 2024
2f492b0
Merge pull request #325 from nf-core/dev
ggabernet Apr 23, 2024
48e1b68
dev update
mapo9 May 6, 2024
e434be7
Merge branch 'master' of https://github.com/mapo9/airrflow into trust4
mapo9 May 6, 2024
6242a6f
skip presto reporting which doesnt happen when trust4 is used
mapo9 May 7, 2024
761eccc
linting
mapo9 May 8, 2024
2da37e2
fastp before trust4
mapo9 May 8, 2024
d262a38
fastp before trust4
mapo9 May 10, 2024
50418e1
bulk and sc rnaseq input
mapo9 May 16, 2024
c6f95bf
bulk rnaseq test workflow
mapo9 May 16, 2024
c72146c
sc rnaseq tests
mapo9 May 16, 2024
7746378
docs
mapo9 May 16, 2024
5ef2f60
linting & prettier
mapo9 May 16, 2024
0bc5c19
trailing whitespace
mapo9 May 16, 2024
82c0688
trailing whitespace
mapo9 May 16, 2024
1a3a629
bugfix
mapo9 May 21, 2024
d822084
trailing whitespace
mapo9 May 21, 2024
90cada8
view statement removed
mapo9 May 21, 2024
35cf7f4
incorporating review comments
mapo9 May 23, 2024
cbdc415
fix test
mapo9 May 23, 2024
948bb53
removed view statement
mapo9 May 23, 2024
f13eeab
Merge branch 'dev' of https://github.com/ggabernet/nf-core-airrflow i…
ggabernet May 28, 2024
616349f
generate reference fasta
ggabernet May 28, 2024
9443bc8
channel in wrong subworkflow
ggabernet May 28, 2024
310e302
generate trust4 reference
ggabernet May 28, 2024
b9e977f
Merge branch 'dev' of https://github.com/nf-core/airrflow into trust4
ggabernet May 29, 2024
236e0f9
update container airrflow 4.1.0
ggabernet May 30, 2024
0c72700
test the dev container
ggabernet May 30, 2024
7101a6d
update container
ggabernet May 30, 2024
7304208
update changelog
ggabernet May 30, 2024
d933c48
improve protocol logs
ggabernet May 30, 2024
cf55590
presto wasnt updated
ggabernet May 30, 2024
fa5664e
fix linting
ggabernet May 30, 2024
032a56d
rm irrelevant param
ggabernet May 30, 2024
140c25c
Merge branch 'dev' of https://github.com/ggabernet/nf-core-airrflow i…
ggabernet May 30, 2024
0d7e636
merge dev
ggabernet May 31, 2024
bd3bbe2
add param skip alignment filter
ggabernet May 31, 2024
46a5fa2
add 'config_profile_url' default
mapo9 Jun 3, 2024
ba3eea1
trust4 nf-core module
mapo9 Jun 5, 2024
c186a69
improve docs
ggabernet Jul 17, 2024
cb6badc
Merge branch 'trust4' of https://github.com/mapo9/airrflow into trust4
ggabernet Jul 17, 2024
5e6e1a9
fix prettier
ggabernet Jul 17, 2024
b8dc66e
add locus selection to rnaseq workflow
ggabernet Jul 17, 2024
cfa4f73
fix issue with merge_UMI
ggabernet Jul 17, 2024
2 changes: 2 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -62,6 +62,8 @@ jobs:
"test_10x_sc",
"test_clontech_umi",
"test_nebnext_umi",
"test_rnaseq_bulk",
"test_rnaseq_sc",
]
fail-fast: false
steps:
27 changes: 27 additions & 0 deletions conf/test_rnaseq_bulk.config
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
/*
* -------------------------------------------------
* Nextflow config file for running tests
* -------------------------------------------------
* Defines bundled input files and everything required
* to run a fast and simple test. Use as follows:
* nextflow run nf-core/airrflow -profile test_rnaseq_bulk,<docker/singularity>
*/

params {
config_profile_name = 'Test bulk RNA-seq based workflow using TRUST4'
config_profile_description = 'Minimal test dataset to check pipeline function with raw bulk RNA-seq data'

// Limit resources so that this can run on GitHub Actions
max_cpus = 2
max_memory = 6.GB
max_time = 48.h

// params
mode = 'fastq'
library_generation_method = 'trust4'
clonal_threshold = 0

// Input data
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/airrflow/testdata-rnaseq/rnaseq_metadata.tsv'
coord_fasta = 'https://raw.githubusercontent.com/nf-core/test-datasets/airrflow/testdata-rnaseq/IMGT+C.fa'
}
31 changes: 31 additions & 0 deletions conf/test_rnaseq_sc.config
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
/*
* -------------------------------------------------
* Nextflow config file for running tests
* -------------------------------------------------
* Defines bundled input files and everything required
* to run a fast and simple test. Use as follows:
* nextflow run nf-core/airrflow -profile test_rnaseq_sc,<docker/singularity>
*/

params {
config_profile_name = 'Test single-cell RNA-seq based workflow using TRUST4'
config_profile_description = 'Minimal test dataset to check pipeline function with raw single-cell RNA-seq data'

// Limit resources so that this can run on GitHub Actions
max_cpus = 2
max_memory = 6.GB
max_time = 48.h

// params
mode = 'fastq'
library_generation_method = 'trust4'
clonal_threshold = 0
barcode_read = 'R1'
umi_position = 'R1'
read_format = "bc:0:15,um:16:27"
skip_lineage = true

// Input data
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/airrflow/testdata-rnaseq/sc_rnaseq_metadata.tsv'
coord_fasta = 'https://raw.githubusercontent.com/nf-core/test-datasets/airrflow/testdata-rnaseq/IMGT+C.fa'
}
49 changes: 46 additions & 3 deletions docs/usage.md
Original file line number Diff line number Diff line change
Expand Up @@ -42,13 +42,13 @@ nextflow run nf-core/airrflow \
A typical command to run the pipeline from **single cell raw fastq files** is:

```bash
nextflow run nf-core/airrflow -r dev \
nextflow run nf-core/airrflow \
-profile <docker/singularity/podman/shifter/charliecloud/conda/institute> \
--mode fastq \
--input input_samplesheet.tsv \
--library_generation_method sc_10x_genomics \
--reference_10x reference/refdata-cellranger-vdj-GRCh38-alts-ensembl-5.0.0.tar.gz \
--outdir ./results
--outdir results
```

A typical command for running the pipeline departing from **single-cell AIRR rearrangement tables or assembled bulk sequencing fasta** data is:
Expand Down Expand Up @@ -123,7 +123,7 @@ If you wish to share such profile (such as upload as supplementary material for

## Input samplesheet

### Fastq input samplesheet (bulk sequencing)
### Fastq input samplesheet (bulk AIRR and bulk/sc RNA sequencing)

The required input file for processing raw BCR or TCR bulk targeted sequencing data is a sample sheet in TSV format (tab separated). The columns `sample_id`, `filename_R1`, `filename_R2`, `subject_id`, `species`, `tissue`, `pcr_target_locus`, `single_cell`, `sex`, `age` and `biomaterial_provider` are required. An example samplesheet is:

Expand Down Expand Up @@ -511,6 +511,49 @@ nextflow run nf-core/airrflow -r dev \
- The 10xGenomics reference can be downloaded from the [download page](https://www.10xgenomics.com/support/software/cell-ranger/downloads)
- To generate a V(D)J segment fasta file as reference from IMGT one can follow the [cellranger docs](https://support.10xgenomics.com/single-cell-vdj/software/pipelines/latest/advanced/references#imgt).

## Supported unselected RNA-seq based methods

nf-core/airrflow supports unselected bulk or single-cell RNA-seq fastq files as input. [TRUST4](https://github.com/liulab-dfci/TRUST4) is used to extract TCR/BCR sequences from these files. The resulting AIRR tables are then fed into airrflow's Immcantation-based workflow. <br>
To use unselected RNA-seq based input, specify `--library_generation_method trust4`.

### Bulk RNA-seq

A typical command to run the pipeline from **bulk RNA-seq fastq files** is:

```bash
nextflow run nf-core/airrflow \
-profile <docker/singularity/podman/shifter/charliecloud/conda/institute> \
--mode fastq \
--input input_samplesheet.tsv \
--library_generation_method trust4 \
--coord_fasta reference/IMGT+C.fa \
--outdir results
```

### Single-cell RNA-seq

A typical command to run the pipeline from **single-cell RNA-seq fastq files** is:

```bash
nextflow run nf-core/airrflow \
-profile <docker/singularity/podman/shifter/charliecloud/conda/institute> \
--mode fastq \
--input input_samplesheet.tsv \
--library_generation_method trust4 \
--umi_position R1 \
--read_format bc:0:15,um:16:27 \
--coord_fasta reference/IMGT+C.fa \
--outdir results
```

- If UMIs are present, the read containing them must be specified using the `--umi_position` parameter.
- The `--read_format` parameter can be used to specify the barcode and UMI positions within the reads (see the TRUST4 [docs](https://github.com/liulab-dfci/TRUST4?tab=readme-ov-file#10x-genomics-data-and-barcode-based-single-cell-data)).
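
As an illustration of what a value such as `bc:0:15,um:16:27` selects, the sketch below slices a read by hand. It assumes, following the TRUST4 documentation, that the ranges are 0-based and inclusive and that an end of `-1` means "to the end of the read". The helper `slice_read` and the example sequences are made up for this illustration and are not part of the pipeline or of TRUST4.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of airrflow or TRUST4): slice a read using a
# "start:end" range, assumed to be 0-based and inclusive.
slice_read() {
    local seq="$1" start="$2" end="$3"
    if [ "$end" -eq -1 ]; then
        # An end of -1 is taken to mean "until the end of the read".
        printf '%s\n' "${seq:start}"
    else
        printf '%s\n' "${seq:start:end-start+1}"
    fi
}

# Made-up R1 read: a 16 bp barcode followed by a 12 bp UMI.
example_barcode="AAACCTGAGAAACCAT"
example_umi="TTTGGGCCCAAA"
read_r1="${example_barcode}${example_umi}"

# bc:0:15 -> bases 0..15, um:16:27 -> bases 16..27
echo "barcode: $(slice_read "$read_r1" 0 15)"
echo "umi:     $(slice_read "$read_r1" 16 27)"
```

With this layout both the barcode and the UMI sit on R1, which is why the example command passes `--umi_position R1`.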

#### Reference file

TRUST4 requires a reference. This can be provided using the `--coord_fasta` parameter.
The reference fasta can be downloaded from IMGT and created using [TRUST4](https://github.com/liulab-dfci/TRUST4?tab=readme-ov-file#build-custom-vjc-gene-database-files-for--f-and---ref).
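
Before launching a run it can help to confirm that the reference file exists and contains sequences. The following is a minimal pre-flight sketch, not something the pipeline does itself. The function name `count_fasta_records` is hypothetical, and the path `reference/IMGT+C.fa` is simply the one used in the example commands above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count FASTA records by counting header lines ('>').
count_fasta_records() {
    # grep -c exits non-zero when there are no matches; '|| true' keeps the
    # function usable under 'set -e' while still printing the count (0).
    grep -c '^>' "$1" || true
}

ref="reference/IMGT+C.fa"   # path taken from the example commands; adjust as needed
if [ -s "$ref" ]; then
    echo "$(count_fasta_records "$ref") sequences found in $ref"
else
    echo "reference not found or empty: $ref" >&2
fi
```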

## Core Nextflow arguments

:::note
2 changes: 1 addition & 1 deletion modules.json
Original file line number Diff line number Diff line change
Expand Up @@ -46,7 +46,7 @@
},
"utils_nfcore_pipeline": {
"branch": "master",
"git_sha": "5caf7640a9ef1d18d765d55339be751bb0969dfa",
"git_sha": "92de218a329bfc9a9033116eb5f65fd270e72ba3",
"installed_by": ["subworkflows"]
},
"utils_nfvalidation_plugin": {
23 changes: 23 additions & 0 deletions modules/local/rename_fastq_trust4.nf
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
// Rename the input FASTQ files (R1, R2) to the file names of the original reads (orig_r1, orig_r2)
process RENAME_FASTQ_TRUST4 {
tag "$meta.id"
label 'process_low'

conda "conda-forge::python=3.8.0 conda-forge::biopython=1.74"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/mulled-v2-adc9bb9edc31eb38b3c24786a83b7dfa530e2bea:47d6d7765d7537847ced7dac873190d164146022-0' :
'biocontainers/mulled-v2-adc9bb9edc31eb38b3c24786a83b7dfa530e2bea:47d6d7765d7537847ced7dac873190d164146022-0' }"

input:
tuple val(meta), path(R1), path(R2)
tuple val(meta_2), path(orig_r1), path(orig_r2)

output:
tuple val(meta), path(orig_r1), path(orig_r2) , emit: reads

script:
"""
mv ${R1} ${orig_r1}
mv ${R2} ${orig_r2}
"""
}
102 changes: 102 additions & 0 deletions modules/local/trust4.nf
Original file line number Diff line number Diff line change
@@ -0,0 +1,102 @@
process TRUST4 {
tag "$meta.id"
label 'process_medium'

conda "bioconda::trust4=1.0.13"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/trust4:1.0.13--h43eeafb_0':
'biocontainers/trust4:1.0.13--h43eeafb_0' }"

input:
tuple val(meta), path(bam), path(reads)
tuple val(meta2), path(fasta)
tuple val(meta3), path(vdj_reference)

output:
tuple val(meta), path("*.tsv") , emit: tsv
tuple val(meta), path("*_airr.tsv") , emit: airr_files
tuple val(meta), path("${meta.id}_airr.tsv") , emit: airr_tsv
tuple val(meta), path("*_report.tsv") , emit: report_tsv
tuple val(meta), path("*.fa") , emit: fasta
tuple val(meta), path("*.out") , emit: out
tuple val(meta), path("*.fq") , emit: fq
tuple val(meta), path("**") , emit: outs
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script:
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
def bam_mode = bam ? "-b ${bam}" : ''
def single_end_mode = reads && meta.single_end ? "-u ${reads}" : ''
// reference is optional for fastq input
def reference = vdj_reference ? "--ref ${vdj_reference}" : ""
// separate forward from reverse pairs
def (forward, reverse) = reads.collate(2).transpose()
def paired_end_mode = reads && (meta.single_end == false) ? "-1 ${forward[0]} -2 ${reverse[0]}" : ''
def readFormat = params.read_format ? "--readFormat ${params.read_format}" : ''
def barcode = ''
if (meta.barcode_read) {
if (meta.barcode_read == "R1") {
barcode = "--barcode ${forward[0]}"
} else if (meta.barcode_read == "R2") {
barcode = "--barcode ${reverse[0]}"
}
}
else {
barcode = ''
}
def umi_position = ''
if (meta.umi_position) {
if (meta.umi_position == "R1") {
umi_position = "--UMI ${forward[0]}"
} else if (meta.umi_position == "R2") {
umi_position = "--UMI ${reverse[0]}"
}
}
else {
umi_position = ''
}

"""
run-trust4 \\
${bam_mode} \\
${single_end_mode} \\
${paired_end_mode} \\
${barcode} \\
${readFormat} \\
${umi_position} \\
-t $task.cpus \\
-f ${fasta} \\
-o ${prefix} \\
${reference} \\
$args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
trust4: \$(run-trust4 2>&1 | grep -o 'v[0-9.]*-r[0-9]*' | sed 's/^/TRUST4 using /' )
END_VERSIONS
"""

stub:
def args = task.ext.args ?: ''
def prefix = task.ext.prefix ?: "${meta.id}"
"""
touch ${prefix}_airr.tsv
touch ${prefix}_airr_align.tsv
touch ${prefix}_report.tsv
touch ${prefix}_assembled_reads.fa
touch ${prefix}_annot.fa
touch ${prefix}_cdr3.out
touch ${prefix}_raw.out
touch ${prefix}_final.out
touch ${prefix}_toassemble.fq

cat <<-END_VERSIONS > versions.yml
"${task.process}":
trust4: \$(run-trust4 2>&1 | grep -o 'v[0-9.]*-r[0-9]*' | sed 's/^/TRUST4 using /' )
END_VERSIONS
"""
}
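
The barcode and UMI handling in the `script:` block above reduces to one rule: pick the forward file for `R1`, the reverse file for `R2`, and emit no flag otherwise. The shell sketch below restates that rule outside Nextflow for readability. The function name `read_flag` and the file names are hypothetical; only the `--barcode` and `--UMI` flags are taken from the module.

```bash
#!/usr/bin/env bash
# Hypothetical restatement of the module's R1/R2 selection logic.
# Usage: read_flag <flag> <position> <forward_file> <reverse_file>
read_flag() {
    local flag="$1" position="$2" forward="$3" reverse="$4"
    case "$position" in
        R1) printf '%s %s\n' "$flag" "$forward" ;;
        R2) printf '%s %s\n' "$flag" "$reverse" ;;
        *)  : ;;  # unset or unrecognised position: emit nothing
    esac
}

# Example with made-up file names: barcode and UMI both on R1.
read_flag --barcode R1 sample_R1.fastq.gz sample_R2.fastq.gz
read_flag --UMI R1 sample_R1.fastq.gz sample_R2.fastq.gz
```

Expressing the rule once also shows that the two `if`/`else` blocks in the module are structurally identical, and that their `else` branches only reassign the empty default.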
11 changes: 10 additions & 1 deletion nextflow.config
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ params {
primer_revpr = false

// UMI and primer handling
umi_position = 'R1'
umi_position = null
umi_length = -1
umi_start = 0

Expand Down Expand Up @@ -123,6 +123,13 @@ params {
// -----------------------
reference_10x = null

// -----------------------
// raw RNA seq input options
// -----------------------
barcode_read = null
read_format = null
coord_fasta = null


// -----------------------
// generic nf-core options
Expand Down Expand Up @@ -299,6 +306,8 @@ profiles {
test_10x_sc { includeConfig 'conf/test_10x_sc.config' }
test_clontech_umi { includeConfig 'conf/test_clontech_umi.config' }
test_nebnext_umi { includeConfig 'conf/test_nebnext_umi.config' }
test_rnaseq_bulk { includeConfig 'conf/test_rnaseq_bulk.config' }
test_rnaseq_sc { includeConfig 'conf/test_rnaseq_sc.config' }
nebnext_umi_tcr { includeConfig 'conf/nebnext_umi_tcr.config' }
nebnext_umi_bcr { includeConfig 'conf/nebnext_umi_bcr.config' }
clontech_umi_bcr { includeConfig 'conf/clontech_umi_bcr.config' }