Add assembled check to samplesheet, fix report use read_rearrangement #266

Merged 17 commits on Jul 20, 2023.
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -10,12 +10,15 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
- [#268](https://github.com/nf-core/airrflow/pull/268) Added parameters for FindThreshold in `modules.config`.
- [#268](https://github.com/nf-core/airrflow/pull/268) Validate samplesheet also for `assembled` samplesheet.
- [#259](https://github.com/nf-core/airrflow/pull/259) Update to `EnchantR v0.1.3`.
- [#266](https://github.com/nf-core/airrflow/pull/266) Added clonal reports tables to final report folder.
- [#266](https://github.com/nf-core/airrflow/pull/266) Added processes to include sampleID to filename in assembled workflow to keep it unique.

### `Fixed`

- [#268](https://github.com/nf-core/airrflow/pull/268) Allows for uppercase and lowercase locus in samplesheet `pcr_target_locus`.
- [#259](https://github.com/nf-core/airrflow/pull/259) Samplesheet only allows data from one species.
- [#259](https://github.com/nf-core/airrflow/pull/259) Introduced fix for a too long command with hundreds of datasets.
- [#266](https://github.com/nf-core/airrflow/pull/266) Convert samplesheet required columns to strings when needed.

### `Dependencies`

26 changes: 18 additions & 8 deletions README.md
@@ -15,7 +15,7 @@

## Introduction

** nf-core/airrflow ** is a bioinformatics best-practice pipeline to analyze B-cell or T-cell repertoire sequencing data. It makes use of the [Immcantation](https://immcantation.readthedocs.io) toolset. The input data can be targeted amplicon bulk sequencing data of the V, D, J and C regions of the B/T-cell receptor with multiplex PCR or 5' RACE protocol, or assembled reads (bulk or single cell).
**nf-core/airrflow** is a bioinformatics best-practice pipeline to analyze B-cell or T-cell repertoire sequencing data. It makes use of the [Immcantation](https://immcantation.readthedocs.io) toolset. The input data can be targeted amplicon bulk sequencing data of the V, D, J and C regions of the B/T-cell receptor with multiplex PCR or 5' RACE protocol, or assembled reads (bulk or single cell).

![nf-core/airrflow overview](docs/images/airrflow_workflow_overview.png)

@@ -87,7 +87,7 @@ First, ensure that the pipeline tests run on your infrastructure:
nextflow run nf-core/airrflow -profile test,<docker/singularity/podman/shifter/charliecloud/conda/institute> --outdir <OUTDIR>
```

To run on your data, prepare a tab-separated samplesheet with your input data. Depending on the input data type (bulk or single-cell, raw reads or assembled reads) the input samplesheet will vary. Please follow the [documentation on samplesheets](https://nf-co.re/airrflow/usage#input-samplesheet) for more details. An example samplesheet for running the pipeline on raw BCR / TCR sequencing data looks as follows:
To run nf-core/airrflow with your data, prepare a tab-separated samplesheet with your input data. Depending on the input data type (bulk or single-cell, raw reads or assembled reads) the input samplesheet will vary. Please follow the [documentation on samplesheets](https://nf-co.re/airrflow/usage#input-samplesheet) for more details. An example samplesheet for running the pipeline on bulk BCR / TCR sequencing data in fastq format looks as follows:

| sample_id | filename_R1 | filename_R2 | filename_I1 | subject_id | species | pcr_target_locus | tissue | sex | age | biomaterial_provider | single_cell | intervention | collection_time_point_relative | cell_subset |
| --------- | ------------------------------- | ------------------------------- | ------------------------------- | ---------- | ------- | ---------------- | ------ | ------ | --- | -------------------- | ----------- | -------------- | ------------------------------ | ------------ |
@@ -96,28 +96,38 @@ To run on your data, prepare a tab-separated samplesheet with your input data. D

Each row represents a sample with fastq files (paired-end).
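
As a rough illustration of the layout above, the following Python sketch writes a one-row tab-separated samplesheet using the column names from the table. Every value in the example row (sample name, fastq file names, metadata) is a hypothetical placeholder and is not taken from the pipeline's test data.

```python
import csv
import io

# Column names as listed in the README table above.
COLUMNS = [
    "sample_id", "filename_R1", "filename_R2", "filename_I1", "subject_id",
    "species", "pcr_target_locus", "tissue", "sex", "age",
    "biomaterial_provider", "single_cell", "intervention",
    "collection_time_point_relative", "cell_subset",
]

# Hypothetical example row: all values are made-up placeholders.
EXAMPLE_ROW = {
    "sample_id": "sample01",
    "filename_R1": "sample01_R1.fastq.gz",
    "filename_R2": "sample01_R2.fastq.gz",
    "filename_I1": "sample01_I1.fastq.gz",
    "subject_id": "subject01",
    "species": "human",
    "pcr_target_locus": "IG",
    "tissue": "blood",
    "sex": "female",
    "age": "30",
    "biomaterial_provider": "lab01",
    "single_cell": "FALSE",
    "intervention": "none",
    "collection_time_point_relative": "baseline",
    "cell_subset": "plasmablasts",
}


def write_samplesheet(rows, handle):
    """Write rows (dicts keyed by COLUMNS) as a tab-separated samplesheet."""
    writer = csv.DictWriter(
        handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


# Write to an in-memory buffer; pass an open file handle to write to disk.
buffer = io.StringIO()
write_samplesheet([EXAMPLE_ROW], buffer)
samplesheet_text = buffer.getvalue()
print(samplesheet_text)
```

Generating the file with a tab delimiter avoids mixing spaces and tabs by hand, which matters because the samplesheet check rejects whitespace in several columns.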

A typical command to run the pipeline is:
A typical command to run the pipeline from **bulk raw fastq files** is:

```bash
nextflow run nf-core/airrflow \
-profile <docker/singularity/podman/shifter/charliecloud/conda/institute> \
--input samplesheet.tsv \
--mode fastq \
--input input_samplesheet.tsv \
--library_generation_method specific_pcr_umi \
--cprimers CPrimers.fasta \
--vprimers VPrimers.fasta \
--umi_length 12 \
--max_memory 8.GB \
--max_cpus 8 \
--umi_position R1 \
--outdir ./results
```

A typical command to run the pipeline from **single-cell AIRR rearrangement tables or assembled bulk sequencing fasta** data is:

```bash
nextflow run nf-core/airrflow \
-profile <docker/singularity/podman/shifter/charliecloud/conda/institute> \
--input input_samplesheet.tsv \
--mode assembled \
--outdir results
```

See the [usage documentation](https://nf-co.re/airrflow/usage) and the [parameter documentation](https://nf-co.re/airrflow/parameters) for more details on how to use the pipeline and all the available parameters.

> **Warning:**
> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those
> provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_;
> see [docs](https://nf-co.re/usage/configuration#custom-configuration-files).
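
One way to follow this advice is to keep the pipeline parameters in a JSON file and pass it with `-params-file`. The Python sketch below assumes the assembled-mode parameters from the command above; the file name `params.json`, the helper `write_params_file`, and the choice of the `docker` profile are arbitrary choices for illustration.

```python
import json
from pathlib import Path

# Pipeline parameters taken from the assembled-mode command above.
params = {
    "input": "input_samplesheet.tsv",
    "mode": "assembled",
    "outdir": "results",
}


def write_params_file(params, path):
    """Serialize pipeline parameters to a JSON file for Nextflow's -params-file option."""
    path = Path(path)
    path.write_text(json.dumps(params, indent=2) + "\n")
    return path


# "params.json" is an arbitrary file name chosen for this example.
params_path = write_params_file(params, "params.json")

# Print the command that would consume the file; nothing is executed here.
print(f"nextflow run nf-core/airrflow -profile docker -params-file {params_path}")
```

Keeping parameters in a file also leaves a record of the settings used for a run, while `-c` config files stay reserved for resource and executor configuration.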

For more details, please refer to the [usage documentation](https://nf-co.re/airrflow/usage) and the [parameter documentation](https://nf-co.re/airrflow/parameters).

## Pipeline output

To see the results of a test run with a full size dataset refer to the [results](https://nf-co.re/airrflow/results) tab on the nf-core website pipeline page.
17 changes: 14 additions & 3 deletions bin/check_samplesheet.py
@@ -68,16 +68,16 @@ def check_samplesheet(file_in, assembled):
"age",
]
required_columns_assembled = [
"sample_id",
"filename",
"sample_id",
"subject_id",
"species",
"pcr_target_locus",
"single_cell",
"sex",
"tissue",
"biomaterial_provider",
"age",
"single_cell",
]
no_whitespaces_raw = [
"sample_id",
@@ -99,9 +99,16 @@ def check_samplesheet(file_in, assembled):

## Read header
header = [x.strip('"') for x in fin.readline().strip().split("\t")]

## Read tab
tab = pd.read_csv(file_in, sep="\t", header=0)

## Set required columns as strings
types_dict = dict()
types_dict.update({col: str for col in required_columns_assembled[1:7]})
for col, col_type in types_dict.items():
    tab[col] = tab[col].astype(col_type)

# Check that all required columns for assembled and raw samplesheets are there, and do not contain whitespaces
if assembled:
    for col in required_columns_assembled:
@@ -118,8 +125,12 @@ def check_samplesheet(file_in, assembled):
col, no_whitespaces_assembled
)
)

else:
    if any(tab["single_cell"].tolist()):
        print_error(
            "Some single cell column values are TRUE. The raw mode only accepts bulk samples. If processing single cell samples, please set the `--mode assembled` flag, and provide an AIRR rearrangement as input."
        )

    for col in required_columns_raw:
        if col not in header:
            print("ERROR: Please check samplesheet header: {} ".format(",".join(header)))
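
The raw-mode guard added in this hunk can be sketched in isolation. The version below is an illustration only: it uses the standard-library `csv` module instead of pandas, treats the literal string `TRUE` (case-insensitive) as a true value, and raises `ValueError` in place of the script's `print_error` helper. The function names `find_single_cell_samples` and `check_raw_mode` are hypothetical.

```python
import csv
import io


def find_single_cell_samples(handle):
    """Return the sample_id of every row whose single_cell value is TRUE."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row["sample_id"]
        for row in reader
        if row["single_cell"].strip().upper() == "TRUE"
    ]


def check_raw_mode(handle):
    """Reject samplesheets that contain single-cell samples in raw (fastq) mode."""
    offending = find_single_cell_samples(handle)
    if offending:
        raise ValueError(
            "Single-cell samples found in raw mode: {}. "
            "Use `--mode assembled` with an AIRR rearrangement as input.".format(
                ", ".join(offending)
            )
        )


# Hypothetical two-row samplesheet reduced to the columns this check needs.
EXAMPLE = "sample_id\tsingle_cell\nbulk01\tFALSE\nsc01\tTRUE\n"

try:
    check_raw_mode(io.StringIO(EXAMPLE))
except ValueError as err:
    print(err)
```

In the script itself, `any(...)` operates on whatever values pandas parsed for the `single_cell` column, so this sketch mirrors the intent of the check and not its exact behavior.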
15 changes: 14 additions & 1 deletion conf/modules.config
@@ -99,7 +99,19 @@ process {
ext.args = '--quiet'
}

withName: 'MERGE_UMI' {
withName: RENAME_FASTQ {
    publishDir = [
        enabled: false
    ]
}

withName: 'RENAME_FILE_*' {
    publishDir = [
        enabled: false
    ]
}

withName: MERGE_UMI {
    publishDir = [
        [
            enabled: false
@@ -384,6 +396,7 @@ process {
        mode: params.publish_dir_mode,
        saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
    ]
    errorStrategy = 'retry'
}

// ------------------------------