nf-core · ggabernet · Jul 20, 2023 · Jun 22, 2023
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -10,12 +10,15 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
 - [#268](https://github.com/nf-core/airrflow/pull/268) Added parameters for FindThreshold in `modules.config`.
 - [#268](https://github.com/nf-core/airrflow/pull/268) Validate samplesheet also for `assembled` samplesheet.
 - [#259](https://github.com/nf-core/airrflow/pull/259) Update to `EnchantR v0.1.3`.
+- [#266](https://github.com/nf-core/airrflow/pull/266) Added clonal reports tables to final report folder.
+- [#266](https://github.com/nf-core/airrflow/pull/266) Added processes to include the sample ID in filenames in the assembled workflow to keep them unique.
 
 ### `Fixed`
 
 - [#268](https://github.com/nf-core/airrflow/pull/268) Allows for uppercase and lowercase locus in samplesheet `pcr_target_locus`.
 - [#259](https://github.com/nf-core/airrflow/pull/259) Samplesheet only allows data from one species.
 - [#259](https://github.com/nf-core/airrflow/pull/259) Introduced fix for a too long command with hundreds of datasets.
+- [#266](https://github.com/nf-core/airrflow/pull/266) Convert samplesheet required columns to strings when needed.
 
 ### `Dependencies`
 

diff --git a/README.md b/README.md
@@ -15,7 +15,7 @@
 
 ## Introduction
 
-** nf-core/airrflow ** is a bioinformatics best-practice pipeline to analyze B-cell or T-cell repertoire sequencing data. It makes use of the [Immcantation](https://immcantation.readthedocs.io) toolset. The input data can be targeted amplicon bulk sequencing data of the V, D, J and C regions of the B/T-cell receptor with multiplex PCR or 5' RACE protocol, or assembled reads (bulk or single cell).
+**nf-core/airrflow** is a bioinformatics best-practice pipeline to analyze B-cell or T-cell repertoire sequencing data. It makes use of the [Immcantation](https://immcantation.readthedocs.io) toolset. The input data can be targeted amplicon bulk sequencing data of the V, D, J and C regions of the B/T-cell receptor with multiplex PCR or 5' RACE protocol, or assembled reads (bulk or single cell).
 
 ![nf-core/airrflow overview](docs/images/airrflow_workflow_overview.png)
 
@@ -87,7 +87,7 @@ First, ensure that the pipeline tests run on your infrastructure:
 nextflow run nf-core/airrflow -profile test,<docker/singularity/podman/shifter/charliecloud/conda/institute> --outdir <OUTDIR>
 ```
 
-To run on your data, prepare a tab-separated samplesheet with your input data. Depending on the input data type (bulk or single-cell, raw reads or assembled reads) the input samplesheet will vary. Please follow the [documentation on samplesheets](https://nf-co.re/airrflow/usage#input-samplesheet) for more details. An example samplesheet for running the pipeline on raw BCR / TCR sequencing data looks as follows:
+To run nf-core/airrflow with your data, prepare a tab-separated samplesheet with your input data. Depending on the input data type (bulk or single-cell, raw reads or assembled reads) the input samplesheet will vary. Please follow the [documentation on samplesheets](https://nf-co.re/airrflow/usage#input-samplesheet) for more details. An example samplesheet for running the pipeline on bulk BCR / TCR sequencing data in fastq format looks as follows:
 
 | sample_id | filename_R1                     | filename_R2                     | filename_I1                     | subject_id | species | pcr_target_locus | tissue | sex    | age | biomaterial_provider | single_cell | intervention   | collection_time_point_relative | cell_subset  |
 | --------- | ------------------------------- | ------------------------------- | ------------------------------- | ---------- | ------- | ---------------- | ------ | ------ | --- | -------------------- | ----------- | -------------- | ------------------------------ | ------------ |
@@ -96,28 +96,38 @@ To run on your data, prepare a tab-separated samplesheet with your input data. D
 
 Each row represents a sample with fastq files (paired-end).
 
-A typical command to run the pipeline is:
+A typical command to run the pipeline from **bulk raw fastq files** is:
 
 ```bash
 nextflow run nf-core/airrflow \
 -profile <docker/singularity/podman/shifter/charliecloud/conda/institute> \
---input samplesheet.tsv \
+--mode fastq \
+--input input_samplesheet.tsv \
 --library_generation_method specific_pcr_umi \
 --cprimers CPrimers.fasta \
 --vprimers VPrimers.fasta \
 --umi_length 12 \
---max_memory 8.GB \
---max_cpus 8 \
+--umi_position R1 \
 --outdir ./results
 ```
 
+A typical command to run the pipeline from **single-cell AIRR rearrangement tables or assembled bulk sequencing fasta** data is:
+
+```bash
+nextflow run nf-core/airrflow \
+-profile <docker/singularity/podman/shifter/charliecloud/conda/institute> \
+--input input_samplesheet.tsv \
+--mode assembled \
+--outdir results
+```
+
+See the [usage documentation](https://nf-co.re/airrflow/usage) and the [parameter documentation](https://nf-co.re/airrflow/parameters) for more details on how to use the pipeline and all the available parameters.
+
 > **Warning:**
 > Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those
 > provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_;
 > see [docs](https://nf-co.re/usage/configuration#custom-configuration-files).
 
-For more details, please refer to the [usage documentation](https://nf-co.re/airrflow/usage) and the [parameter documentation](https://nf-co.re/airrflow/parameters).
-
 ## Pipeline output
 
 To see the the results of a test run with a full size dataset refer to the [results](https://nf-co.re/airrflow/results) tab on the nf-core website pipeline page.

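The README hunk above describes a tab-separated samplesheet for `--mode assembled`. The following is a minimal sketch of generating one. The column names are taken from `required_columns_assembled` in `bin/check_samplesheet.py`, while the helper `write_samplesheet` and all row values are hypothetical placeholders, not part of the pipeline.

```python
import csv
import io

# Column names copied from required_columns_assembled in bin/check_samplesheet.py.
REQUIRED_ASSEMBLED = [
    "filename",
    "sample_id",
    "subject_id",
    "species",
    "pcr_target_locus",
    "sex",
    "tissue",
    "biomaterial_provider",
    "age",
    "single_cell",
]


def write_samplesheet(rows, handle):
    """Hypothetical helper: write rows as a tab-separated samplesheet."""
    writer = csv.DictWriter(
        handle, fieldnames=REQUIRED_ASSEMBLED, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        # Fail early if a required column has no value in this row.
        missing = [col for col in REQUIRED_ASSEMBLED if col not in row]
        if missing:
            raise ValueError("Missing columns: {}".format(", ".join(missing)))
        writer.writerow(row)


# Placeholder values for illustration only.
rows = [
    {
        "filename": "sample1_airr_rearrangement.tsv",
        "sample_id": "sample1",
        "subject_id": "subject1",
        "species": "human",
        "pcr_target_locus": "IG",
        "sex": "NA",
        "tissue": "blood",
        "biomaterial_provider": "provider1",
        "age": "NA",
        "single_cell": "TRUE",
    }
]

buffer = io.StringIO()
write_samplesheet(rows, buffer)
header = buffer.getvalue().splitlines()[0].split("\t")
```

In practice the handle would be an open file such as `input_samplesheet.tsv`, which is then passed to the pipeline with `--input`.
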
diff --git a/bin/check_samplesheet.py b/bin/check_samplesheet.py
@@ -68,16 +68,16 @@ def check_samplesheet(file_in, assembled):
             "age",
         ]
         required_columns_assembled = [
-            "sample_id",
             "filename",
+            "sample_id",
             "subject_id",
             "species",
             "pcr_target_locus",
-            "single_cell",
             "sex",
             "tissue",
             "biomaterial_provider",
             "age",
+            "single_cell",
         ]
         no_whitespaces_raw = [
             "sample_id",
@@ -99,9 +99,16 @@ def check_samplesheet(file_in, assembled):
 
         ## Read header
         header = [x.strip('"') for x in fin.readline().strip().split("\t")]
+
         ## Read tab
         tab = pd.read_csv(file_in, sep="\t", header=0)
 
+        ## Set required columns as strings (slice [1:7]: sample_id through tissue).
+        ## Columns missing from the samplesheet are skipped so the header checks below can report them.
+        str_columns = [col for col in required_columns_assembled[1:7] if col in tab.columns]
+        for col in str_columns:
+            tab[col] = tab[col].astype(str)
+
         # Check that all required columns for assembled and raw samplesheets are there, and do not contain whitespaces
         if assembled:
             for col in required_columns_assembled:
@@ -118,8 +125,12 @@ def check_samplesheet(file_in, assembled):
                             col, no_whitespaces_assembled
                         )
                     )
-
         else:
+            if any(tab["single_cell"].tolist()):
+                print_error(
+                    "Some `single_cell` column values are TRUE. The fastq mode (raw reads) only accepts bulk samples. If processing single-cell samples, please set `--mode assembled` and provide an AIRR rearrangement table as input."
+                )
+
             for col in required_columns_raw:
                 if col not in header:
                     print("ERROR: Please check samplesheet header: {} ".format(",".join(header)))

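The string conversion added to `bin/check_samplesheet.py` matters because `pd.read_csv` infers column types: a `sample_id` column holding only digits is read as integers, which string-based checks cannot handle. The following is a minimal sketch of that behavior on a made-up in-memory samplesheet. The data and the `string_columns` list are assumptions for illustration, and skipping absent columns is a choice of this sketch rather than something the diff guarantees.

```python
import io

import pandas as pd

# Hypothetical samplesheet whose sample_id and subject_id look numeric.
tsv = (
    "filename\tsample_id\tsubject_id\tsingle_cell\n"
    "a.tsv\t101\t7\tFALSE\n"
    "b.tsv\t102\t8\tTRUE\n"
)

tab = pd.read_csv(io.StringIO(tsv), sep="\t", header=0)

# Record the types pandas inferred before any conversion.
inferred = {col: str(dtype) for col, dtype in tab.dtypes.items()}

# Convert only the columns that are actually present; "species" is
# deliberately absent here to show that it is skipped, not an error.
string_columns = ["sample_id", "subject_id", "species"]
present = [col for col in string_columns if col in tab.columns]
for col in present:
    tab[col] = tab[col].astype(str)

sample_ids = tab["sample_id"].tolist()

# Same test as the diff uses to reject single-cell rows in raw mode.
has_single_cell = any(tab["single_cell"].tolist())
```

Here `inferred["sample_id"]` is `int64` before the conversion, and `sample_ids` holds strings afterwards. Note that `any(...)` relies on `single_cell` being parsed as booleans; if the column were read as text, any non-empty value would count as true.
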
diff --git a/conf/modules.config b/conf/modules.config
@@ -99,7 +99,19 @@ process {
         ext.args = '--quiet'
     }
 
-    withName: 'MERGE_UMI' {
+    withName: RENAME_FASTQ {
+        publishDir = [
+            enabled: false
+        ]
+    }
+
+    withName: 'RENAME_FILE_*' {
+        publishDir = [
+            enabled: false
+        ]
+    }
+
+    withName: MERGE_UMI {
         publishDir = [
             [
                 enabled: false
@@ -384,6 +396,7 @@ process {
             mode: params.publish_dir_mode,
             saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
         ]
+        errorStrategy = 'retry'
     }
 
     // ------------------------------