Document correct rnaseq matrix usage

nf-core · Nov 20, 2023 · 1e72670
1 parent 01fcc71
commit 1e72670

Showing 3 changed files with 20 additions and 2 deletions.
diff --git a/README.md b/README.md
@@ -51,6 +51,10 @@ RNA-seq:
      -profile rnaseq,<docker/singularity/podman/shifter/charliecloud/conda/institute>
 ```
 
+:::note
+If you are using the outputs of the nf-core rnaseq workflow as input here, please use the **gene_counts_length_scaled.tsv** or **gene_counts_scaled.tsv** matrices. See the [usage documentation](https://nf-co.re/differentialabundance/usage) for more information.
+:::
+
 Affymetrix microarray:
 
 ```bash

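The note above recommends `gene_counts_length_scaled.tsv` or `gene_counts_scaled.tsv`. nf-core/rnaseq builds these with tximport's `countsFromAbundance` option. Below is a minimal plain-Python sketch of what that option does, based on our reading of tximport's internal logic. It is not tximport's own code, and the function name `counts_from_abundance` and the list-of-lists layout (rows are features, columns are samples) are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]  # rows = features, columns = samples


def counts_from_abundance(counts: Matrix, tpm: Matrix, lengths: Matrix, method: str) -> Matrix:
    """Sketch of tximport-style countsFromAbundance (hypothetical helper, not tximport's API)."""
    if method == "lengthScaledTPM":
        # Multiply each feature's TPM by its mean length across ALL samples,
        # so a given feature uses the same length in every sample.
        new = [[t * (sum(lrow) / len(lrow)) for t in trow] for trow, lrow in zip(tpm, lengths)]
    elif method == "scaledTPM":
        # Use the TPM values directly (no length factor at all).
        new = [list(trow) for trow in tpm]
    else:
        raise ValueError("expected 'lengthScaledTPM' or 'scaledTPM'")

    # Rescale each sample (column) so its total matches the original library size.
    for j in range(len(counts[0])):
        original_total = sum(row[j] for row in counts)
        new_total = sum(row[j] for row in new)
        for row in new:
            row[j] *= original_total / new_total
    return new


if __name__ == "__main__":
    # Hypothetical toy data: two features with lengths 1000 and 2000, equal
    # abundance, and a second sample sequenced twice as deeply as the first.
    counts = [[10, 20], [20, 40]]
    tpm = [[500_000, 500_000], [500_000, 500_000]]
    lengths = [[1000, 1000], [2000, 2000]]
    print(counts_from_abundance(counts, tpm, lengths, "scaledTPM"))
    print(counts_from_abundance(counts, tpm, lengths, "lengthScaledTPM"))
```

With these toy inputs, where each feature's length is the same in every sample, `lengthScaledTPM` gives back the original counts (up to floating-point rounding), while `scaledTPM` gives roughly `[[15, 30], [15, 30]]`. That is, counts proportional to abundance with the length effect removed, still summing to each sample's library size.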
diff --git a/assets/differentialabundance_report.Rmd b/assets/differentialabundance_report.Rmd
@@ -37,7 +37,7 @@ params:
   features_gtf_feature_type: NULL
   features_gtf_table_first_field: NULL
   features_log2_assays: NULL
-  raw_matrix: null                                            # e.g. 0_salmon.merged.gene_counts.tsv
+  raw_matrix: null                                            # e.g. 0_salmon.merged.gene_counts_length_scaled.tsv
   normalised_matrix: null
   variance_stabilised_matrix: null                            # e.g. test_files/3_treatment-WT-P23H.vst.tsv
   contrasts_file: null                                        # e.g. GSE156533.contrasts.csv
@@ -944,4 +944,4 @@ print( htmltools::tagList(datatable(versions_table, caption = "Software versions
 
 ```{r, echo=FALSE, results='asis'}
 htmltools::includeMarkdown(params$citations)
-```
+```
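The `raw_matrix` example above now points at a length-scaled matrix instead of plain gene counts. The usage documentation below quotes tximport's warning against passing original gene-level counts without an offset, and the toy model here illustrates why. It assumes, for illustration only, that expected reads for a transcript are proportional to molecules times effective length. All transcript names, lengths and molecule numbers are hypothetical.

```python
# Toy model (assumption): expected reads for a transcript are proportional to
# molecules x effective length. All names and numbers are hypothetical.
LENGTH = {"G-short": 1000, "G-long": 3000, "H": 2000}
GENE = {"G-short": "G", "G-long": "G", "H": "H"}

# Gene G has the same number of molecules in both samples but switches isoform.
MOLECULES = {
    "A": {"G-short": 100, "G-long": 0, "H": 100},
    "B": {"G-short": 0, "G-long": 100, "H": 100},
}


def original_gene_counts(sample: str) -> dict:
    """Sum expected transcript reads (molecules x length) to gene level."""
    totals = {}
    for tx, n in MOLECULES[sample].items():
        totals[GENE[tx]] = totals.get(GENE[tx], 0) + n * LENGTH[tx]
    return totals


def gene_abundance(sample: str) -> dict:
    """Sum molecules to gene level: proportional to TPM, with no length effect."""
    totals = {}
    for tx, n in MOLECULES[sample].items():
        totals[GENE[tx]] = totals.get(GENE[tx], 0) + n
    return totals


def g_vs_h_fold_change(measure) -> float:
    """Fold change (sample B over sample A) of gene G relative to reference gene H."""
    a, b = measure("A"), measure("B")
    return (b["G"] / b["H"]) / (a["G"] / a["H"])


print("original counts:", g_vs_h_fold_change(original_gene_counts))
print("abundance:", g_vs_h_fold_change(gene_abundance))
```

This prints `original counts: 3.0` and `abundance: 1.0`. In the original counts, gene G appears three-fold up relative to H only because sample B uses the longer isoform, even though its molecule count is unchanged. In this toy model, either scaled matrix avoids the artefact, because each applies the same length factor to a given gene in every sample.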
diff --git a/docs/usage.md b/docs/usage.md
@@ -67,6 +67,20 @@ The "file" column in this example is used to specify the data file associated wi
 
 This is a numeric matrix file, comma or tab-separated, with a column for every observation and rows corresponding to the features in the supplied feature set. The parameters `--observations_id_col` and `--features_id_col` define which of the associated fields should be matched in those inputs.
 
+#### Outputs from nf-core/rnaseq and other tximport-processed results
+
+The nf-core rnaseq workflow uses [tximport](https://bioconductor.org/packages/release/bioc/html/tximport.html) to generate its quantification matrices. It does not currently output enough information (the per-sample average transcript lengths used as an offset in tximport's first recommended approach) for this workflow to model transcript-length biases in differential analysis. We must therefore use matrices produced by the second approach recommended in the [tximport documentation](https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html#Downstream_DGE_in_Bioconductor):
+
+> The second method is to use the tximport argument `countsFromAbundance="lengthScaledTPM"` or `"scaledTPM"`, and then to use the gene-level count matrix `txi$counts` directly as you would a regular count matrix with these software. Let’s call this method “bias corrected counts without an offset”
+
+These two options correspond, respectively, to the **gene_counts_length_scaled.tsv** and **gene_counts_scaled.tsv** matrices output by the rnaseq workflow.
+
+The same documentation also warns:
+
+> Note: Do not manually pass the original gene-level counts to downstream methods without an offset.
+
+This corresponds to the **gene_counts.tsv** matrix, so we do not recommend using it as input to this workflow.
+
 ### MaxQuant intensities
 
 ```bash