Standardize conversion workflow #369

fmalmeida · 2024-09-16T08:46:10Z

Summary

Hi,
This work relates to #310 and the discussion that is happening on Slack.

Currently, the implementation has been done only for the STAR aligner, so I am opening the PR so that we can discuss and agree on what the code should look like before I apply the same approach to the other aligners.

PoC overview

Aligners
- star
- alevin
- cellranger
- kallisto
Make the mtx_to_h5ad module use an aligner-specific template, to avoid a single gigantic module with multiple conditions
Added a copy of the cellbender subworkflow from scdownstream to perform a more straightforward empty-drops filter that keeps the format of its h5ad input
- Thus, no further work is required after filtering, since the output keeps the format of its input, which should be the standardized h5ad
Moved the h5ad to Rds conversion to the anndataR package to keep it simple

TODOs summary

Define where h5ad format standardization should happen (and what it should consist of)
- Should it happen directly in the aligner-specific template script (which would be the most straightforward), or in a separate module (maybe the anndataR conversion one) that performs the standardization and generates a new h5ad for all aligners?
Create the anndataR docker container within nf-core (it currently uses my personal repo)
Define the desired standardization for h5ad
- Which commands should be executed?
Clean up the repo (old modules currently kept as backup, documentation, etc.)

Addressed issues

Close #385
Close #310
Close #370
Close #330

github-actions · 2024-09-16T08:48:31Z

`nf-core pipelines lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit 079bb7e

- ✅ 216 tests passed
- ❔ 4 tests were ignored
- ❗ 4 tests had warnings

❗ Test warnings:

pipeline_todos - TODO string in nextflow.config: Optionally, you can add a pipeline-specific nf-core config at https://github.com/nf-core/configs
pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!

❔ Tests ignored:

files_exist - File is ignored: lib/Utils.groovy
files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
template_strings - template_strings
schema_params - schema_params

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: assets/nf-core-scrnaseq_logo_light.png
files_exist - File found: conf/modules.config
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/images/nf-core-scrnaseq_logo_light.png
files_exist - File found: docs/images/nf-core-scrnaseq_logo_dark.png
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: conf/igenomes_ignored.config
files_exist - File found: .github/workflows/awstest.yml
files_exist - File found: .github/workflows/awsfulltest.yml
files_exist - File found: modules.json
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: docs/images/nf-core-scrnaseq_logo.png
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/NfcoreTemplate.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: lib/WorkflowMain.groovy
files_exist - File not found check: lib/WorkflowScrnaseq.groovy
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: Singularity
files_exist - File not found check: lib/nfcore_external_java_deps.jar
files_exist - File not found check: .travis.yml
nextflow_config - Found nf-schema plugin
nextflow_config - Config variable found: manifest.name
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: manifest.homePage
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: validation.help.enabled
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable found: validation.help.beforeText
nextflow_config - Config variable found: validation.help.afterText
nextflow_config - Config variable found: validation.help.command
nextflow_config - Config variable found: validation.summary.beforeText
nextflow_config - Config variable found: validation.summary.afterText
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config variable (correctly) not found: params.max_cpus
nextflow_config - Config variable (correctly) not found: params.max_memory
nextflow_config - Config variable (correctly) not found: params.max_time
nextflow_config - Config variable (correctly) not found: params.validationFailUnrecognisedParams
nextflow_config - Config variable (correctly) not found: params.validationLenientMode
nextflow_config - Config variable (correctly) not found: params.validationSchemaIgnoreParams
nextflow_config - Config variable (correctly) not found: params.validationShowHiddenParams
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config manifest.name began with nf-core/
nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config manifest.version ends in dev: 2.8.0dev
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
nextflow_config - nextflow.config contains configuration profile test
nextflow_config - Config default value correct: params.aligner= alevin
nextflow_config - Config default value correct: params.protocol= auto
nextflow_config - Config default value correct: params.igenomes_base= s3://ngi-igenomes/igenomes/
nextflow_config - Config default value correct: params.simpleaf_rlen= 91
nextflow_config - Config default value correct: params.star_feature= Gene
nextflow_config - Config default value correct: params.kb_workflow= standard
nextflow_config - Config default value correct: params.custom_config_version= master
nextflow_config - Config default value correct: params.custom_config_base= https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Config default value correct: params.publish_dir_mode= copy
nextflow_config - Config default value correct: params.max_multiqc_email_size= 25.MB
nextflow_config - Config default value correct: params.validate_params= true
nextflow_config - Config default value correct: params.pipelines_testdata_base_path= https://raw.githubusercontent.com/nf-core/test-datasets/
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - CODE_OF_CONDUCT.md matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/CONTRIBUTING.md matches the template
files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/PULL_REQUEST_TEMPLATE.md matches the template
files_unchanged - .github/workflows/branch.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
files_unchanged - assets/email_template.html matches the template
files_unchanged - assets/email_template.txt matches the template
files_unchanged - assets/sendmail_template.txt matches the template
files_unchanged - assets/nf-core-scrnaseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-scrnaseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-scrnaseq_logo_dark.png matches the template
files_unchanged - docs/README.md matches the template
files_unchanged - .gitignore matches the template
files_unchanged - .prettierignore matches the template
actions_ci - '.github/workflows/ci.yml' is triggered on expected events
actions_ci - '.github/workflows/ci.yml' checks minimum NF version
actions_awstest - '.github/workflows/awstest.yml' is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml does not use -profile test
readme - README Nextflow minimum version badge matched config. Badge: 24.04.2, Config: 24.04.2
readme - README Zenodo placeholder was replaced with DOI.
plugin_includes - No wrong validation plugin imports have been found
pipeline_name_conventions - Name adheres to nf-core convention
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: fix-linting.yml
actions_schema_validation - Workflow validation passed: release-announcements.yml
actions_schema_validation - Workflow validation passed: awsfulltest.yml
actions_schema_validation - Workflow validation passed: template_version_comment.yml
actions_schema_validation - Workflow validation passed: download_pipeline.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: awstest.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
multiqc_config - assets/multiqc_config.yml found and not ignored.
multiqc_config - assets/multiqc_config.yml contains report_section_order
multiqc_config - assets/multiqc_config.yml contains export_plots
multiqc_config - assets/multiqc_config.yml contains report_comment
multiqc_config - assets/multiqc_config.yml follows the ordering scheme of the minimally required plugins.
multiqc_config - assets/multiqc_config.yml contains a matching 'report_comment'.
multiqc_config - assets/multiqc_config.yml contains 'export_plots: true'.
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
base_config - conf/base.config found and not ignored.
modules_config - conf/modules.config found and not ignored.
modules_config - FASTQC found in conf/modules.config and Nextflow scripts.
modules_config - MULTIQC found in conf/modules.config and Nextflow scripts.
modules_config - CELLBENDER_REMOVEBACKGROUND found in conf/modules.config and Nextflow scripts.
modules_config - ADATA_BARCODES found in conf/modules.config and Nextflow scripts.
modules_config - MTX_TO_H5AD found in conf/modules.config and Nextflow scripts.
modules_config - GTF_GENE_FILTER found in conf/modules.config and Nextflow scripts.
modules_config - CELLRANGER_MKGTF found in conf/modules.config and Nextflow scripts.
modules_config - CELLRANGER_MKREF found in conf/modules.config and Nextflow scripts.
modules_config - CELLRANGER_COUNT found in conf/modules.config and Nextflow scripts.
modules_config - CELLRANGERARC_MKGTF found in conf/modules.config and Nextflow scripts.
modules_config - CELLRANGERARC_MKREF found in conf/modules.config and Nextflow scripts.
modules_config - CELLRANGERARC_COUNT found in conf/modules.config and Nextflow scripts.
modules_config - CELLRANGER_MKGTF found in conf/modules.config and Nextflow scripts.
modules_config - CELLRANGER_MKREF found in conf/modules.config and Nextflow scripts.
modules_config - UNIVERSC found in conf/modules.config and Nextflow scripts.
modules_config - GFFREAD_TXP2GENE found in conf/modules.config and Nextflow scripts.
modules_config - SIMPLEAF_INDEX found in conf/modules.config and Nextflow scripts.
modules_config - SIMPLEAF_QUANT found in conf/modules.config and Nextflow scripts.
modules_config - ALEVINQC found in conf/modules.config and Nextflow scripts.
modules_config - STAR_GENOMEGENERATE found in conf/modules.config and Nextflow scripts.
modules_config - STAR_ALIGN found in conf/modules.config and Nextflow scripts.
modules_config - KALLISTOBUSTOOLS_REF found in conf/modules.config and Nextflow scripts.
modules_config - KALLISTOBUSTOOLS_COUNT found in conf/modules.config and Nextflow scripts.
modules_config - FASTQC found in conf/modules.config and Nextflow scripts.
modules_config - NFCORE_SCRNASEQ found in conf/modules.config and Nextflow scripts.
modules_config - GUNZIP found in conf/modules.config and Nextflow scripts.
modules_config - CELLRANGER_MKGTF found in conf/modules.config and Nextflow scripts.
modules_config - CELLRANGER_MKREF found in conf/modules.config and Nextflow scripts.
modules_config - CELLRANGER_MKVDJREF found in conf/modules.config and Nextflow scripts.
nfcore_yml - Repository type in .nf-core.yml is valid: pipeline
nfcore_yml - nf-core version in .nf-core.yml is set to the latest version: 3.0.2

Run details

nf-core/tools version 3.0.2
Run at 2024-10-30 11:16:56

grst · 2024-10-28T10:01:12Z

> So we would double the execution of the mtx conversions? Once to at least have it before cellbender, and once again afterwards?

Only for filtered/unfiltered (i.e. what comes out of the aligner). The rest (e.g. cellbender) should run entirely in h5ad space.
The idea is really to generate an h5ad once that is standardized across all aligners, and then use it for everything downstream.

modules/local/adata_barcodes.nf

modules/local/templates/anndatar_convert.R

modules/local/templates/concat_h5ad.py

modules/local/templates/mtx_to_h5ad_cellranger.py

modules/local/templates/mtx_to_h5ad_kallisto.py

grst · 2024-10-28T11:27:13Z

modules/local/templates/mtx_to_h5ad_kallisto.py

+    for type in ['spliced', 'unspliced']:
+        input_to_adata(
+            matrix=glob.glob("${inputs}/" + f"{type}*.mtx")[0],
+            barcodes=glob.glob("${inputs}/" + f"{type}*.barcodes.txt")[0],
+            features=glob.glob("${inputs}/" + f"{type}*.genes.txt")[0],
+            output="${meta.id}/${meta.id}_${meta.input_type}" + f"_{type}_matrix.h5ad",
+            sample="${meta.id}",
+            t2g="${txp2gene}"
+        )


Better to make a single anndata object and store this information in .layers.

nf-core-bot · 2024-10-30T08:03:31Z

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 2.14.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

zxBIB Almeida,Felipe (GCBDS) EXTERNAL added 23 commits August 19, 2024 11:22

comment out for development

52fa9ca

refact modules for STAR aligner

bbf299c

directly pass txp2gene

c3eb5ea

simplify module lines

4771ed1

emit h5ad on starsolo

af85f87

add versions emition

d2a5386

update module to use templates and cleanup way of priting versions

0e55051

fix h5ad generator script

994d5a5

simplify check

b317165

Fix h5ad structure

72e9d50

updated concat module

8829080

workflow misses emptydrops and seurat & mtx conversion modules

9fc75b4

start emptydrops cellbender subworkflow

dc69f47

fix paths

9c20420

started the anndatar standardization module

6831195

concat h5ad with anndatar h5ad

237d1ca

update tag information

f5fb4a2

update tags

752666b

module is only to convert to rds

d0ad7f6

update directives

20ac4ae

update comments

da5b036

add cellbender to workflow

b90f388

start organisation of files

7c304cc

fmalmeida requested a review from grst September 16, 2024 08:46

fmalmeida self-assigned this Sep 16, 2024

fmalmeida linked an issue Sep 16, 2024 that may be closed by this pull request

Clean up mtx conversion code #310

Open

fmalmeida requested a review from apeltzer September 19, 2024 12:06

zxBIB Almeida,Felipe (GCBDS) EXTERNAL added 2 commits September 26, 2024 09:25

fix file naming

8320431

resolve emptydrops naming

1c95e85

grst reviewed Oct 28, 2024

This was referenced Oct 28, 2024

MTX_TO_H5AD module cannot take ch_star_index #370

Open

Error with CONCAT_H5AD #330

Open

write uncompressed

c1e8357

Co-authored-by: Gregor Sturm <[email protected]>

fmalmeida and others added 24 commits October 30, 2024 09:04

use .astype(str)

ae85710

Co-authored-by: Gregor Sturm <[email protected]>

simplify iteration

ae8809a

Co-authored-by: Gregor Sturm <[email protected]>

perform join left operation

68464de

Co-authored-by: Gregor Sturm <[email protected]>

do not compress output h5ad

fdedc4d

Co-authored-by: Gregor Sturm <[email protected]>

perform join left operation

187dbf6

Co-authored-by: Gregor Sturm <[email protected]>

not compress h5ad output

98af608

simplify iteration

efd6299

simplify index iteration

f357fd7

fix unmatched parenthesis

bd1a74c

fix use of igenomes ... pipeline was not properly selecting igenomes …

4731e00

…options

correct values parsing

a71348f

fix container registry

d2f9cfd

do not save versions files

5c31226

also convert concat h5ads

8278429

manage subdirectory in publishDir

2b199d0

match template scripts with new publishDir

cb67797

have outputs separated

7230095

make parsing inside sub-workflow

206a7c1

added kallisto to correct structure

c4a09e0

correct mix of channels

971667b

correct for cellranger multi

8122113

correct stub

44d1cb7

remove glob in txp2gene

18cfdd7

added small comment to local modules

079bb7e
