Standardize conversion workflow #369

Draft
wants to merge 66 commits into
base: dev
Changes from 41 commits
52fa9ca
comment out for development
Aug 19, 2024
bbf299c
refact modules for STAR aligner
Aug 19, 2024
c3eb5ea
directly pass txp2gene
Aug 19, 2024
4771ed1
simplify module lines
Aug 19, 2024
af85f87
emit h5ad on starsolo
Aug 19, 2024
d2a5386
add versions emission
Aug 19, 2024
0e55051
update module to use templates and clean up way of printing versions
Aug 19, 2024
994d5a5
fix h5ad generator script
Aug 20, 2024
b317165
simplify check
Aug 20, 2024
72e9d50
Fix h5ad structure
Aug 21, 2024
8829080
updated concat module
Aug 30, 2024
9fc75b4
workflow misses emptydrops and seurat & mtx conversion modules
Aug 30, 2024
dc69f47
start emptydrops cellbender subworkflow
Sep 12, 2024
9c20420
fix paths
Sep 13, 2024
6831195
started the anndatar standardization module
Sep 13, 2024
237d1ca
concat h5ad with anndatar h5ad
Sep 13, 2024
f5fb4a2
update tag information
Sep 13, 2024
752666b
update tags
Sep 13, 2024
d0ad7f6
module is only to convert to rds
Sep 13, 2024
20ac4ae
update directives
Sep 13, 2024
da5b036
update comments
Sep 13, 2024
b90f388
add cellbender to workflow
Sep 13, 2024
7c304cc
start organisation of files
Sep 13, 2024
8320431
fix file naming
Sep 26, 2024
1c95e85
resolve emptydrops naming
Sep 26, 2024
e6fff34
also convert emptydrops filter matrices
Sep 26, 2024
e9c88e8
move files to BKP since they will be replaced
Sep 26, 2024
f7f5fa5
add conversion for alevin
Sep 26, 2024
73c91c2
re-organise star modules
Sep 26, 2024
417bf69
fixed publishDir directives
Sep 27, 2024
94bf9c1
add h5ad conversion module
Sep 30, 2024
43a29c2
integrate to mtx_conversion module
Sep 30, 2024
7fc32f0
reorganize module levels and fix docker image
Sep 30, 2024
e366396
reorganize star modules
Sep 30, 2024
76f126e
re-structure and include cellranger
Oct 4, 2024
b92af8b
add alevin to new structure
Oct 4, 2024
c57b09f
add star to new structure
Oct 4, 2024
d0d1f81
Add kallisto standard workflow to structure
Oct 4, 2024
aa99cd1
Account for non-standard kallisto workflows
Oct 4, 2024
0ca1161
Simplify alevin template
Oct 4, 2024
28251b0
reorganise star template
Oct 7, 2024
c1e8357
write uncompressed
fmalmeida Oct 30, 2024
ae85710
use .astype(str)
fmalmeida Oct 30, 2024
ae8809a
simplify iteration
fmalmeida Oct 30, 2024
68464de
perform join left operation
fmalmeida Oct 30, 2024
fdedc4d
do not compress output h5ad
fmalmeida Oct 30, 2024
187dbf6
perform join left operation
fmalmeida Oct 30, 2024
98af608
not compress h5ad output
fmalmeida Oct 30, 2024
efd6299
simplify iteration
fmalmeida Oct 30, 2024
f357fd7
simplify index iteration
fmalmeida Oct 30, 2024
bd1a74c
fix unmatched parenthesis
fmalmeida Oct 30, 2024
4731e00
fix use of igenomes ... pipeline was not properly selecting igenomes …
fmalmeida Oct 30, 2024
a71348f
correct values parsing
fmalmeida Oct 30, 2024
d2f9cfd
fix container registry
fmalmeida Oct 30, 2024
5c31226
do not save versions files
fmalmeida Oct 30, 2024
8278429
also convert concat h5ads
fmalmeida Oct 30, 2024
2b199d0
manage subdirectory in publishDir
fmalmeida Oct 30, 2024
cb67797
match template scripts with new publishDir
fmalmeida Oct 30, 2024
7230095
have outputs separated
fmalmeida Oct 30, 2024
206a7c1
make parsing inside sub-workflow
fmalmeida Oct 30, 2024
c4a09e0
added kallisto to correct structure
fmalmeida Oct 30, 2024
971667b
correct mix of channels
fmalmeida Oct 30, 2024
8122113
correct for cellranger multi
fmalmeida Oct 30, 2024
44d1cb7
correct stub
fmalmeida Oct 30, 2024
18cfdd7
remove glob in txp2gene
fmalmeida Oct 30, 2024
079bb7e
added small comment to local modules
fmalmeida Oct 30, 2024
29 changes: 18 additions & 11 deletions conf/modules.config
@@ -32,25 +32,31 @@ process {
}

if (!params.skip_emptydrops) {
withName: EMPTYDROPS_CELL_CALLING {
withName: 'CELLBENDER_REMOVEBACKGROUND' {
publishDir = [
path: { "${params.outdir}/${params.aligner}" },
path: { "${params.outdir}/${params.aligner}/${meta.id}/emptydrops_filter" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}
withName: 'ADATA_BARCODES' {
ext.prefix = { "${meta.id}_${meta.input_type}_matrix" }
publishDir = [
path: { "${params.outdir}/${params.aligner}/mtx_conversions/${meta.id}" },
mode: params.publish_dir_mode,
saveAs: { filename ->
if ( params.aligner == 'cellranger' ) "count/${meta.id}/${filename}"
else if ( params.aligner == 'kallisto' ) "${meta.id}.count/${filename}"
else "${meta.id}/${filename}"
}
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}
}

withName: 'MTX_TO_H5AD|CONCAT_H5AD|MTX_TO_SEURAT' {
withName: 'MTX_TO_H5AD|CONCAT_H5AD|ANNDATAR_CONVERT' {
publishDir = [
path: { "${params.outdir}/${params.aligner}/mtx_conversions" },
mode: params.publish_dir_mode
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: 'GTF_GENE_FILTER' {
publishDir = [
path: { "${params.outdir}/gtf_filter" },
@@ -161,8 +167,9 @@ if (params.aligner == "alevin") {
}
withName: 'SIMPLEAF_QUANT' {
publishDir = [
path: { "${params.outdir}/${params.aligner}" },
mode: params.publish_dir_mode
path: { "${params.outdir}/${params.aligner}/${meta.id}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
ext.args = "-r cr-like"
}
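
The `saveAs` closures added in this config all follow one rule: return `null` for `versions.yml`, which tells Nextflow not to publish that file, and return every other filename unchanged. A minimal Python sketch of that rule, assuming the per-sample path layout shown for `SIMPLEAF_QUANT` above; `save_as` and `publish_path` are hypothetical names used only for illustration and are not part of the pipeline or of Nextflow:

```python
from typing import Optional


def save_as(filename: str) -> Optional[str]:
    """Python mirror of the Groovy closure: drop versions.yml, keep the rest."""
    return None if filename == "versions.yml" else filename


def publish_path(outdir: str, aligner: str, sample_id: str, filename: str) -> Optional[str]:
    """Return the per-sample publish location, or None when the file is skipped.

    The layout <outdir>/<aligner>/<sample_id>/<filename> follows the
    SIMPLEAF_QUANT publishDir path in this diff; other processes use
    different sub-directories.
    """
    kept = save_as(filename)
    if kept is None:
        return None
    return f"{outdir}/{aligner}/{sample_id}/{kept}"


if __name__ == "__main__":
    for name in ("quants_mat.mtx", "versions.yml"):
        print(name, "->", publish_path("results", "alevin", "sample1", name))
```

Because the filter lives in `saveAs` rather than in the process `output:` block, `versions.yml` is still emitted on the `versions` channel; it is only excluded from the published results.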
5 changes: 5 additions & 0 deletions modules.json
@@ -5,6 +5,11 @@
"https://github.com/nf-core/modules.git": {
"modules": {
"nf-core": {
"cellbender/removebackground": {
"branch": "master",
"git_sha": "06c8865e36741e05ad32ef70ab3fac127486af48",
"installed_by": ["modules"]
},
"cellranger/count": {
"branch": "master",
"git_sha": "90dad5491658049282ceb287a3d7732c1ce39837",
File renamed without changes.
139 changes: 139 additions & 0 deletions modules/local/BKP/mtx_to_h5ad.nf
@@ -0,0 +1,139 @@
process MTX_TO_H5AD {
tag "$meta.id"
label 'process_medium'

conda "conda-forge::scanpy conda-forge::python-igraph conda-forge::leidenalg"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/scanpy:1.7.2--pyhdfd78af_0' :
'biocontainers/scanpy:1.7.2--pyhdfd78af_0' }"

input:
// inputs from the cellranger nf-core module do not come in a single sample dir:
// for each sample, the sub-folders and files come directly in an array.
tuple val(meta), path(inputs)
path txp2gene
path star_index

output:
tuple val(input_type), path("${meta.id}/*h5ad") , emit: h5ad
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script:
// Get a file to check input type. Some aligners bring arrays instead of a single file.
def input_to_check = (inputs instanceof String) ? inputs : inputs[0]

// check input type of inputs
input_type = (input_to_check.toUriString().contains('unfiltered') || input_to_check.toUriString().contains('raw')) ? 'raw' : 'filtered'
if ( params.aligner == 'alevin' ) { input_type = 'raw' } // alevin has its own filtering methods and mostly outputs a single mtx; 'raw' here means the base tool output
if (input_to_check.toUriString().contains('emptydrops')) { input_type = 'custom_emptydrops_filter' }

// define file paths for each aligner. Cellranger is normally converted from the .h5 files.
// However, the emptydrops call always generates .mtx files, so cellranger 'emptydrops' output requires separate parsing
if (params.aligner in [ 'cellranger', 'cellrangerarc', 'cellrangermulti' ] && input_type == 'custom_emptydrops_filter') {

aligner = 'cellranger'
txp2gene = ''
star_index = ''
mtx_matrix = "emptydrops_filtered/matrix.mtx"
barcodes_tsv = "emptydrops_filtered/barcodes.tsv"
features_tsv = "emptydrops_filtered/features.tsv"

} else if (params.aligner == 'kallisto') {

kb_pattern = (input_type == 'raw') ? 'un' : ''
mtx_dir = (input_type == 'custom_emptydrops_filter') ? 'emptydrops_filtered' : "counts_${kb_pattern}filtered"
if ((input_type == 'custom_emptydrops_filter') && (params.kb_workflow != 'standard')) { mtx_dir = 'emptydrops_filtered/\${input_type}' } // dir has subdirs for non-standard workflows
mtx_matrix = "${mtx_dir}/*.mtx"
barcodes_tsv = "${mtx_dir}/*.barcodes.txt"
features_tsv = "${mtx_dir}/*.genes.names.txt"

// kallisto allows the following workflows: ["standard", "lamanno", "nac"]
// lamanno creates "spliced" and "unspliced"
// nac creates "nascent", "ambiguous", and "mature"
// also, lamanno produces a barcodes file and a genes file for both spliced and unspliced,
// while nac keeps only one for all the different .mtx files produced
kb_non_standard_files = ""
if (params.kb_workflow == "lamanno") {
kb_non_standard_files = "spliced unspliced"
matrix = "${mtx_dir}/\${input_type}.mtx"
barcodes_tsv = "${mtx_dir}/\${input_type}.barcodes.txt"
features_tsv = "${mtx_dir}/\${input_type}.genes.txt"
}
if (params.kb_workflow == "nac") {
kb_non_standard_files = "nascent ambiguous mature"
matrix = "${mtx_dir}/*\${input_type}.mtx"
features_tsv = "${mtx_dir}/*.genes.txt"
} // barcodes tsv has same pattern as standard workflow

} else if (params.aligner == 'alevin') {

// alevin does not have filtered/unfiltered results
mtx_dir = (input_type == 'custom_emptydrops_filter') ? 'emptydrops_filtered' : '*_alevin_results/af_quant/alevin'
mtx_matrix = "${mtx_dir}/quants_mat.mtx"
barcodes_tsv = "${mtx_dir}/quants_mat_rows.txt"
features_tsv = "${mtx_dir}/quants_mat_cols.txt"

} else if (params.aligner == 'star') {

mtx_dir = (input_type == 'custom_emptydrops_filter') ? 'emptydrops_filtered' : "${input_type}"
suffix = (input_type == 'custom_emptydrops_filter') ? '' : '.gz'
mtx_matrix = "${mtx_dir}/matrix.mtx${suffix}"
barcodes_tsv = "${mtx_dir}/barcodes.tsv${suffix}"
features_tsv = "${mtx_dir}/features.tsv${suffix}"

}

//
// run script
//
if (params.aligner in [ "cellranger", "cellrangerarc", "cellrangermulti"] && input_type != 'custom_emptydrops_filter')
"""
# convert file types
mtx_to_h5ad.py \\
--aligner cellranger \\
--input *${input_type}_feature_bc_matrix.h5 \\
--sample ${meta.id} \\
--out ${meta.id}/${meta.id}_${input_type}_matrix.h5ad
"""

else if (params.aligner == 'kallisto' && params.kb_workflow != 'standard')
"""
# convert file types
for input_type in ${kb_non_standard_files} ; do
mtx_to_h5ad.py \\
--aligner ${params.aligner} \\
--sample ${meta.id} \\
--input ${matrix} \\
--barcode ${barcodes_tsv} \\
--feature ${features_tsv} \\
--txp2gene ${txp2gene} \\
--star_index ${star_index} \\
--out ${meta.id}/${meta.id}_\${input_type}_matrix.h5ad ;
done
"""

else
"""
# convert file types
mtx_to_h5ad.py \\
--task_process ${task.process} \\
--aligner ${params.aligner} \\
--sample ${meta.id} \\
--input $mtx_matrix \\
--barcode $barcodes_tsv \\
--feature $features_tsv \\
--txp2gene ${txp2gene} \\
--star_index ${star_index} \\
--out ${meta.id}/${meta.id}_${input_type}_matrix.h5ad
"""

stub:
"""
mkdir ${meta.id}
touch ${meta.id}/${meta.id}_matrix.h5ad
touch versions.yml
"""
}
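
The `input_type` detection at the top of the `script:` block above applies three checks in order, with each later check overriding the earlier ones. A minimal Python sketch of the same decision order; `classify_input_type` is a hypothetical helper written for illustration, and the example paths are assumptions rather than paths taken from a real run:

```python
def classify_input_type(path: str, aligner: str) -> str:
    """Reproduce the order of checks used in the Groovy script block."""
    # 1. 'unfiltered' or 'raw' anywhere in the path marks a raw matrix;
    #    everything else starts out as 'filtered'.
    input_type = "raw" if ("unfiltered" in path or "raw" in path) else "filtered"

    # 2. alevin output is always treated as 'raw' (the base tool output).
    if aligner == "alevin":
        input_type = "raw"

    # 3. An emptydrops path overrides both earlier decisions.
    if "emptydrops" in path:
        input_type = "custom_emptydrops_filter"

    return input_type


if __name__ == "__main__":
    examples = [
        ("sample1/counts_unfiltered/cells_x_genes.mtx", "kallisto"),
        ("sample1/filtered_feature_bc_matrix.h5", "cellranger"),
        ("sample1/emptydrops_filtered/matrix.mtx", "alevin"),
    ]
    for path, aligner in examples:
        print(aligner, path, "->", classify_input_type(path, aligner))
```

The checks are plain substring matches, so any directory or sample name that happens to contain `raw` or `emptydrops` would also trigger them.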
File renamed without changes.
23 changes: 23 additions & 0 deletions modules/local/adata_barcodes.nf
@@ -0,0 +1,23 @@
process ADATA_BARCODES {
tag "$meta.id"
label 'process_single'

conda "${moduleDir}/environment.yml"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'oras://community.wave.seqera.io/library/anndata:0.10.7--e9840a94592528c8':
'community.wave.seqera.io/library/anndata:0.10.7--336c6c1921a0632b' }"

input:
tuple val(meta), path(h5ad), path(barcodes_csv)

output:
tuple val(meta), path("*.h5ad"), emit: h5ad
path "versions.yml" , emit: versions

when:
task.ext.when == null || task.ext.when

script:
prefix = task.ext.prefix ?: "${meta.id}"
template 'barcodes.py'
}
24 changes: 24 additions & 0 deletions modules/local/anndatar_convert.nf
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
process ANNDATAR_CONVERT {
tag "${meta.id}"

label 'process_medium'

container "docker://fmalmeida/anndatar:dev" // TODO: Fix

input:
tuple val(meta), path(h5ad)

output:
tuple val(meta), path("${meta.id}/${meta.id}_${meta.input_type}_matrix.Rds"), emit: rds

when:
task.ext.when == null || task.ext.when

script:
template 'anndatar_convert.R'

stub:
"""
touch ${meta.id}.Rds
"""
}
17 changes: 6 additions & 11 deletions modules/local/concat_h5ad.nf
Original file line number Diff line number Diff line change
@@ -1,13 +1,13 @@
process CONCAT_H5AD {
tag "${meta.id}"

label 'process_medium'

conda "conda-forge::scanpy conda-forge::python-igraph conda-forge::leidenalg"
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'https://depot.galaxyproject.org/singularity/scanpy:1.7.2--pyhdfd78af_0' :
'biocontainers/scanpy:1.7.2--pyhdfd78af_0' }"
conda "conda-forge::scanpy==1.10.2 conda-forge::python-igraph conda-forge::leidenalg"
container "community.wave.seqera.io/library/scanpy:1.10.2--e83da2205b92a538"

input:
tuple val(input_type), path(h5ad)
tuple val(meta), path(h5ad)
path samplesheet

output:
@@ -17,12 +17,7 @@ process CONCAT_H5AD {
task.ext.when == null || task.ext.when

script:
"""
concat_h5ad.py \\
--input $samplesheet \\
--out combined_${input_type}_matrix.h5ad \\
--suffix "_matrix.h5ad"
"""
template 'concat_h5ad.py'

stub:
"""