diff --git a/docs/workflows/standalone/dorado_basecalling.md b/docs/workflows/standalone/dorado_basecalling.md
index 296355849..95d0c7ea0 100644
--- a/docs/workflows/standalone/dorado_basecalling.md
+++ b/docs/workflows/standalone/dorado_basecalling.md
@@ -12,11 +12,11 @@ The Dorado Basecalling workflow is used to convert Oxford Nanopore `POD5` sequen
 
 ### Model Type Selection
 
-Users can choose between automatic or manual model selection using a configurable use_auto_model flag:
+Users can configure the basecalling model by setting the dorado_model input parameter:
 
-Automatic Model Selection: Automatically picks the best model ('sup', 'hac', or 'fast') based on the input file and user-defined model accuracy paramater.
-
-Manual Model Input: If the user disables automatic selection, a specific model path or model version must be provided.
+- Default Model: "sup" (super accuracy) is used unless overridden by the user.
+- Manual Model Input: Users can specify the full path or name of a specific model (e.g., dna_r10.4.1_e8.2_400bps_hac@v4.2.0).
+- Automatic Detection: When set to sup, hac, or fast, Dorado will automatically select the appropriate model version if available.
 
 - **Model Type (sup):** (super accuracy) The most accurate model, recommended for critical applications requiring the highest basecall accuracy. It is the slowest of the three model types.
 - **Model Type (hac):** (High Accuracy) A balance between speed and accuracy, recommended for most users. Provides accurate results faster than `sup` but less accurate than `sup`.
@@ -42,10 +42,8 @@ Manual Model Input: If the user disables automatic selection, a specific model p
 | **Task** | **Variable** | **Type** | **Description** | **Default Value** | **Required** |
 |---|---|---|---|---|---|
 | Basecalling | **input_files** | Array[File] | Array of `POD5` files for basecalling | None | Yes |
-| Basecalling | **use_auto_model** | Boolean | Use automatic model selection (`sup`, `hac`, or `fast` based on model accuracy)| true | No |
-| Basecalling | **model_accuracy** | String | Desired model accuracy (`sup`, `hac`, `fast`) if using automatic selection | sup | No |
+| Basecalling | **dorado_model** | String | Model accuracy or full model name (default: 'sup')| sup | No |
 | Basecalling | **fastq_file_name** | String | Prefix for naming output FASTQ files | None | Yes |
-| Basecalling | **dorado_model** | String | Model type (e.g., `dna_r10.4.1_e8.2_260bps_sup@v3.5.2`) if manual input | None | Yes |
 | Basecalling | **kit_name** | String | Sequencing kit name used (e.g., `SQK-RPB114-24`). | None | Yes |
 | Basecalling | **cpu** | Int | Number of CPUs allocated | 8 | No |
 | Basecalling | **memory** | String | Amount of memory to allocate | 32GB | No |
@@ -59,6 +57,8 @@ Manual Model Input: If the user disables automatic selection, a specific model p
 ---
 
 ### Detailed Input Information
+
+- **dorado_model**: If set to 'sup', 'hac', or 'fast', the workflow will run with automatic model selection. If a full model name is provided, Dorado will use that model directly.
 - **fastq_file_name**: This will serve as a prefix for the output FASTQ files. For example, if you provide `project001`, the resulting files will be named `project001_barcodeXX.fastq.gz`.
 - **kit_name**: Ensure the correct kit name is provided, as it determines the barcoding and adapter trimming behavior.
 - **fastq_upload_path**: This is the folder path in Terra where the final FASTQ files will be transferred for further analysis. Ensure the path matches your Terra workspace configuration.
diff --git a/tasks/basecalling/task_dorado_basecall.wdl b/tasks/basecalling/task_dorado_basecall.wdl
index 14b8547c2..ea1727703 100644
--- a/tasks/basecalling/task_dorado_basecall.wdl
+++ b/tasks/basecalling/task_dorado_basecall.wdl
@@ -5,6 +5,9 @@ task basecall {
     File input_file # Single POD5 file for scatter processing
     String dorado_model = "sup" # Default model to 'sup', can be overridden with full model name see docs
     String kit_name # Sequencing kit name
+    Int disk_size = 100
+    Int memory = 32
+    Int cpu = 8
     String docker = "us-docker.pkg.dev/general-theiagen/staphb/dorado:0.8.0"
   }
 
@@ -41,8 +44,10 @@ task basecall {
   runtime {
     docker: docker
-    cpu: 8
-    memory: "32GB"
+    cpu: cpu
+    memory: "~{memory} GB"
+    disks: "local-disk " + disk_size + " SSD"
+    disk: disk_size + " GB"
     gpuCount: 1
     gpuType: "nvidia-tesla-t4"
   }
 }
diff --git a/tasks/basecalling/task_dorado_demux.wdl b/tasks/basecalling/task_dorado_demux.wdl
index c108706c3..a37ea0883 100644
--- a/tasks/basecalling/task_dorado_demux.wdl
+++ b/tasks/basecalling/task_dorado_demux.wdl
@@ -5,6 +5,9 @@ task dorado_demux {
     Array[File] bam_files
     String kit_name
     String fastq_file_name
+    Int cpu = 4
+    Int memory = 16
+    Int disk_size = 50
     String docker = "us-docker.pkg.dev/general-theiagen/staphb/dorado:0.8.0"
   }
 
@@ -83,10 +86,10 @@ task dorado_demux {
     Array[File] fastq_files = glob("~{fastq_file_name}_*.fastq.gz")
   }
 
-  runtime {
+  runtime {
     docker: docker
-    cpu: 4
-    memory: "16GB"
-    maxRetries: 0
+    cpu: cpu
+    memory: "~{memory} GB"
+    disks: "local-disk ~{disk_size} SSD"
   }
 }
\ No newline at end of file
diff --git a/tasks/basecalling/task_samtools_convert.wdl b/tasks/basecalling/task_samtools_convert.wdl
index a111b5cd0..69d4bfd96 100644
--- a/tasks/basecalling/task_samtools_convert.wdl
+++ b/tasks/basecalling/task_samtools_convert.wdl
@@ -3,6 +3,9 @@ version 1.0
 task samtools_convert {
   input {
     Array[File] sam_files
+    Int cpu = 4
+    Int memory = 16
+    Int disk_size = 50
     String docker = "us-docker.pkg.dev/general-theiagen/staphb/samtools:1.15"
   }
 
@@ -30,10 +33,10 @@ task samtools_convert {
     Array[File] bam_files = glob("output/bam/*.bam")
   }
 
-  runtime {
+  runtime {
     docker: docker
-    cpu: 4
-    memory: "16GB"
-    maxRetries: 0
+    cpu: cpu
+    memory: "~{memory} GB"
+    disks: "local-disk ~{disk_size} HDD"
   }
 }
diff --git a/workflows/utilities/wf_dorado_basecalling.wdl b/workflows/utilities/wf_dorado_basecalling.wdl
index fde60de0a..cd6915e19 100644
--- a/workflows/utilities/wf_dorado_basecalling.wdl
+++ b/workflows/utilities/wf_dorado_basecalling.wdl
@@ -13,7 +13,7 @@ workflow dorado_basecalling_workflow {
 
   input {
     Array[File] input_files
-    String dorado_model = "sup" # Default to sup model, user can override with a full model name
+    String dorado_model = "sup"
     String kit_name
     String new_table_name
     String fastq_upload_path
@@ -23,6 +23,9 @@ workflow dorado_basecalling_workflow {
     String terra_project
     String terra_workspace
     String fastq_file_name
+    Int cpu = 8
+    Int memory = 32
+    Int disk_size = 100
   }
 
   scatter (file in input_files) {
@@ -30,7 +33,10 @@ workflow dorado_basecalling_workflow {
       input:
         input_file = file,
         dorado_model = dorado_model,
-        kit_name = kit_name
+        kit_name = kit_name,
+        cpu = cpu,
+        memory = memory,
+        disk_size = disk_size
     }
   }
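
To illustrate how the new inputs fit together, here is a minimal sketch of a Cromwell-style inputs file for `dorado_basecalling_workflow` after this change. It is not taken from the PR: the bucket paths, table name, project, and workspace values are hypothetical placeholders, and the file is partial because the diff does not show every workflow input (the declarations between `fastq_upload_path` and `terra_project` are outside the hunks). The model name, kit name, and FASTQ prefix reuse the examples given in the docs above.

```json
{
  "dorado_basecalling_workflow.input_files": [
    "gs://example-bucket/run01/reads_0.pod5",
    "gs://example-bucket/run01/reads_1.pod5"
  ],
  "dorado_basecalling_workflow.dorado_model": "dna_r10.4.1_e8.2_400bps_hac@v4.2.0",
  "dorado_basecalling_workflow.kit_name": "SQK-RPB114-24",
  "dorado_basecalling_workflow.fastq_file_name": "project001",
  "dorado_basecalling_workflow.new_table_name": "example_table",
  "dorado_basecalling_workflow.fastq_upload_path": "gs://example-bucket/fastq/",
  "dorado_basecalling_workflow.terra_project": "example-project",
  "dorado_basecalling_workflow.terra_workspace": "example-workspace",
  "dorado_basecalling_workflow.cpu": 8,
  "dorado_basecalling_workflow.memory": 32,
  "dorado_basecalling_workflow.disk_size": 100
}
```

Omitting `dorado_model` falls back to the default `"sup"`, and setting it to `"hac"` or `"fast"` selects the shorthand instead of a full model name. As far as the hunks show, the workflow-level `cpu`, `memory`, and `disk_size` are forwarded only to the `basecall` call; `dorado_demux` and `samtools_convert` keep their own task-level defaults unless overridden separately.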