docs update and runtime params
fraser-combe committed Oct 30, 2024
1 parent a4feae6 commit 6a40a37
Showing 5 changed files with 36 additions and 19 deletions.
14 changes: 7 additions & 7 deletions docs/workflows/standalone/dorado_basecalling.md
@@ -12,11 +12,11 @@ The Dorado Basecalling workflow is used to convert Oxford Nanopore `POD5` sequen

### Model Type Selection

Users can choose between automatic or manual model selection using a configurable use_auto_model flag:
Users can configure the basecalling model by setting the dorado_model input parameter:

Automatic Model Selection: Automatically picks the best model ('sup', 'hac', or 'fast') based on the input file and user-defined model accuracy parameter.

Manual Model Input: If the user disables automatic selection, a specific model path or model version must be provided.
- Default Model: "sup" (super accuracy) is used unless overridden by the user.
- Manual Model Input: Users can specify the full path or name of a specific model (e.g., [email protected]).
- Automatic Detection: When set to sup, hac, or fast, Dorado will automatically select the appropriate model version if available.

- **Model Type (sup):** (Super Accuracy) The most accurate model, recommended for critical applications requiring the highest basecall accuracy. It is the slowest of the three model types.
- **Model Type (hac):** (High Accuracy) A balance between speed and accuracy, recommended for most users. Faster than `sup`, but less accurate.
@@ -42,10 +42,8 @@ Manual Model Input: If the user disables automatic selection, a specific model p
| **Task** | **Variable** | **Type** | **Description** | **Default Value** | **Required** |
|---|---|---|---|---|---|
| Basecalling | **input_files** | Array[File] | Array of `POD5` files for basecalling | None | Yes |
| Basecalling | **use_auto_model** | Boolean | Use automatic model selection (`sup`, `hac`, or `fast` based on model accuracy)| true | No |
| Basecalling | **model_accuracy** | String | Desired model accuracy (`sup`, `hac`, `fast`) if using automatic selection | sup | No |
| Basecalling | **dorado_model** | String | Model accuracy (`sup`, `hac`, or `fast`) or full model name | sup | No |
| Basecalling | **fastq_file_name** | String | Prefix for naming output FASTQ files | None | Yes |
| Basecalling | **dorado_model** | String | Model type (e.g., `[email protected]`) if manual input | None | Yes |
| Basecalling | **kit_name** | String | Sequencing kit name used (e.g., `SQK-RPB114-24`). | None | Yes |
| Basecalling | **cpu** | Int | Number of CPUs allocated | 8 | No |
| Basecalling | **memory** | String | Amount of memory to allocate | 32GB | No |
@@ -59,6 +57,8 @@ Manual Model Input: If the user disables automatic selection, a specific model p
---

### Detailed Input Information

- **dorado_model**: If set to 'sup', 'hac', or 'fast', the workflow will run with automatic model selection. If a full model name is provided, Dorado will use that model directly.
- **fastq_file_name**: This will serve as a prefix for the output FASTQ files. For example, if you provide `project001`, the resulting files will be named `project001_barcodeXX.fastq.gz`.
- **kit_name**: Ensure the correct kit name is provided, as it determines the barcoding and adapter trimming behavior.
- **fastq_upload_path**: This is the folder path in Terra where the final FASTQ files will be transferred for further analysis. Ensure the path matches your Terra workspace configuration.
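
The `dorado_model` rule described in the list above can be illustrated with a short Python sketch. This is not code from the workflow: the helper name `classify_dorado_model` and the placeholder model name are hypothetical, and the sketch only mirrors the documented behavior (an accuracy shortcut triggers automatic selection, anything else is treated as a full model name).

```python
# Hypothetical illustration of the documented dorado_model rule.
# The three accuracy shortcuts come from the docs; everything else here is assumed.
ACCURACY_SHORTCUTS = {"sup", "hac", "fast"}


def classify_dorado_model(dorado_model: str = "sup") -> str:
    """Return 'automatic' for an accuracy shortcut, 'manual' for any other value."""
    if dorado_model in ACCURACY_SHORTCUTS:
        return "automatic"
    # Any other string is assumed to be a full model name or path.
    return "manual"


print(classify_dorado_model())                      # default value "sup"
print(classify_dorado_model("hac"))                 # accuracy shortcut
print(classify_dorado_model("my_full_model_name"))  # placeholder full model name
```

The default argument matches the `sup` default in the inputs table, so leaving the input unset falls into the automatic branch.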
9 changes: 7 additions & 2 deletions tasks/basecalling/task_dorado_basecall.wdl
@@ -5,6 +5,9 @@ task basecall {
File input_file # Single POD5 file for scatter processing
String dorado_model = "sup" # Defaults to 'sup'; can be overridden with a full model name (see docs)
String kit_name # Sequencing kit name
Int disk_size = 100
Int memory = 32
Int cpu = 8
String docker = "us-docker.pkg.dev/general-theiagen/staphb/dorado:0.8.0"
}

@@ -41,8 +44,10 @@ task basecall {

runtime {
docker: docker
cpu: 8
memory: "32GB"
cpu: cpu
memory: "~{memory} GB"
disks: "local-disk " + disk_size + " SSD"
disk: disk_size + " GB"
gpuCount: 1
gpuType: "nvidia-tesla-t4" }
}
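
The runtime change above replaces hard-coded values with `Int` inputs that are interpolated into strings. A minimal Python sketch of that string building follows, assuming the defaults shown in the `basecall` input block. The function `render_runtime` and its `disk_type` parameter are hypothetical and exist only to show what the rendered attributes look like.

```python
# Hypothetical sketch: how the Int inputs become runtime attribute strings.
# Defaults mirror the basecall task inputs (cpu = 8, memory = 32, disk_size = 100).
def render_runtime(cpu: int = 8, memory: int = 32,
                   disk_size: int = 100, disk_type: str = "SSD") -> dict:
    return {
        "cpu": cpu,                                      # passed through as an integer
        "memory": f"{memory} GB",                        # mirrors "~{memory} GB"
        "disks": f"local-disk {disk_size} {disk_type}",  # mirrors "local-disk " + disk_size + " SSD"
    }


print(render_runtime())
print(render_runtime(cpu=16, memory=64, disk_size=250))
```

Keeping the inputs as plain integers and adding the unit in the `runtime` block means callers supply only a number, and the unit suffix stays in one place.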
11 changes: 7 additions & 4 deletions tasks/basecalling/task_dorado_demux.wdl
@@ -5,6 +5,9 @@ task dorado_demux {
Array[File] bam_files
String kit_name
String fastq_file_name
Int cpu = 4
Int memory = 16
Int disk_size = 50
String docker = "us-docker.pkg.dev/general-theiagen/staphb/dorado:0.8.0"
}

@@ -83,10 +86,10 @@ task dorado_demux {
Array[File] fastq_files = glob("~{fastq_file_name}_*.fastq.gz")
}

runtime {
runtime {
docker: docker
cpu: 4
memory: "16GB"
maxRetries: 0
cpu: cpu
memory: "~{memory} GB"
disks: "local-disk ~{disk_size} SSD"
}
}
11 changes: 7 additions & 4 deletions tasks/basecalling/task_samtools_convert.wdl
@@ -3,6 +3,9 @@ version 1.0
task samtools_convert {
input {
Array[File] sam_files
Int cpu = 4
Int memory = 16
Int disk_size = 50
String docker = "us-docker.pkg.dev/general-theiagen/staphb/samtools:1.15"
}

@@ -30,10 +33,10 @@ task samtools_convert {
Array[File] bam_files = glob("output/bam/*.bam")
}

runtime {
runtime {
docker: docker
cpu: 4
memory: "16GB"
maxRetries: 0
cpu: cpu
memory: "~{memory} GB"
disks: "local-disk ~{disk_size} HDD"
}
}
10 changes: 8 additions & 2 deletions workflows/utilities/wf_dorado_basecalling.wdl
@@ -13,7 +13,7 @@ workflow dorado_basecalling_workflow {

input {
Array[File] input_files
String dorado_model = "sup" # Default to sup model, user can override with a full model name
String dorado_model = "sup"
String kit_name
String new_table_name
String fastq_upload_path
@@ -23,14 +23,20 @@ workflow dorado_basecalling_workflow {
String terra_project
String terra_workspace
String fastq_file_name
Int cpu = 8
Int memory = 32
Int disk_size = 100
}

scatter (file in input_files) {
call basecall_task.basecall as basecall_step {
input:
input_file = file,
dorado_model = dorado_model,
kit_name = kit_name
kit_name = kit_name,
cpu = cpu,
memory = memory,
disk_size = disk_size
}
}
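
With `cpu`, `memory`, and `disk_size` exposed as workflow inputs, they can be overridden from an inputs JSON without editing the task. The Python sketch below builds such a file, assuming the usual `<workflow_name>.<input_name>` key convention for WDL inputs. The helper `build_inputs` and all values are hypothetical, and the required inputs (`input_files`, `kit_name`, and so on) are left out for brevity.

```python
import json

# Workflow name taken from the diff; the override values below are made up.
WORKFLOW = "dorado_basecalling_workflow"


def build_inputs(overrides: dict) -> dict:
    """Prefix each input name with the workflow name, as WDL inputs JSON expects."""
    return {f"{WORKFLOW}.{name}": value for name, value in overrides.items()}


inputs = build_inputs({
    "dorado_model": "hac",  # accuracy shortcut instead of the "sup" default
    "cpu": 16,
    "memory": 64,           # integer; the task appends " GB"
    "disk_size": 200,       # integer; the task builds the disks string
})

print(json.dumps(inputs, indent=2))
```

In the hunk shown here, these three values are forwarded only to the `basecall` call, so the demux and samtools tasks would presumably keep their own defaults unless they are wired up elsewhere.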
