Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add dorado #256

Closed
wants to merge 24 commits into from
Closed

add dorado #256

wants to merge 24 commits into from

Conversation

yuukiiwa
Copy link
Contributor

@yuukiiwa yuukiiwa commented Oct 17, 2023

Currently, dorado v0.3.2 is incorporated into through its docker container. It works without demultiplexing, but doesn't work with demultiplexing with qcat downstream

I raised an issue on dorado's github repo requesting for the dorado v0.4.0 to be dockerized (here is the issue). Will not incorporate basecalling and demultiplexing until the dorado v0.4.0 is available on docker hub.

I made some changes to basecalling without demultiplexing, where the user can specify the input fast5 directory from the samplesheet for each sample. If a user has multiple sample, then he/she will have to indicate the respective input fast5 directory for those samples.

Here is the run on my machine:
Screenshot from 2023-10-18 00-09-04

@nf-core nf-core deleted a comment from github-actions bot Oct 17, 2023
@github-actions
Copy link

github-actions bot commented Oct 17, 2023

nf-core lint overall result: Failed ❌

Posted for pipeline commit 5e61982

+| ✅ 215 tests passed       |+
!| ❗   6 tests had warnings |!
-| ❌  20 tests failed       |-

❌ Test failures:

  • files_exist - File must be removed: lib/NfcoreTemplate.groovy
  • files_exist - File must be removed: lib/Utils.groovy
  • files_exist - File must be removed: lib/WorkflowMain.groovy
  • files_exist - File must be removed: lib/WorkflowNanoseq.groovy
  • nextflow_config - Config default value incorrect: params.kit is set as `` in nextflow_schema.json but is `null` in `nextflow.config`.
  • nextflow_config - Config default value incorrect: params.flowcell is set as `` in nextflow_schema.json but is `null` in `nextflow.config`.
  • nextflow_config - Config default value incorrect: params.dorado_model is set as `` in nextflow_schema.json but is `null` in `nextflow.config`.
  • nextflow_config - Config default value incorrect: params.jaffal_ref_dir is set as for_jaffal in nextflow_schema.json but is null in nextflow.config.
  • nextflow_config - Config default value incorrect: params.tracedir is set as ${params.outdir}/pipeline_info in nextflow_schema.json but is ./results/pipeline_info in nextflow.config.
  • files_unchanged - .github/CONTRIBUTING.md does not match the template
  • files_unchanged - .github/PULL_REQUEST_TEMPLATE.md does not match the template
  • files_unchanged - .github/workflows/branch.yml does not match the template
  • files_unchanged - .github/workflows/linting_comment.yml does not match the template
  • files_unchanged - .github/workflows/linting.yml does not match the template
  • files_unchanged - assets/email_template.html does not match the template
  • files_unchanged - assets/email_template.txt does not match the template
  • files_unchanged - assets/nf-core-nanoseq_logo_light.png does not match the template
  • files_unchanged - docs/images/nf-core-nanoseq_logo_light.png does not match the template
  • files_unchanged - docs/images/nf-core-nanoseq_logo_dark.png does not match the template
  • modules_config - conf/modules.config contains withName:STRINGTIE2, but the corresponding process is not present in any of the Nextflow scripts.

❗ Test warnings:

  • nextflow_config - Config manifest.version should end in dev: 3.1.0
  • pipeline_todos - TODO string in README.md: Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your prefered methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required
  • pipeline_todos - TODO string in base.config: Customise requirements for specific processes.
  • nfcore_yml - nf-core version not set in .nf-core.yml

✅ Tests passed:

Run details

  • nf-core/tools version 2.14.1
  • Run at 2024-08-26 05:37:20

Copy link
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of 'unusual' things to what I normally see in nf-core pipelines, but I don't see anything necessarily breaking, except for the licensing issue.

Main thing though you have quite a few modules that appear to only work with conda (not docker), but there is no documentation on this, I think it would be very important to add lots of warnings/checks in the code and also usage documentation warning users that conda won't be possible in many cases

Comment on lines +29 to +30
trim_barcodes=true
output_demultiplex_fast5 = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Harshil Align™️!

tag "$meta.id"
label 'process_medium'

container "docker.io/ontresearch/dorado"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to double check, it is OK to use this license wise?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And would this work with singularity stilll?

Comment on lines +18 to +19
dorado download --model $dorado_model
dorado basecaller $dorado_model $pod5_path --device $dorado_device --emit-fastq > basecall.fastq
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any options a user could theoretically add? Missing ext.args, for example.

dorado: \$(echo \$(dorado --version 2>&1) | sed -r 's/.{81}//')
END_VERSIONS

gzip basecall.fastq
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably go before the emissions, and should the file be forced to be basecall.fastq for downstream purposes? Otherwise Iw ould recommend using the ${prefix}.fastq system

label 'process_medium'

conda "conda-forge::r-base=4.0.3 bioconda::bioconductor-bambu=3.0.8 bioconda::bioconductor-bsgenome=1.66.0"
container "docker.io/yuukiiwa/pod5:0.2.4"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same above

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be a biocontainer

modules/local/get_test_data.nf Show resolved Hide resolved
nextflow_schema.json Outdated Show resolved Hide resolved
Comment on lines +67 to +71
if (workflow.profile.contains('test')){
ch_input_path = params.input_path
} else {
ch_input_path = Channel.fromPath(params.input_path, checkIfExists: true)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand this, there is no difference in the way the channel gets taken right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are different.

For the test, I need to get the stage fast5 directory (with many fast5 files) from nf-core/test-dataset, so the input_path is not local, so checkInExist doesn't work

For the user input, there's no staging of the fast5 directory, so checking whether those exist is needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry, misread input_path as just input 😅 . I find the testdata set up unusual which is also why tripped me up, but I don't think this is relevant for this PR (also given you've been waiting such a long time)

workflows/nanoseq.nf Outdated Show resolved Hide resolved
"dorado_device": {
"type": "string",
"default": "cuda:all",
"description": "Device specified using '--device'.",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is dorado a particular model of nanopore or something? What is a dorado device?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is for specifying what kind of compute one wants to use: cuda:all for all GPUs or cuda:0 for a specific GPU or CPU

@yuukiiwa yuukiiwa closed this Sep 17, 2024
@yuukiiwa yuukiiwa deleted the add_dorado branch September 17, 2024 11:16
@maxulysse
Copy link
Member

replaced by #277

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants