-
Notifications
You must be signed in to change notification settings - Fork 713
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added samshee #6749
Added samshee #6749
Conversation
modules/nf-core/samshee/main.nf
Outdated
|
||
input: | ||
tuple val(meta), path(samplesheet) | ||
val(validator_schema) //optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's tricky to fully support samshee's range of inputs here.
Samshee can accept either
- a schema name e.g.
urn:samshee:illuminav2/v1
- a json string, e.g.
{"required": ["BCLConvert_Data"]}
- a URI pointing to a local file, e.g.
{"$ref": "file:./test.schema.json"}
- a URI pointing to a remote file, e.g.
'{"$ref": "https://dataportal.lit.eu/schemas/litngscoresamplesheet/v0.1/litngscoresamplesheet.schema.json"}'
plus it can accept multiple schemas. Especially when referencing a local file, value channels are not ideal, as when working with a cloud-based solution, the file needs to be provisioned by nextflow. And even when it's a remote URL, it might be preferable to have the download handled by nextflow.
Maybe it could work to have two input channels, one accepting value, and one accepting files, such that a call to the samshee module looks like
SAMSHEE(
[['id': 'samplesheet01'], path("samplesheet.csv")],
['{"required": ["Data"]}', '{"$ref": "urn:samshee:illuminav2/v1"}'],
[path("file:./test.schema.json"), path('https://dataportal.lit.eu/schemas/litngscoresamplesheet/v0.1/litngscoresamplesheet.schema.json')]
)
WDYT?
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @grst I think this makes sense, I'll work on it
Here's a testcase. Note that this is a V1 samplesheet, see also my comment here: nf-core/demultiplex#268 (comment)
|
@grst Some updates on this: Regarding having a different channel for local and remote json files, it shouldn't be needed since, as long as we stage the remote file, it will always be local |
@grst I added the additional features you requested, let me know what you think! @atrigila Also implemented your suggestions, I'm waiting on the bioconda package https://github.com/bioconda/bioconda-recipes/pull/51315/checks?check_run_id=31375653510 |
modules/nf-core/samshee/main.nf
Outdated
def arg_json_schema_validator = json_schema_validator ? "--schema '${json_schema_validator}'" : "" | ||
def arg_name_schema_validator = name_schema_validator ? "--schema '${name_schema_validator}'" : "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could either merge these two into one channel, then the user would need to specify '{"$ref": "urn:samshee:illuminav2/v1"}
or I'd ajust the second line as follows:
def arg_json_schema_validator = json_schema_validator ? "--schema '${json_schema_validator}'" : "" | |
def arg_name_schema_validator = name_schema_validator ? "--schema '${name_schema_validator}'" : "" | |
def arg_json_schema_validator = json_schema_validator ? "--schema '${json_schema_validator}'" : "" | |
def arg_name_schema_validator = name_schema_validator ? "--schema '{\"\$ref\": \"${name_schema_validator}\"'" : "" |
I have a slight preference for the single channel version as it keeps the module simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with single channel version. I would suggest (untested):
def arg_schema = json_schema_validator ? "--schema '${json_schema_validator}'" : (task.ext.args?.contains("--schema") ? "" : "")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@grst I also thought of having the user just specify just the name, but I thought it could add a layer of confusion. I did it this way so that they can just use exactly the same strings that are displayed in the samshee readme. Doing it this way we would need to add an explanation saying that they need to extract the string from the readme that comes after the "$ref:", do you think it would be easier for them to understand?
I went with the approach of not having the channels combined into a tuple to make it easier to avoid issues regarding optional inputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair enough, let's go with a single channel for schemas specified as string then (+ the second one for path
s).
modules/nf-core/samshee/main.nf
Outdated
def arg_json_schema_validator = json_schema_validator ? "--schema '${json_schema_validator}'" : "" | ||
def arg_name_schema_validator = name_schema_validator ? "--schema '${name_schema_validator}'" : "" | ||
def arg_file_schema_validator = file_schema_validator ? "--schema '{\"\$ref\": \"file:${file_schema_validator}\"}'" : "" | ||
def arg_v1_schema = params.v1_schema ? "--output-format sectioned" : "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think a module should ever (?) access a global param
. So this should either be task.ext.v1_schema
, or just rely on the user to provide the "sectioned" flag via task.ext.args
when necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I would supply it via task.ext.args
modules/nf-core/samshee/main.nf
Outdated
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? | ||
'docker://community.wave.seqera.io/library/pip_samshee:733e11f3377fc2e3' : | ||
'community.wave.seqera.io/library/pip_samshee:733e11f3377fc2e3' }" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a comment with the latest guidelines on incorporating seqera containers to modules:
https://nfcore.slack.com/archives/CJRH30T6V/p1728901074405529
Check the container settings for singularity (steps 8-11), I think they require https.
modules/nf-core/samshee/main.nf
Outdated
val(json_schema_validator) // optional | ||
val(name_schema_validator) // optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All non-mandatory command-line tool non-file arguments MUST be provided as a string via the $task.ext.args variable. (https://nf-co.re/docs/guidelines/components/modules#optional-command-arguments)
modules/nf-core/samshee/main.nf
Outdated
def arg_json_schema_validator = json_schema_validator ? "--schema '${json_schema_validator}'" : "" | ||
def arg_name_schema_validator = name_schema_validator ? "--schema '${name_schema_validator}'" : "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with single channel version. I would suggest (untested):
def arg_schema = json_schema_validator ? "--schema '${json_schema_validator}'" : (task.ext.args?.contains("--schema") ? "" : "")
modules/nf-core/samshee/main.nf
Outdated
def arg_json_schema_validator = json_schema_validator ? "--schema '${json_schema_validator}'" : "" | ||
def arg_name_schema_validator = name_schema_validator ? "--schema '${name_schema_validator}'" : "" | ||
def arg_file_schema_validator = file_schema_validator ? "--schema '{\"\$ref\": \"file:${file_schema_validator}\"}'" : "" | ||
def arg_v1_schema = params.v1_schema ? "--output-format sectioned" : "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I would supply it via task.ext.args
modules/nf-core/samshee/main.nf
Outdated
$arg_json_schema_validator \ | ||
$arg_name_schema_validator \ | ||
$arg_file_schema_validator \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace for a single arg_schema
that will be populated depending on whether there is an input json file OR the ext.args contain --schema
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I follow, so users can use many of arg_json_schema_validator
and arg_name_schema_validator
and arg_file_schema_validator
. Aren't they mutually exclusive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't they mutually exclusive?
No, they are not. The user could for instance specify the '{"$ref": "urn:samshee:illuminav2/v1"}
schema, and additionally validate against a custom schema provided as json file.
@grst @atrigila We will be passing all string values through the ext.args, so there is no longer a need to combine the channels. I also removed the custom ext.args since this is also not good practices, and we can organize all of these parameters through the modules.config. Note: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think file schemas are currently not respected, see my code suggestion.
LGTM otherwise!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job! Thanks for addressing our points :)
- pip: # FIXME https://github.com/nf-core/modules/issues/5814 | ||
- samshee==0.2.1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ask in slack what we can do with this as we will need a core team member to approve this PR
description: "illumina v2 samplesheet" | ||
pattern: "*.{csv}" | ||
- - file_schema_validator: | ||
type: string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't type: file
here as well?
e.g. [ id:'test', lane:1 ] | ||
- "*_formatted.csv": | ||
type: file | ||
description: "illumina v2 samplesheet" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No matter the input (v1 and v2) the output is always going to be a v2 samplesheet?
params { | ||
v1_schema = true | ||
json_schema_validator = '{"required": ["Data"]}' | ||
name_schema_validator = null | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I normally pass these via ext.args
and using different configs, but this works anyway 👍 .
e.g. [ id:'test', lane:1 ] | ||
- samplesheet: | ||
type: file | ||
description: "illumina v2 samplesheet" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only v2 samplesheet? or could it be v1 as well?
Addressing nf-core/demultiplex#268
PR checklist
Closes #XXX
versions.yml
file.label
nf-core modules test <MODULE> --profile docker
nf-core modules test <MODULE> --profile singularity
nf-core modules test <MODULE> --profile conda
nf-core subworkflows test <SUBWORKFLOW> --profile docker
nf-core subworkflows test <SUBWORKFLOW> --profile singularity
nf-core subworkflows test <SUBWORKFLOW> --profile conda