Document how to create a Workflow Run Crate file #148
Comments
from rocrate.model.contextentity import ContextEntity
...
action = crate.add(ContextEntity(crate, properties={
"@type": "CreateAction",
"name": "Execution of foo.cwl",
}))
workflow = crate.add_workflow(...)
action["instrument"] = workflow
...
crate.root_dataset["mentions"] = [action]
Take a look at the runcrate code: it generates a very detailed workflow run RO-Crate, so it has many examples. |
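For reference, here is a minimal sketch of the entities the snippet above is meant to leave in the metadata graph. It uses plain dictionaries instead of ro-crate-py so that it runs without the library; the identifiers (`#run-1`, `foo.cwl`) are made up for illustration, and the real library may serialize details differently.

```python
import json

# Hypothetical identifiers, chosen only for illustration.
workflow = {
    "@id": "foo.cwl",
    "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
    "name": "foo.cwl",
}
action = {
    "@id": "#run-1",
    "@type": "CreateAction",
    "name": "Execution of foo.cwl",
    # The action points at the workflow it executed.
    "instrument": {"@id": workflow["@id"]},
}
root = {
    "@id": "./",
    "@type": "Dataset",
    "mainEntity": {"@id": workflow["@id"]},
    # The root dataset references the action so consumers can find it.
    "mentions": [{"@id": action["@id"]}],
}
graph = [root, workflow, action]
print(json.dumps({"@graph": graph}, indent=2))
```

If the generated JSON has no entity of type CreateAction, or the root dataset does not mention it, a run-crate consumer has nothing to report.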
Will do. Thank you @simleo ! |
@rsirvent has kindly shared his most up-to-date code, and I noticed he was manually writing the WRROC I implemented a similar approach in the Autosubmit merge request, and after that got the
(venv) (autosubmit4) kinow@ranma:~/Development/python/workspace/runcrate/tools/consume_crate$ python consume_crate.py ~/autosubmit/a000/rocrate.zip
action #d88221a0-ede7-4dad-a478-618d9f53c88e
instrument: workflow.yml (['File', 'SoftwareSourceCode', 'ComputationalWorkflow'])
started: 2023-02-17T15:42:31
ended: 2023-02-17T15:43:45
inputs:
outputs:
I think I can see the log files in my JSON metadata, as well as the workflow graph plot PDF file. So I think I am done with this initial version of RO-Crate support for Autosubmit, compliant with Workflow Run Crate.
Since Autosubmit doesn't track the tools executed by the workflow tasks (i.e. we just execute a shell script that may execute one or more executables), I think I won't implement the Provenance Run Crate profile (@simleo you asked that in the last meeting, I believe).
As far as I can tell, I will finish writing tests and documentation for RO-Crate in Autosubmit, and then start writing the text for the RO-Crate paper 🤓 🥳 Thanks for the help! |
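The listing above can be approximated with a few lines of standard-library Python. This is only a sketch of the idea, not the actual consume_crate.py code: it assumes the crate zip has `ro-crate-metadata.json` at its top level, that each action has a single `instrument` reference, and that the "started" and "ended" values come from `startTime` and `endTime`. The `#run-1` identifier is made up.

```python
import io
import json
import zipfile


def list_actions(crate_zip):
    """Return (action id, instrument id, startTime, endTime) per CreateAction."""
    with zipfile.ZipFile(crate_zip) as zf:
        metadata = json.loads(zf.read("ro-crate-metadata.json"))
    rows = []
    for entity in metadata["@graph"]:
        types = entity.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if "CreateAction" not in types:
            continue
        # Assumes a single instrument reference of the form {"@id": ...}.
        instrument = entity.get("instrument") or {}
        rows.append((entity["@id"], instrument.get("@id"),
                     entity.get("startTime"), entity.get("endTime")))
    return rows


# Demo on an in-memory crate; "#run-1" is a made-up identifier.
graph = [
    {"@id": "workflow.yml",
     "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"]},
    {"@id": "#run-1", "@type": "CreateAction",
     "instrument": {"@id": "workflow.yml"},
     "startTime": "2023-02-17T15:42:31",
     "endTime": "2023-02-17T15:43:45"},
]
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("ro-crate-metadata.json", json.dumps({"@graph": graph}))
rows = list_actions(buffer)
print(rows)
```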
Looking at https://autosubmit.readthedocs.io/en/master/userguide/defining_workflows/index.html, it seems that the workflow refers to shell scripts for the various steps, and from what you've said I guess these scripts can contain arbitrary code, and they are the ones that actually know about input and output files. So the WMS does not know about inputs and outputs, and therefore cannot copy them to the RO-Crate. This means that the RO-Crate would not be very informative except for the log files, workflow diagram and timestamps. Moreover, the absence of
I guess the inclusion of inputs and outputs could be made to work through some sort of convention. For instance, suppose that users write their scripts so that all input files (and directories) are taken from under a top-level |
Hi @simleo
Correct.
Correct too. I believe this is not exclusive to Autosubmit. With ecFlow or Cylc, you would start a workflow and the tasks could access an NFS partition to fetch data, or maybe a remote service like ECMWF Mars, NOAA, some FTP server, etc. The output may be stored locally, or the workflow may not produce anything (e.g. call a web service posting some data, i.e. not stored locally).
Exactly.
Yes. I hadn't thought that far, and it sounds wrong to me too now.
That's interesting. I will think about it, and talk with other engineers that work on Autosubmit to check if they have other ideas too. Thanks! |
I've updated the AS merge request with a check box to implement the inputs and outputs. I will use a list of globs in the Autosubmit config file:
- inputs:
  - 'proj/PROJECT_FOLDER/inputs/namelist1.nml'
  - 'proj/PROJECT_FOLDER/inputs/**/*.xml'
  - ...
- outputs:
  - '/scratch/project_12345/MODEL/ABC/SV/1/200101*\.*.nc'
  - ...
Maybe I will replace the simple string by an object/map to allow users to choose the mime-type or add more info if needed (can't recall what goes in the inputs/outputs schema). But this convention suggested by @simleo looks like the simplest solution for WMSs that do not have the feature to track inputs and outputs, like Cylc or ecFlow too. Probably worth adding it to the RO-Crate site/docs, and maybe to
I also pinged my group leader to ask about a public workflow to use as a test and upload to Zenodo/WorkflowHub.eu/etc. for testing it 👍 |
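As a rough sketch of how such a list of globs could be expanded into concrete files, assuming relative patterns are resolved against a project directory (the `expand_patterns` helper and the demo file names are hypothetical, not Autosubmit code):

```python
import glob
import os
import tempfile


def expand_patterns(patterns, base_dir):
    """Expand glob patterns (relative ones against base_dir) into sorted file paths."""
    matches = set()
    for pattern in patterns:
        full = pattern if os.path.isabs(pattern) else os.path.join(base_dir, pattern)
        # recursive=True lets "**" match any number of nested directories.
        for path in glob.glob(full, recursive=True):
            if os.path.isfile(path):
                matches.add(path)
    return sorted(matches)


# Demo in a throwaway directory; the file names are made up.
with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "inputs", "a", "b"))
    for parts in (("inputs", "namelist1.nml"),
                  ("inputs", "a", "b", "c.xml"),
                  ("inputs", "skip.txt")):
        open(os.path.join(base, *parts), "w").close()
    patterns = ["inputs/namelist1.nml", "inputs/**/*.xml"]
    found = [os.path.relpath(p, base) for p in expand_patterns(patterns, base)]
print(found)
```

Files that match no pattern (here `skip.txt`) are simply left out of the crate, which is the point of the convention: the user, not the WMS, declares what counts as an input or output.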
@simleo, I started working on the inputs and outputs today, but got stuck. Could you confirm if I have to use the
Looking at PyCOMPSs, there are three workflows. But none displays inputs or outputs in the WorkflowHub.eu UI. I opened the one that was most recently updated, with version 2 created 23rd Jan 2023. It looks like that workflow actually has inputs attached to the
Thanks! |
AFAIK, WorkflowHub does not read workflow inputs and outputs from the RO-Crate, but only from the workflow file, for languages it knows (e.g., CWL). Note that many workflows are not uploaded to WorkflowHub as RO-Crates at all (but you can download them as RO-Crates because WorkflowHub generates one for you), so in many cases there would be no RO-Crate to read anyway. The COMPSs workflow you're looking at was uploaded as an RO-Crate, but it lists actual files in the workflow's So you can also avoid listing formal parameters altogether, and only list actual files in the action's |
I will have to read a bit more about formal parameters and take another look at workflowhub/compss/more ro-crate files. But I think I got the right direction to follow here. Thank you @simleo ! |
Hi @simleo Spent some time reading about FAIRDom, Seek, WorkflowHub, BioSchemas, and the formal parameter. It was quite a journey reviewing terms and how things are connected.
I think you are right. I can see Seek seems to have some code executed only for CWL
Seems like CWL is the workflow class that enables more features in Seek/WorkflowHub.
Noted, I wasn't aware of that. Thanks!
I think I understand this part now.
I believe I will have to do just that. As in the COMPSs case, Autosubmit does not have enough information to create Formal Parameters as defined in the BioSchemas spec — and even if I look at the Autosubmit runtime/saved data, all I can get are probably file and parameter names, without content type, default values, and most of what's available for Formal Parameters. I think it wouldn't make much sense to use that with Autosubmit.
👍
Definitely. If one day CWL has better support for the kind of workflows produced with Autosubmit/Cylc/ecFlow, I would then investigate integrating CWL into Autosubmit. Then the workflow class used for a tool such as WorkflowHub would be, I think, either Autosubmit but handled in WorkflowHub as CWL (I think it does that for Galaxy, more or less, but there's a galaxy2cwl that's used, I think), or CWL directly. But that's a little further in the future, I think.
I'm trying to implement the Workflow Run Crate profile, and I think the object & result of the workflow for inputs/outputs are fine - https://www.researchobject.org/workflow-run-crate/requirements & ResearchObject/workflow-run-crate#16. So I will start looking at how to add the inputs and outputs in Autosubmit, looking at COMPSs for reference. My initial idea is to use the same JSON file I am using for the "patch" applied to the Autosubmit configuration, but with something like
{
  "@graph": { ... patch goes here, with license, author, etc },
  "inputs": [
    { "name": "model_input/abc.nc", "encodingFormat": "application/netcdf", "valueRequired": true, "description": "Input file for the grid data..."},
    { "glob": "extra_input/**/*.tmp", "encodingFormat": "application/text", "valueRequired": false, "description": "Auxiliary, optional files"},
    ...
  ],
  "outputs": [ ... ]
}
The first entry of
The second entry has a format that would be useful for inputs & outputs of workflow managers that do not have formal parameters, allowing the WMS to iterate the
I think that way I will have everything needed to create a basic workflow run crate, with inputs & outputs in a similar way as implemented in COMPSs 🤞 Cheers |
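A possible way to turn entries like these into crate entities, sketched with plain dictionaries. The `entries_to_entities` helper is hypothetical, the `#run-1` identifier is made up, and `valueRequired` is deliberately dropped here on the assumption that it describes a formal parameter rather than a concrete file.

```python
import glob
import os
import tempfile


def entries_to_entities(entries, base_dir):
    """Build File entities from entries holding either a "name" or a "glob" key."""
    entities = []
    for entry in entries:
        if "glob" in entry:
            pattern = os.path.join(base_dir, entry["glob"])
            names = sorted(os.path.relpath(p, base_dir)
                           for p in glob.glob(pattern, recursive=True)
                           if os.path.isfile(p))
        else:
            # Named entries are taken as given; existence is not checked here.
            names = [entry["name"]]
        for name in names:
            entity = {"@id": name.replace(os.sep, "/"), "@type": "File"}
            for key in ("encodingFormat", "description"):
                if key in entry:
                    entity[key] = entry[key]
            entities.append(entity)
    return entities


# Demo in a throwaway directory; file names follow the JSON sketch above.
with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "extra_input", "sub"))
    open(os.path.join(base, "extra_input", "sub", "x.tmp"), "w").close()
    inputs = entries_to_entities([
        {"name": "model_input/abc.nc", "encodingFormat": "application/netcdf"},
        {"glob": "extra_input/**/*.tmp", "encodingFormat": "application/text"},
    ], base)

# Inputs are referenced from the action's "object"; outputs would go in "result".
action = {"@id": "#run-1", "@type": "CreateAction"}
action["object"] = [{"@id": e["@id"]} for e in inputs]
print(action)
```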
Note that |
Hi!
I am creating the Autosubmit RO-Crate using ro-crate-py, using COMPSs as reference. It calls add_workflow, add(Person), and other functions that appear in the ro-crate-py README. However, inspecting the JSON file I see no mention of CreateAction.
I noticed this after I tried to validate the rocrate.zip file created with Autosubmit using runcrate.
Is there an easy way to use ro-crate-py to produce an RO-Crate that conforms to the Workflow Run Crate profile, and that can be validated with the consume_crate script? cc @simleo
Thanks!
Bruno