Add methods for adding and updating JSON-LD directly (partials for WMS) #149

Merged
merged 14 commits into from
Mar 31, 2023

Conversation

kinow
Member

@kinow kinow commented Mar 27, 2023

Closes #146

@simleo I was toying a little with the code you suggested in the Autosubmit code base. But then I thought that if I wrote those functions in Autosubmit, other WMSs would still need to do the same.

It then occurred to me: why not add these new methods to ro-crate-py instead? I opened this draft pull request after testing it with Autosubmit. You can see the current implementation here: https://earth.bsc.es/gitlab/es/autosubmit/-/merge_requests/317/diffs

Most of the code in autosubmit.py in that merge request retrieves metadata about the Autosubmit workflow configuration, execution logs, and data. But what was previously

        yaml_file = Path(experiment_path, 'conf/rocrate.yml')
        try:
            with open(yaml_file) as f:
                try:
                    yaml_content = YAMLParserFactory().create_parser().load(f)
                except Exception as e:
...
...
# Create RO-Crate and add Autosubmit data
...
...
        # Fill in the pre-populated data
        crate.license = yaml_content['license']
        # These two sets keep track of the authors and orgs to add to the crate
        authors = set()
        organizations = set()
        for author in yaml_content['authors']:
            authors.add(author['orcid'])
            organizations.add(author['ror'])
            # The same @id links the Person to its ContactPoint entity
            contact_id = f'mailto:{author["email"]}'
            crate.add(Person(crate, author['orcid'], {
                'name': author['name'],
                'contactPoint': {'@id': contact_id},
                # The affiliation points at the organization's ROR, not a mailto: URI
                'affiliation': {'@id': author['ror']}
            }))
            crate.add(ContextEntity(crate, contact_id, {
                '@type': 'ContactPoint',
                'contactType': 'Author',
                'email': author['email'],
                'identifier': author['email'],
                'url': author['orcid'],
            }))
            crate.add(ContextEntity(crate, author['ror'], {
                '@type': 'Organization',
                'name': author['organisation_name']
            }))
        # One {'@id': ...} reference per author / organization; a dict
        # comprehension here would keep only the last one
        crate.creator = [{'@id': author_id} for author_id in sorted(authors)]
        crate.publisher = [{'@id': org_id} for org_id in sorted(organizations)]

has now become, basically

        json_file = Path(experiment_path, 'conf/rocrate.json')
        try:
            with open(json_file, 'r') as f:
                try:
                    json_content = json.load(f)
                except Exception as e:
                    raise AutosubmitCritical(f'Failed to parse $expid/conf/rocrate.json pre-populated file {json_file}', 7011, e)
        except IOError:
...
...
# Create RO-Crate and add Autosubmit data
# because ro-crate-py's add replaces data.
...
...
            if '@graph' in json_content:
                for jsonld_node in json_content['@graph']:
                    crate.add_or_update_jsonld(jsonld_node)

            # Write RO-Crate ZIP.
            crate.write_zip(Path(experiment_path, "rocrate.zip"))

I liked the general structure, and I think it could be applied to other WMSs that prefer to go JSON-LD → ro-crate-py → JSON-LD, instead of what I had in the merge request before: YAML → Python → ro-crate-py → JSON-LD (where the YAML → Python step could vary between WMSs).
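
To make the JSON-LD → ro-crate-py → JSON-LD route more concrete, here is a minimal sketch of what a pre-populated `conf/rocrate.json` could contain for a single author. It expresses the same Person, ContactPoint and Organization entities that the YAML version built in Python. The `build_partial` helper is hypothetical, and the ORCID, ROR, name and email values are made-up placeholders; neither Autosubmit nor ro-crate-py prescribes this exact layout.

```python
import json


def build_partial(name, email, orcid, ror, organisation_name):
    """Return a JSON-LD partial (a dict with an '@graph' list) for one author.

    Hypothetical helper for illustration only; all values are placeholders.
    """
    contact_id = f"mailto:{email}"
    return {
        "@graph": [
            {
                "@id": orcid,
                "@type": "Person",
                "name": name,
                "contactPoint": {"@id": contact_id},
                "affiliation": {"@id": ror},
            },
            {
                "@id": contact_id,
                "@type": "ContactPoint",
                "contactType": "Author",
                "email": email,
                "identifier": email,
                "url": orcid,
            },
            {
                "@id": ror,
                "@type": "Organization",
                "name": organisation_name,
            },
        ]
    }


# Placeholder values, not real identifiers.
partial = build_partial(
    name="Jane Doe",
    email="jane.doe@example.org",
    orcid="https://orcid.org/0000-0000-0000-0000",
    ror="https://ror.org/00example0",
    organisation_name="Example Institute",
)

# This text is what would be stored in the pre-populated file.
partial_text = json.dumps(partial, indent=2)
print(partial_text)
```

Each node in `@graph` could then be handed to `crate.add_or_update_jsonld(...)`, as in the loop shown earlier, with no WMS-specific YAML → Python step in between.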

If it sounds like a good idea I will add docs & tests 👍 Thank you for the code example.

I also added another commit with comments & f-strings for code that I was looking at and trying to understand before using it. Let me know if you prefer that I drop that commit and open a separate pull request for it.

Cheers
Bruno

@kinow kinow requested a review from simleo March 27, 2023 21:50
@simleo
Collaborator

simleo commented Mar 28, 2023

I've added a commit with some changes:

  • Removed type hints to avoid confusion, since they're not used in the library
  • Removed the docstring for the ROCrate class: some things there actually change depending on whether the crate is being created or has been read from a directory / zip file, so it is better not to add a docstring at this time
  • Tweaked the docstrings of the new methods a bit.

To answer the question in the TODO comment: when updating, any keys starting with @ need to be removed (after popping the @id). You don't want @type to be there, to avoid messing up things. When adding, instead, @type should be there: if not, a Thing will be created (that might be what the user wants though).
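
The rules above can be sketched in plain Python. This is only an illustration under stated assumptions, not ro-crate-py's implementation: the graph is modelled as a dict keyed by `@id`, and the three helper functions are stand-ins written for this sketch (only the name `add_or_update_jsonld` is taken from this pull request).

```python
def add_jsonld(graph, node):
    """Add a new node to the graph; the @id must be present and unused."""
    if "@id" not in node:
        raise ValueError("node has no @id")
    if node["@id"] in graph:
        raise ValueError(f"entity {node['@id']} already exists")
    new = dict(node)
    # Without an explicit @type, the entity is just a generic Thing.
    new.setdefault("@type", "Thing")
    graph[new["@id"]] = new
    return new


def update_jsonld(graph, node):
    """Update an existing node, ignoring every '@'-prefixed key."""
    if "@id" not in node:
        raise ValueError("node has no @id")
    props = dict(node)  # copy, so the caller's dict is left untouched
    entity_id = props.pop("@id")
    if entity_id not in graph:
        raise ValueError(f"entity {entity_id} does not exist")
    # Drop @type (and any other JSON-LD keyword) so an update cannot
    # change what kind of entity this is.
    props = {k: v for k, v in props.items() if not k.startswith("@")}
    graph[entity_id].update(props)
    return graph[entity_id]


def add_or_update_jsonld(graph, node):
    """Update the node if its @id is already in the graph, else add it."""
    if node.get("@id") in graph:
        return update_jsonld(graph, node)
    return add_jsonld(graph, node)


graph = {}
# First call adds the entity, keeping its @type.
add_or_update_jsonld(graph, {"@id": "#alice", "@type": "Person", "name": "Alice"})
# Second call updates it; the conflicting @type is discarded.
add_or_update_jsonld(graph, {"@id": "#alice", "@type": "Organization", "name": "Alice Doe"})
# A node added without @type becomes a Thing.
add_or_update_jsonld(graph, {"@id": "#misc"})
```

After these calls, `#alice` is still a `Person` whose name is now `Alice Doe`, while `#misc` has the fallback type `Thing`.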

Please do add unit tests for the new methods. Better add a new test module for these.

@kinow
Member Author

kinow commented Mar 28, 2023

Thanks for the review @simleo ! Will take a look at your commit and update with tests.

@kinow kinow force-pushed the add-json-ld-methods branch from 8821e70 to 9108a90 Compare March 28, 2023 22:11
@kinow kinow force-pushed the add-json-ld-methods branch from c9d09c0 to 2eadd09 Compare March 28, 2023 23:03
@kinow kinow marked this pull request as ready for review March 28, 2023 23:03
@@ -278,7 +345,7 @@ Note that data entities (e.g., workflows) must already be present in the directo
cd test/test-data/ro-crate-galaxy-sortchangecase
```

-This directory is already an ro-crate. Delete the metadata file to get a plain directory tree:
+This directory is already an RO-Crate. Delete the metadata file to get a plain directory tree:
Member Author

This was the only place where "ro-crate" was used; the rest of the text appeared to use "RO-Crate", so I changed it here to match.

to, respectively, add a new entity, update an existing entity, or handle
deciding whether to add (if `@id` does not exist in the JSON-LD metadata)
or update an entity automatically.

Member Author

Some example docs, @simleo . Not sure if needed, nor if this is the best place, but given my short memory, I thought it better to write it down somewhere. Let me know if you prefer this to be moved/edited (feel free to edit it too, if you'd like).

Member Author

Tests added! Copied the header from other files. Feel free to edit/suggest changes, please 🙇

Thanks!

@simleo
Collaborator

simleo commented Mar 31, 2023

I've changed the docs to make them more general, since the new methods are also general (as appropriate for a generic RO-Crate library) and useful beyond RO-Crate generation by a WMS. I've also tweaked the code a bit, details are in the commits. Thanks for the contribution!

Member Author

@kinow kinow left a comment

I've changed the docs to make them more general, since the new methods are also general (as appropriate for a generic RO-Crate library) and useful beyond RO-Crate generation by a WMS. I've also tweaked the code a bit, details are in the commits. Thanks for the contribution!

Just had a look at the modified docs, and they look great! Thanks a ton for the review & updates, @simleo.

+1

Thanks

@kinow
Member Author

kinow commented Mar 31, 2023

Oh, and the test failure happened on macOS... In other projects I had to re-run macOS jobs, as they were less reliable than Linux on GitHub Actions. Maybe a re-run will fix it; otherwise let me know and I can take a look and try to make the build pass.

@simleo
Collaborator

simleo commented Mar 31, 2023

Oh, and the test failure happened on macos...

I think there's something weird happening with the macOS CI instances. I've added a skipif clause to run that test only on posix.

@simleo simleo merged commit 260fdbb into ResearchObject:master Mar 31, 2023
@kinow kinow deleted the add-json-ld-methods branch March 31, 2023 08:48
Successfully merging this pull request may close these issues.

Allow to attach partials to a crate?
2 participants