Test Datasets #55

Merged
merged 2 commits into from
Jul 29, 2024

@cc-a cc-a commented Jul 24, 2024


This PR creates infrastructure for working with realistic test datasets from the previous Imperial College data repository. All of the datasets are currently publicly accessible and discoverable via and so do not present any data security issues.

This PR provides:

  • a list of DOIs of the datasets - test_data/dois.
  • a script to download the datasets - test_data/download_test_data.py.
  • a script to create records - test_data/create_test_data_records.py.

See test_data/README.md and the scripts for usage. The total size of the datasets is around 100MB once downloaded.
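
As a rough illustration of the download step, the sketch below reads a DOI list and builds a metadata lookup URL for each entry. It is not the code in test_data/download_test_data.py. The one-DOI-per-line file format, the `#` comment convention, and the helper names `read_dois` and `metadata_url` are all assumptions made for this example, and the DOI shown in the usage note is made up. The URL prefix is the public DataCite REST API.

```python
from urllib.parse import quote

# Public DataCite REST API endpoint for a single DOI.
DATACITE_API = "https://api.datacite.org/dois/"


def read_dois(text):
    """Return the DOIs in `text`, assuming one per line (hypothetical format).

    Blank lines and lines starting with '#' are skipped.
    """
    dois = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            dois.append(line)
    return dois


def metadata_url(doi):
    """Build the DataCite metadata URL for `doi`, keeping the '/' separator."""
    return DATACITE_API + quote(doi, safe="/")
```

A script could then loop over `read_dois(...)` and fetch each `metadata_url(doi)`, for example for a placeholder DOI such as `10.1234/example-1`.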

Not a great deal of thought has been put into mapping record metadata from the Datacite schema to the current internal Invenio one, as the latter is expected to evolve over the course of the project. The creation of records will need to be updated as the data model changes.
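
To make the kind of mapping concrete, here is a minimal sketch of turning DataCite-style attributes into a record payload. It is not the mapping used in test_data/create_test_data_records.py. The function name `datacite_to_record` is hypothetical, and the target field names (`metadata`, `title`, `creators`, `publication_date`, `resource_type`) are assumptions loosely modelled on InvenioRDM; the real internal data model may differ and is expected to change.

```python
def datacite_to_record(attrs):
    """Map a DataCite-style attributes dict to a record payload.

    The output layout is an assumption for illustration only.
    """
    titles = attrs.get("titles") or [{}]

    # Simplification: every creator is treated as a person and only the
    # display name is carried over.
    creators = [
        {"person_or_org": {"type": "personal", "name": c["name"]}}
        for c in attrs.get("creators", [])
        if c.get("name")
    ]

    metadata = {
        "title": titles[0].get("title", "Untitled"),
        "creators": creators,
        "resource_type": {"id": "dataset"},
    }

    # DataCite records carry a publication year; store it as a string date.
    year = attrs.get("publicationYear")
    if year is not None:
        metadata["publication_date"] = str(year)

    return {"metadata": metadata}
```

Keeping the mapping in one small function like this would limit the changes needed when the data model evolves.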

Developer Checklist

Developers should review and confirm each of these items before requesting review

  • Code meets acceptance criteria from issue
  • Unit tests are written and all pass
  • User Test Scripts (if required) are written and have been run through
  • Code documentation and related non-code documentation has all been updated

Reviewer Checklist

Reviewers should review and confirm each of these items before approval
If there are multiple reviewers, this section can be duplicated for each reviewer

  • Code meets acceptance criteria from issue
  • Unit tests are written and all pass
  • User Test Scripts (if required) are written and have been run through
  • Code documentation and related non-code documentation has all been updated
  • Migration has been created and tested

Testing

As noted in the README, make sure your working directory is test_data and the repository services are set up and running, then:

  • pipenv run python download_test_data.py (if not run before)
  • pipenv run python create_test_data_records.py

@cc-a cc-a marked this pull request as ready for review July 24, 2024 10:10

@J4bbi J4bbi left a comment

I successfully deployed and ran the scripts to download the Datacite records and create records in the running Invenio instance.

Everything worked fine, nothing to add.

@cc-a cc-a merged commit e0c5b1e into develop Jul 29, 2024
2 checks passed
@cc-a cc-a deleted the feature/icl-test-data branch July 29, 2024 11:08