This PR creates infrastructure for working with realistic test datasets from the previous Imperial College data repository. All of the datasets are currently publicly accessible and discoverable via and so do not present any data security issues.
This PR provides:
- `test_data/dois`
- `test_data/download_test_data.py`
- `test_data/create_test_data_records.py`

See `test_data/README.md` and the scripts for usage. The total size of the datasets is around 100MB once downloaded.

Not a great deal of thought has been put into mapping record metadata from the DataCite schema to the current internal Invenio one, as the latter is expected to evolve over the course of the project. The creation of records will need to be updated as the data model changes.
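To make the kind of mapping described above concrete, here is a minimal sketch in Python. It is not the code in `create_test_data_records.py`. The input keys (`doi`, `titles`, `creators`, `publicationYear`, `descriptions`) follow the attribute names used in DataCite JSON metadata, while the output field names and the function name `datacite_to_record` are hypothetical stand-ins for whatever the internal data model ends up requiring. The sample values are invented for illustration.

```python
from typing import Any, Dict


def datacite_to_record(attributes: Dict[str, Any]) -> Dict[str, Any]:
    """Map DataCite-style attributes to a simplified record dict.

    The output keys are hypothetical; the real internal schema is
    expected to change over the course of the project.
    """
    titles = attributes.get("titles") or []
    creators = attributes.get("creators") or []
    descriptions = attributes.get("descriptions") or []
    return {
        "doi": attributes.get("doi"),
        # Use the first title and description, if any are present.
        "title": titles[0].get("title") if titles else None,
        "authors": [c["name"] for c in creators if c.get("name")],
        "publication_year": attributes.get("publicationYear"),
        "description": (
            descriptions[0].get("description") if descriptions else None
        ),
    }


# Invented sample input, for illustration only.
sample_attributes = {
    "doi": "10.0000/example",
    "titles": [{"title": "Example dataset"}],
    "creators": [{"name": "Doe, Jane"}, {"name": "Roe, Richard"}],
    "publicationYear": 2020,
    "descriptions": [],
}

record = datacite_to_record(sample_attributes)
```

Keeping the mapping in one small function like this means that only one place needs updating when the data model changes.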
Developer Checklist
Developers should review and confirm each of these items before requesting review
Reviewer Checklist
Reviewers should review and confirm each of these items before approval
If there are multiple reviewers, this section can be duplicated for each reviewer
Testing
As noted in the README, make sure your working directory is `test_data` and the repository services are set up and running, then run:
- `pipenv run python download_test_data.py` (if not run before)
- `pipenv run python create_test_data_records.py`
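The testing steps above could be wrapped in a small helper so they always run in the right order and from the right directory. The following is only a sketch: the helper names `build_commands` and `run_steps` are hypothetical and not part of this PR, and the caller is assumed to know whether the download has already been done.

```python
import subprocess
from pathlib import Path
from typing import List


def build_commands(download_needed: bool) -> List[List[str]]:
    """Return the commands from the testing steps, in order."""
    commands = []
    if download_needed:
        # Only needed if the datasets have not been downloaded before.
        commands.append(["pipenv", "run", "python", "download_test_data.py"])
    commands.append(
        ["pipenv", "run", "python", "create_test_data_records.py"]
    )
    return commands


def run_steps(download_needed: bool, workdir: Path) -> None:
    """Run the steps from workdir, which must be the test_data directory."""
    if workdir.resolve().name != "test_data":
        raise RuntimeError("working directory must be test_data")
    for command in build_commands(download_needed):
        # check=True stops at the first failing step.
        subprocess.run(command, cwd=workdir, check=True)
```

Calling `run_steps(True, Path.cwd())` from inside `test_data` would then run both scripts. The repository services still need to be running beforehand, which this sketch does not check.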