This release includes new features to facilitate synchronizing new ProQuest ETDs from AWS S3 and loading these ETDs into GW ScholarSpace while avoiding duplicate loads.
Instructions for importing ETDs from ProQuest have been revised on the Wiki: https://github.com/gwu-libraries/scholarspace-hyrax/wiki/Bulkrax-imports
New features
ETD pipeline
- New proquest_zipfile metadata field on GwETD type works, which is intended to store the filename of the original ProQuest zip file, e.g. etdadmin_upload_100535.zip (#572). This field is not visible to site users, but users with edit rights can see it when editing a work.
- New rake tasks:
  - gwss:populate_etd_proquest_zipfile: This should only need to be run once, for migration of existing GwETD works. It matches up the filename of the main PDF file on each GwETD (e.g. Anderson_gwu_0075M_16591.pdf) with the main PDF file within each ProQuest zip file in S3 (e.g. etdadmin_upload_1075322.zip).
  - gwss:download_new_pq_zips: Downloads new (and only new) ETDs from S3 by comparing filenames in S3 with proquest_zipfile values on GwETD works.
- Improvements to rake tasks:
  - gwss:ingest_pq_etds
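
The "new (and only new)" comparison described for gwss:download_new_pq_zips can be pictured as a set difference. The following is a minimal sketch, not the task's actual implementation: the method name new_zip_filenames and the sample data are hypothetical, and it assumes the S3 object keys and the existing proquest_zipfile values have already been collected into arrays.

```ruby
require 'set'

# Hypothetical helper (not part of the release): returns the zip filenames
# found in S3 that no GwETD work records in proquest_zipfile yet.
def new_zip_filenames(s3_keys, loaded_zipfiles)
  # Works without a proquest_zipfile value contribute nil; drop those.
  loaded = loaded_zipfiles.compact.to_set
  s3_keys
    .map { |key| File.basename(key) }          # compare on filename, ignoring any key prefix
    .select { |name| name.end_with?('.zip') }  # skip non-zip objects
    .reject { |name| loaded.include?(name) }   # keep only zips not yet loaded
    .uniq
end

# Illustrative data only.
s3_keys = ['etdadmin_upload_100535.zip', 'etdadmin_upload_1075322.zip']
loaded  = ['etdadmin_upload_100535.zip', nil]
puts new_zip_filenames(s3_keys, loaded).inspect
# => ["etdadmin_upload_1075322.zip"]
```

Comparing on filenames alone keeps the check cheap, which is why the one-time population of proquest_zipfile on existing works matters: without it, every zip in the bucket would look new.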
Technical debt
- Removes old rake tasks for ingesting Bulkrax content that are no longer needed (#571). Consistent with this, the https://github.com/gwu-libraries/etd-loader and https://github.com/gwu-libraries/batch-loader repositories have been archived.
- Removes remnants of Travis CI (#573)
Upgrade instructions
Prerequisites
Set values in .env for:
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_REGION
AWS_PROQUEST_ETD_BUCKET_NAME
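
Before running the new tasks, it may help to confirm that these four settings are actually visible to the Ruby process. The check below is a minimal sketch under stated assumptions: missing_env_vars is a hypothetical helper, not something shipped in this release, and it assumes the values from .env have already been loaded into the process environment by whatever mechanism the deployment uses.

```ruby
# Variable names taken from the prerequisites list above.
REQUIRED_ENV_VARS = %w[
  AWS_ACCESS_KEY_ID
  AWS_SECRET_ACCESS_KEY
  AWS_REGION
  AWS_PROQUEST_ETD_BUCKET_NAME
].freeze

# Hypothetical helper: returns the names of required variables that are
# unset or blank in the given environment (defaults to the process ENV).
def missing_env_vars(env = ENV)
  REQUIRED_ENV_VARS.select { |name| env[name].to_s.strip.empty? }
end

missing = missing_env_vars
if missing.empty?
  puts 'All ProQuest S3 settings are present.'
else
  warn "Missing settings: #{missing.join(', ')}"
end
```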
Install new gem(s)
Run bundle install
Populate proquest_zipfile on existing ETDs:
Run the gwss:populate_etd_proquest_zipfile task. Then edit one of the GwETDs and confirm that proquest_zipfile is correctly populated.
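
To make the matching step concrete, here is a minimal sketch of how a work's main PDF filename could be paired with the zip that contains it. It is illustrative only and not the task's actual code: zipfile_for_pdf is a hypothetical helper, the zip listing is assumed to have been read from S3 beforehand, and the pairing of the two sample filenames is for demonstration.

```ruby
# Hypothetical helper: given a work's main PDF filename and a map of
# zip filename => filenames contained in that zip, returns the name of
# the zip that contains the PDF, or nil when no zip does.
def zipfile_for_pdf(pdf_filename, zip_contents)
  match = zip_contents.find do |_zip_name, entries|
    entries.any? { |entry| File.basename(entry) == pdf_filename }
  end
  match&.first
end

# Illustrative listing; the second entry is a made-up placeholder file.
zip_contents = {
  'etdadmin_upload_1075322.zip' => ['Anderson_gwu_0075M_16591.pdf', 'placeholder_metadata.xml']
}
puts zipfile_for_pdf('Anderson_gwu_0075M_16591.pdf', zip_contents).inspect
# => "etdadmin_upload_1075322.zip"
```

A nil result would indicate a work whose PDF was not found in any zip, which is the case to look for when spot-checking works after the task runs.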
(Optionally) Load latest ProQuest ETDs from S3
Follow the instructions at https://github.com/gwu-libraries/scholarspace-hyrax/wiki/Bulkrax-imports to download new ETDs from S3, create the Bulkrax manifest, import into GW ScholarSpace, and clean up.