2.2.2

kerchner released this 14 Jan 17:03

This release adds features for synchronizing new ProQuest ETDs from AWS S3 and loading them into GW ScholarSpace without creating duplicate works.

Instructions for importing ETDs from ProQuest have been revised on the Wiki: https://github.com/gwu-libraries/scholarspace-hyrax/wiki/Bulkrax-imports

New features

ETD pipeline

  • New proquest_zipfile metadata field on GwETD works, which stores the filename of the original ProQuest zip file (e.g. etdadmin_upload_100535.zip) (#572). The field is not displayed to site visitors, but users with edit rights can see it when editing a work.
  • New rake tasks:
    • gwss:populate_etd_proquest_zipfile - Migrates existing GwETD works and should only need to be run once. It matches the filename of the main PDF file on each GwETD (e.g. Anderson_gwu_0075M_16591.pdf) with the main PDF file inside each ProQuest zip file in S3 (e.g. etdadmin_upload_1075322.zip).
    • gwss:download_new_pq_zips - Downloads only new ETDs from S3, identified by comparing the zip filenames in S3 with the proquest_zipfile values on existing GwETD works.
  • Improvements to rake tasks:
    • gwss:ingest_pq_etds
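
The matching and filtering ideas behind the two new tasks can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not the code shipped in this release: the method names zipfile_for_pdf and new_zip_keys are hypothetical, and the S3 listing and zip contents are passed in as plain Ruby data so the sketch runs on its own. In a real task the S3 keys could come from the AWS SDK (for example Aws::S3::Client#list_objects_v2) and the existing values from the GwETD works.

```ruby
require 'set'

# Hypothetical helper mirroring the idea behind gwss:populate_etd_proquest_zipfile.
# zip_main_pdfs maps each ProQuest zip filename in S3 to the filename of the
# main PDF inside it. Returns the matching zip filename, or nil if none matches.
def zipfile_for_pdf(main_pdf_filename, zip_main_pdfs)
  match = zip_main_pdfs.find { |_zip, pdf| pdf == main_pdf_filename }
  match&.first
end

# Hypothetical helper mirroring the idea behind gwss:download_new_pq_zips.
# Keeps only the S3 keys whose filename is not already stored in the
# proquest_zipfile field of an existing GwETD work (nil values are ignored).
def new_zip_keys(s3_keys, existing_zipfiles)
  known = existing_zipfiles.compact.to_set
  s3_keys.reject { |key| known.include?(File.basename(key)) }
end

zip_main_pdfs = { 'etdadmin_upload_1075322.zip' => 'Anderson_gwu_0075M_16591.pdf' }
puts zipfile_for_pdf('Anderson_gwu_0075M_16591.pdf', zip_main_pdfs)
# prints etdadmin_upload_1075322.zip

p new_zip_keys(%w[etdadmin_upload_100535.zip etdadmin_upload_1075322.zip],
               ['etdadmin_upload_100535.zip', nil])
# prints ["etdadmin_upload_1075322.zip"]
```

Comparing on the file's basename means a key stored under a prefix in the bucket still matches the bare filename recorded on the work; this assumes zip filenames are unique across the bucket.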

Technical debt

Upgrade instructions

Prerequisites

Set values in .env for:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • AWS_REGION
  • AWS_PROQUEST_ETD_BUCKET_NAME
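
Before running the new tasks, it can help to confirm that these four settings actually reach the process. The following is a hypothetical preflight check, not part of the release. It assumes the .env values have been loaded into the environment (for example by Docker Compose or the dotenv gem), and the names REQUIRED_AWS_VARS and missing_aws_vars are illustrative only.

```ruby
# Hypothetical preflight check (not part of the release): the AWS settings
# these release notes ask you to define in .env.
REQUIRED_AWS_VARS = %w[
  AWS_ACCESS_KEY_ID
  AWS_SECRET_ACCESS_KEY
  AWS_REGION
  AWS_PROQUEST_ETD_BUCKET_NAME
].freeze

# Returns the names of required variables that are unset or blank.
# Accepts any hash-like object so it can be tested without touching ENV.
def missing_aws_vars(env = ENV)
  REQUIRED_AWS_VARS.select { |name| env[name].to_s.strip.empty? }
end

missing = missing_aws_vars
if missing.empty?
  puts 'All required AWS variables are set.'
else
  warn "Missing from environment: #{missing.join(', ')}"
end
```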

Install new gem(s)

Run bundle install.

Populate proquest_zipfile on existing ETDs

Run the gwss:populate_etd_proquest_zipfile task. To verify, edit one of the GwETD works and confirm that its proquest_zipfile field is populated with the correct zip filename.
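
Editing works one at a time only spot-checks the migration. As a broader sanity check, the collected proquest_zipfile values could be screened for blanks or unexpected names. This is a hypothetical sketch: the filename pattern is inferred solely from the example names in these notes, the suspect_zipfiles helper is illustrative, and how the id-to-value hash is gathered from the application is not shown because these notes do not describe the model's query API.

```ruby
# Hypothetical pattern inferred from the example filenames in these notes
# (e.g. etdadmin_upload_100535.zip); adjust if ProQuest uses other names.
PQ_ZIP_PATTERN = /\Aetdadmin_upload_\d+\.zip\z/

# Given a hash of work id => proquest_zipfile value, return the ids whose
# value is blank (nil becomes "") or does not match the expected pattern.
def suspect_zipfiles(values_by_id)
  values_by_id.reject { |_id, value| value.to_s.match?(PQ_ZIP_PATTERN) }.keys
end

p suspect_zipfiles({ 'abc123' => 'etdadmin_upload_100535.zip',
                     'def456' => nil,
                     'ghi789' => 'Anderson_gwu_0075M_16591.pdf' })
# prints ["def456", "ghi789"]
```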

(Optionally) Load latest ProQuest ETDs from S3

Follow the instructions at https://github.com/gwu-libraries/scholarspace-hyrax/wiki/Bulkrax-imports to download new ETDs from S3, create the Bulkrax manifest, import into GW ScholarSpace, and clean up.