T300 bulkrax8.0.0 #555

Merged: 30 commits, merged on Sep 4, 2024
18de431
Upgrade from Bulkrax 2.3.0 to 8.0.0, no configuration just yet
kerchner Apr 15, 2024
c9a3b2b
Fixes uploads-with-files issue by pointing to bulkrax branch
kerchner Apr 21, 2024
b73746c
Work in Progress - tasks to ingest ProQuest ETD zips
kerchner Apr 28, 2024
c9b1c42
WIP - next need to create CSV from array of metadata hashes
kerchner May 1, 2024
3018d19
WIP - fixed problem creating header row
kerchner May 2, 2024
8c69d97
Fixed embargo logic; fixed CSV structure
kerchner May 6, 2024
7d7b47e
Eliminated folder names from metadata csv FileSet entries; copy files…
kerchner May 8, 2024
dd9eb7c
Adds 'bulkrax_identifier' metadata; fixes imports of works w/files, u…
kerchner May 20, 2024
c13eb43
implemented parent work/child FileSet bulkrax_identifier, repaired em…
kerchner May 21, 2024
5fdf6a7
refactor file paths for extracted zip; parse creator/contributors
kerchner May 24, 2024
e97b032
Repair attachment filenames with spaces (or else bulkrax will); fix a…
kerchner May 24, 2024
d5342b1
Add degree, advisors, committee members
kerchner May 24, 2024
d6a515d
Add gw_affiliation, date_created
kerchner May 24, 2024
c47c8ec
Simplify embargo date; add rights statement; clean up
kerchner May 27, 2024
4419701
Fix truncated file; clarify configs, set default rights
kerchner May 27, 2024
caff6dc
Update bulkrax hash, now contains db migration fix
kerchner May 28, 2024
d3e87e5
Code cleanup for PR
kerchner Jun 3, 2024
01d91df
Add scholarspace-ingest directory and volume mapping
kerchner Jun 3, 2024
020ad50
Add mapping for scholarspace-ingest directory
kerchner Jun 4, 2024
99cbb75
Add CI directive to create ingest folder
kerchner Jun 4, 2024
9fd14dd
Upgrade Bulkrax to 8.1.0
kerchner Jun 4, 2024
c764271
Allow admin user to visit /importers and /exporters even when there i…
kerchner Jun 7, 2024
22ea974
Add fixture zips for bulkrax rspec testing
alepbloyd Jun 11, 2024
7a09fa8
Add sidekiq inline testing setting
alepbloyd Jun 11, 2024
b3aa9c1
Set testing queue for inline sidekiq
alepbloyd Jun 11, 2024
86762e0
Modify ingest_bulkrax_prep when in test mode
alepbloyd Jun 11, 2024
906e622
Add bulkrax importer tests
alepbloyd Jun 11, 2024
d33a54a
Simplify bulkrax tests
alepbloyd Jun 11, 2024
9030ee8
Populates degree and resource_type. License is still WIP, pending inp…
kerchner Jun 14, 2024
bf8c139
Added resource_type field
kerchner Jun 14, 2024
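
Several of the commits above describe building the importer's metadata CSV from an array of metadata hashes and repairing the header row. A minimal sketch of that step follows, assuming the header should be the union of all keys in first-seen order. The helper name `metadata_to_csv`, the sample values, and the `model`, `file` and `parents` columns are illustrative assumptions, not code taken from this PR.

```ruby
require 'csv'

# Build a CSV string from an array of metadata hashes.
# The header row is the union of every hash's keys, in first-seen order,
# so rows that lack a key simply get an empty cell for it.
def metadata_to_csv(rows)
  headers = rows.flat_map(&:keys).uniq
  CSV.generate do |csv|
    csv << headers
    rows.each { |row| csv << headers.map { |header| row[header] } }
  end
end

# Hypothetical sample data: one work and one attached file
rows = [
  { 'bulkrax_identifier' => 'etd_1', 'model' => 'GwEtd', 'title' => 'A Thesis' },
  { 'bulkrax_identifier' => 'etd_1_file_1', 'model' => 'FileSet',
    'file' => 'thesis.pdf', 'parents' => 'etd_1' }
]

puts metadata_to_csv(rows)
```

Taking the union of keys, rather than the keys of the first hash only, keeps the header complete when works and FileSets carry different fields.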
1 change: 1 addition & 0 deletions .github/workflows/ci-cache.yml
@@ -33,6 +33,7 @@ jobs:
mkdir /opt/scholarspace-minter
mkdir /opt/scholarspace/fedora-data
mkdir /opt/scholarspace/solr-data
mkdir /opt/scholarspace/scholarspace-ingest
cd /opt/scholarspace
# Checkout the repository code
- name: Check out repository code
1 change: 1 addition & 0 deletions Dockerfile
@@ -41,6 +41,7 @@ RUN mkdir -p /opt/scholarspace/scholarspace-hyrax \
&& mkdir -p /opt/scholarspace/scholarspace-tmp \
&& mkdir -p /opt/scholarspace/scholarspace-minter \
&& mkdir -p /opt/scholarspace/scholarspace-derivatives \
&& mkdir -p /opt/scholarspace/scholarspace-ingest \
&& chmod 775 -R /opt/scholarspace/scholarspace-derivatives

WORKDIR /opt/scholarspace/scholarspace-hyrax
3 changes: 1 addition & 2 deletions Gemfile
@@ -64,8 +64,7 @@ gem 'riiif', '~> 2.0'

gem 'cookies_eu'

-#gem 'bulkrax', git: 'https://github.com/samvera-labs/bulkrax.git'
-gem 'bulkrax', '2.3.0'
+gem 'bulkrax', '8.1.0'

gem 'willow_sword', github: 'notch8/willow_sword'

17 changes: 11 additions & 6 deletions Gemfile.lock
@@ -104,7 +104,7 @@ GEM
babel-transpiler (0.7.0)
babel-source (>= 4.0, < 6)
execjs (~> 2.0)
-bagit (0.4.5)
+bagit (0.4.6)
docopt (~> 0.5.0)
validatable (~> 1.6)
base64 (0.2.0)
@@ -162,18 +162,21 @@ GEM
signet (~> 0.8)
typhoeus
builder (3.2.4)
-bulkrax (2.3.0)
-bagit (~> 0.4)
+bulkrax (8.1.0)
+bagit (~> 0.4.6)
coderay
denormalize_fields
iso8601 (~> 0.9.0)
kaminari
language_list (~> 1.2, >= 1.2.1)
-libxml-ruby (~> 3.1.0)
+libxml-ruby (~> 3.2.4)
loofah (>= 2.2.3)
marcel
oai (>= 0.4, < 2.x)
rack (>= 2.0.6)
rails (>= 5.1.6)
rdf (>= 2.0.2, < 4.0)
rubyzip
simple_form
byebug (11.1.3)
cancancan (1.17.0)
@@ -221,6 +224,8 @@ GEM
declarative-builder (0.1.0)
declarative-option (< 0.2.0)
declarative-option (0.1.0)
denormalize_fields (1.3.0)
activerecord (>= 4.1.14, < 8.0.0)
deprecation (1.1.0)
activesupport
devise (4.9.2)
@@ -577,7 +582,7 @@ GEM
multi_json
libv8-node (16.19.0.1-x86_64-darwin)
libv8-node (16.19.0.1-x86_64-linux)
-libxml-ruby (3.1.0)
+libxml-ruby (3.2.4)
link_header (0.0.8)
linkeddata (3.1.6)
equivalent-xml (~> 0.6)
@@ -1062,7 +1067,7 @@ DEPENDENCIES
blacklight_range_limit
bootsnap (>= 1.1.0)
bootstrap-sass (~> 3.0)
-bulkrax (= 2.3.0)
+bulkrax (= 8.1.0)
byebug
capybara (>= 2.15)
chosen-rails
11 changes: 1 addition & 10 deletions README.md
@@ -66,6 +66,7 @@ a separate user for the app, but it is not necessary. That user will need to ow
/opt/scholarspace/certs
/opt/scholarspace/scholarspace-tmp
/opt/scholarspace/scholarspace-minter
/opt/scholarspace/scholarspace-ingest
```
6. In `/opt/scholarspace/scholarspace-hyrax` run `cp example.env .env` to create the local environment file.
7. Edit `.env` to add the following values:
@@ -174,16 +175,6 @@ echo $CR_PAT | docker login ghcr.io -u [USERNAME] --password-stdin

## Setting up a new production instance

### (Optional) Install etd-loader

* Install the **etd-loader** application in `/opt/etd-loader` as per instructions at https://github.com/gwu-libraries/etd-loader

* When configuring `config.py`, ensure that it contains the following values:
```
ingest_path = "/opt/scholarspace/scholarspace-hyrax"
ingest_command = "rake RAILS_ENV=production gwss:ingest_etd"
```

### Migrating Production Database

In the app-server container (i.e. through `docker exec -it scholarspace-hyrax_app-server_1 /bin/sh`, followed by `su scholarspace`), run:
11 changes: 11 additions & 0 deletions app/models/ability.rb
@@ -42,4 +42,15 @@ def contentadmins_can_create_curation_concerns
can :index, Hydra::AccessControls::Embargo
can :index, Hydra::AccessControls::Lease
end

# Added for Bulkrax 5.0.0+
def can_import_works?
# can_create_any_work?
admin? || contentadmin_user?
end

def can_export_works?
# can_create_any_work?
admin? || contentadmin_user?
end
end
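
The two predicates added to `ability.rb` restrict importing and exporting to admins and content admins. The gating logic can be sketched outside Rails as follows. `SketchAbility` and the group names are hypothetical stand-ins for the app's real `Ability` class and its `admin?` and `contentadmin_user?` helpers, whose definitions are not shown in this diff.

```ruby
# Hypothetical stand-in for the app's Ability class; the group names
# 'admin' and 'content-admin' are assumptions made for illustration.
class SketchAbility
  def initialize(groups)
    @groups = groups
  end

  def admin?
    @groups.include?('admin')
  end

  def contentadmin_user?
    @groups.include?('content-admin')
  end

  # Mirrors the predicates added in this PR: either role may import or export.
  def can_import_works?
    admin? || contentadmin_user?
  end

  def can_export_works?
    admin? || contentadmin_user?
  end
end

puts SketchAbility.new(['content-admin']).can_import_works?
puts SketchAbility.new(['registered']).can_export_works?
```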
4 changes: 4 additions & 0 deletions app/models/collection.rb
@@ -4,4 +4,8 @@ class Collection < ActiveFedora::Base
# You can replace these metadata if they're not suitable
include Hyrax::BasicMetadata
self.indexer = Hyrax::CollectionWithBasicMetadataIndexer

property :bulkrax_identifier, predicate: ::RDF::URI("https://iro.bl.uk/resource#bulkraxIdentifier"), multiple: false do |index|
index.as :stored_searchable, :facetable
end
end
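
The `bulkrax_identifier` property added to each model gives Bulkrax a place to store the identifier that ties an imported record back to its source row. The importer's field mappings are not part of this excerpt, so the following is only a sketch of what a mapping that flags this property as the source identifier might look like. The constant name, the column names and the helper are assumptions. In a Hyrax app a hash of this shape would typically be assigned to `config.field_mappings['Bulkrax::CsvParser']` inside a `Bulkrax.setup` block.

```ruby
# Hypothetical field mapping in the shape Bulkrax uses for CSV imports.
# Keys are model properties; :from lists the CSV columns that feed them.
CSV_FIELD_MAPPINGS = {
  'bulkrax_identifier' => { from: ['bulkrax_identifier'], source_identifier: true },
  'title' => { from: ['title'] },
  'creator' => { from: ['creator'] }
}.freeze

# Return the name of the property flagged as the source identifier,
# or nil when no mapping carries the flag.
def source_identifier_field(mappings)
  field, _options = mappings.find { |_name, options| options[:source_identifier] }
  field
end

puts source_identifier_field(CSV_FIELD_MAPPINGS)
```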
6 changes: 6 additions & 0 deletions app/models/file_set.rb
@@ -1,4 +1,10 @@
# Generated by hyrax:models:install
class FileSet < ActiveFedora::Base
# include ::Hyrax::FileSetBehavior

property :bulkrax_identifier, predicate: ::RDF::URI("https://iro.bl.uk/resource#bulkraxIdentifier"), multiple: false do |index|
index.as :stored_searchable, :facetable
end

include ::Hyrax::FileSetBehavior
end
4 changes: 4 additions & 0 deletions app/models/gw_etd.rb
@@ -28,5 +28,9 @@ class GwEtd < ActiveFedora::Base
index.as :stored_searchable, :facetable
end

property :bulkrax_identifier, predicate: ::RDF::URI("https://iro.bl.uk/resource#bulkraxIdentifier"), multiple: false do |index|
index.as :stored_searchable, :facetable
end

include ::Hyrax::BasicMetadata
end
4 changes: 4 additions & 0 deletions app/models/gw_journal_issue.rb
@@ -24,6 +24,10 @@ class GwJournalIssue < ActiveFedora::Base
index.as :stored_searchable
end

property :bulkrax_identifier, predicate: ::RDF::URI("https://iro.bl.uk/resource#bulkraxIdentifier"), multiple: false do |index|
index.as :stored_searchable, :facetable
end

# This must be included at the end, because it finalizes the metadata
# schema (by adding accepts_nested_attributes)
include ::Hyrax::BasicMetadata
6 changes: 5 additions & 1 deletion app/models/gw_work.rb
@@ -16,5 +16,9 @@ class GwWork < ActiveFedora::Base
index.as :stored_searchable
end

property :bulkrax_identifier, predicate: ::RDF::URI("https://iro.bl.uk/resource#bulkraxIdentifier"), multiple: false do |index|
index.as :stored_searchable, :facetable
end

include ::Hyrax::BasicMetadata
end
146 changes: 146 additions & 0 deletions bin/importer
@@ -0,0 +1,146 @@
#!/usr/bin/env ruby
# frozen_string_literal: true

require_relative '../config/environment'

require 'slop'

def main(opts = {})
  check_required_params(opts)

  update = opts[:importer_id].present?
  port = opts[:port].presence
  url = build_url(opts.delete(:importer_id), opts.delete(:url), port)

  headers = { 'Content-Type' => 'application/json' }
  headers['Authorization'] = "Token: #{opts.delete(:auth_token)}"
  params = build_params(opts)

  # Log the verb that is actually sent: PUT for an update, POST for a create
  logger.info("#{update ? 'PUT' : 'POST'} to #{url} - PARAMS #{params}")

  conn = Faraday.new(
    url: url,
    headers: headers
  )

  response = if update
               conn.put do |request|
                 request.body = params.to_json
               end
             else
               conn.post do |request|
                 request.body = params.to_json
               end
             end

  puts "#{response.status} - #{response.body.truncate(200)}"
end

# Takes opts explicitly; a bare `opts` is not in scope inside this method
def check_required_params(opts)
  if opts[:importer_id].blank? && invalid?(opts)
    puts 'Missing required parameters'
    help
  end

  if opts[:auth_token].blank? # rubocop:disable Style/GuardClause
    puts 'Missing Authentication Token --auth_token'
    exit
  end
end

def invalid?(opts)
required_params.each do |p|
return true if opts[p.to_sym].blank?
end
return false
end

def required_params
Bulkrax.api_definition['bulkrax']['importer'].map { |key, value| key if value['required'] == true }.compact
end

def build_params(opts = {})
params = {}
params[:commit] = opts.delete(:commit)
parser_fields = {
metadata_file_name: opts.delete(:metadata_file_name),
metadata_format: opts.delete(:metadata_format),
rights_statement: opts.delete(:rights_statement),
override_rights_statement: opts.delete(:override_rights_statement),
import_file_path: opts.delete(:import_file_path),
metadata_prefix: opts.delete(:metadata_prefix),
set: opts.delete(:set),
collection_name: opts.delete(:collection_name)
}.compact
params[:importer] = opts.compact
params[:importer][:user_id] = opts.delete(:user_id)
params[:importer][:admin_set_id] = opts.delete(:admin_set_id)
params[:importer][:parser_fields] = parser_fields || {}
return params.compact
end

def build_url(importer_id, url, port = nil)
if url.nil?
protocol = Rails.application.config.force_ssl ? 'https://' : 'http://'
host = Rails.application.config.action_mailer.default_url_options[:host]
url = "#{protocol}#{host}"
url = "#{url}:#{port}" if port
end
path = Bulkrax::Engine.routes.url_helpers.polymorphic_path(Bulkrax::Importer)
url = File.join(url, path)
url = File.join(url, importer_id) if importer_id
return url
end

def logger
Rails.logger
end

def version
puts "Bulkrax #{Bulkrax::VERSION}"
puts "Slop #{Slop::VERSION}"
end

# Format the help for the CLI
def help
puts 'CREATE:'
puts ' bin/importer --name "My Import" --parser_klass Bulkrax::CsvParser --commit "Create and Import" --import_file_path /data/tmp/import.csv --auth_token 12345'
puts 'UPDATE:'
puts ' bin/importer --importer_id 1 --commit "Update and Re-Import (update metadata only)" --import_file_path /data/tmp/import.csv --auth_token 12345'
puts 'PARAMETERS:'
Bulkrax.api_definition['bulkrax']['importer'].each_pair do |key, value|
next if key == 'parser_fields'
puts " --#{key}"
value.each_pair do |k, v|
next if k == 'contained_in'
puts " #{k}: #{v}"
end
end
puts ' --url'
puts " Repository URL"
exit
end

# Setup the options
options = Slop.parse do |o|
o.on '--version', 'Print the version' do
version
exit
end

o.on '--help', 'Print help' do
help
exit
end

Bulkrax.api_definition['bulkrax']['importer'].each_pair do |key, value|
if value['required'].blank?
o.string "--#{key}", value['definition'], default: nil
else
o.string "--#{key}", value['definition']
end
end
o.string '--url', 'Repository URL'
end

main(options.to_hash)
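
The script above sends a POST to create an importer and a PUT to `<importers URL>/<importer_id>` to update one, with a JSON body and a `Token:` authorization header. The sketch below builds the same kind of request without sending it. It swaps the script's Faraday client for the standard library's `Net::HTTP` so that it runs without Rails or extra gems. The helper name `build_importer_request`, the host and port, and the token are hypothetical.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build (but do not send) the request the importer script would issue.
# A missing importer_id means "create" (POST); a present one means
# "update" (PUT) against <base_url>/<importer_id>.
def build_importer_request(base_url, auth_token, params, importer_id: nil)
  url = importer_id ? File.join(base_url, importer_id.to_s) : base_url
  request_class = importer_id ? Net::HTTP::Put : Net::HTTP::Post
  request = request_class.new(URI.parse(url))
  request['Content-Type'] = 'application/json'
  request['Authorization'] = "Token: #{auth_token}"
  request.body = JSON.generate(params)
  request
end

# Hypothetical local URL and token, for illustration only
req = build_importer_request('http://localhost:3000/importers', '12345',
                             { commit: 'Create and Import' })
puts "#{req.method} #{req.path}"
```

Separating request construction from sending makes the create/update branching easy to check without a running repository.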
2 changes: 1 addition & 1 deletion config/environments/test.rb
@@ -41,5 +41,5 @@
# config.action_view.raise_on_missing_translations = true
config.permanent_url_base = "https://scholarspace-etds.library.gwu.edu/"

-config.active_job.queue_adapter = :test
+config.active_job.queue_adapter = :sidekiq
end