To run JRuby you will need a JRE (Java Runtime Environment), version 8 or higher.
$ java --version
java 9
Java(TM) SE Runtime Environment (build 9+181)
Java HotSpot(TM) 64-Bit Server VM (build 9+181, mixed mode)
Follow these instructions to install JRuby if you do not already have it.
$ ruby --version
jruby 9.3.9.0
Make sure you have SSH keys established on your machine.
Make sure you have Docker installed and running.
Clone the application and install:
$ git clone git@github.com:psu-libraries/psulib_traject.git
$ cd psulib_traject
$ bundle install
For local development, you can change the settings by adding configuration files. These will be ignored by git.
Create two files, config/settings.local.yml and config/settings/test.local.yml, and add the following lines to each:
solr:
  url: http://localhost:8983/solr/
  port: 8983
Change the URL and port numbers if you want to use a different port. You will also need to set your environment variables with the Solr username and password.
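As a rough illustration of how settings shaped like the fragment above could be combined with credentials from the environment, here is a minimal Ruby sketch. It is not the project's own configuration loader: the helper names `solr_settings` and `solr_uri` and the variable names `SOLR_USERNAME` and `SOLR_PASSWORD` are hypothetical, so check the project's configuration for the names it actually reads.

```ruby
require 'yaml'
require 'uri'

# Parse a settings document shaped like the fragment above and return
# the hash stored under the "solr" key.
def solr_settings(yaml_text)
  YAML.safe_load(yaml_text).fetch('solr')
end

# Build a Solr URI from the settings, adding basic-auth credentials when
# both are present. SOLR_USERNAME and SOLR_PASSWORD are hypothetical
# variable names used only for this illustration.
def solr_uri(settings, env = ENV)
  uri = URI.parse(settings.fetch('url'))
  uri.port = Integer(settings.fetch('port', uri.port))
  user = env['SOLR_USERNAME']
  pass = env['SOLR_PASSWORD']
  if user && pass
    uri.user = user         # the user must be set before the password
    uri.password = pass
  end
  uri
end

example = <<~SETTINGS
  solr:
    url: http://localhost:8983/solr/
    port: 8983
SETTINGS

# Pass an empty hash instead of ENV to see the URL without credentials.
puts solr_uri(solr_settings(example), {})
```

Passing the environment as an argument keeps the sketch easy to try out with different credentials without touching the real shell environment.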
When using JRuby, traject will use multiple threads, but we want to tailor that to our system. In config/settings.local.yml add:
hathi_overlap_path: spec/fixtures/hathitrust/overlap.tsv
processing_thread_pool: 5
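If you are unsure what value suits your machine, one possible starting point is to derive it from the number of CPU cores. The sketch below is only a rule of thumb and not a recommendation from traject or this project: the helper `suggested_thread_pool` and its "one core spare" heuristic are assumptions made for illustration.

```ruby
require 'etc'

# Hypothetical rule of thumb: leave one core free for Solr and the OS,
# but never drop below a single worker thread.
def suggested_thread_pool(cores = Etc.nprocessors)
  [cores - 1, 1].max
end

# Print a line that can be pasted into config/settings.local.yml.
puts "processing_thread_pool: #{suggested_thread_pool}"
```

Treat the printed number as a first guess and adjust it after watching how an actual indexing run behaves.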
Start Solr via the Docker container
$ bundle exec rake docker:up
This will download and configure Solr if it is not already present; otherwise it starts the existing container again. If you need to reconfigure Solr:
$ bundle exec rake docker:clean
$ bundle exec rake docker:conf
Convert MARC records and import them into Solr
$ bundle exec traject -c config/traject.rb solr/sample_data/sample_psucat.mrc
For testing purposes you can run traject with the --debug-mode flag to display the output on the console without pushing the data to Solr.
$ bundle exec traject --debug-mode -c config/traject.rb solr/sample_data/sample_psucat.mrc
HathiTrust access level can be recorded in ht_access_ss. The indexer expects the latest overlap report from HathiTrust, a TSV file, at ConfigSettings.hathi_overlap_path.
Because the monthly overlap file lives in a restricted area that can only be accessed by signing in to Box at UMich, overlap.tsv must be put in place manually before indexing whenever a new overlap report is released. This can be done by copying the file with scp to the location specified in ConfigSettings.hathi_overlap_path.
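Before a long indexing run it can be useful to confirm that the overlap file is present and readable. The following Ruby sketch shows one way to do that; it is not the project's actual loader. The helper name `load_overlap`, the column order (OCLC number first, access level fourth), the presence of a header row, and the choice to let "allow" win over "deny" for the same OCLC number are all assumptions made for illustration.

```ruby
# Build a lookup from OCLC number to access level from a tab-separated
# overlap report. The column positions are assumptions about the report
# layout: OCLC number in column 1, access level in column 4.
def load_overlap(path)
  raise ArgumentError, "overlap report not found: #{path}" unless File.file?(path)

  lookup = {}
  File.foreach(path, chomp: true).with_index do |line, index|
    next if index.zero? || line.empty? # assumed header row, blank lines

    oclc, _local_id, _item_type, access = line.split("\t")
    next if oclc.nil? || access.nil?

    # Assumed policy: an "allow" row overrides any other value seen
    # for the same OCLC number; otherwise the first value is kept.
    lookup[oclc] = access if access == 'allow' || !lookup.key?(oclc)
  end
  lookup
end

# Usage: only runs when the fixture path from the settings above exists.
path = 'spec/fixtures/hathitrust/overlap.tsv'
puts "#{load_overlap(path).size} OCLC numbers loaded" if File.file?(path)
```

Raising when the file is missing makes a forgotten copy step visible immediately instead of silently indexing records without access levels.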