Commit
Add a reader for harvesting directly from purl-fetcher HTTP API
This adds a traject reader that can be useful in development when you want to quickly index many records from purl-fetcher without resorting to Kafka. It is intended for dev use only. It can point at any release target (searchworks, earthworks) and index all of the items currently released to that target.

This PR also modifies PublicCocinaRecord and PublicXmlRecord to optionally accept a connection object, so that a single Faraday connection can be shared between the reader and the records, which enables parallelizing record-fetching from purl to match the number of traject threads. This setup allows indexing everything released to Earthworks in a little under 5 minutes with 4 threads on my machine.
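The optional-connection idea described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the project's code: `SketchRecord` and `FakeConnection` are hypothetical stand-ins (for a record class and a Faraday connection respectively), and the `/<druid>.xml` path is assumed for illustration only. The point is that a record uses the injected `client:` when given one and only builds its own otherwise, so several threads can fetch through one shared object.

```ruby
# Stand-in for a Faraday connection (hypothetical): records requested paths.
class FakeConnection
  attr_reader :requests

  def initialize
    @requests = []
    @lock = Mutex.new
  end

  # Thread-safe so that several records can share one instance.
  def get(path)
    @lock.synchronize { @requests << path }
    "<publicObject id=\"#{path}\"/>"
  end
end

# Stand-in for a record class (hypothetical, not PublicXmlRecord itself).
class SketchRecord
  attr_reader :druid

  # client: is optional; when omitted the record lazily builds its own.
  def initialize(druid, client: nil)
    @druid = druid
    @client = client
  end

  def client
    @client ||= FakeConnection.new
  end

  def fetch
    client.get("/#{druid}.xml")
  end
end

shared = FakeConnection.new
druids = %w[aa111bb2222 cc333dd4444 ee555ff6666 gg777hh8888]

# One thread per record, all going through the same shared connection.
threads = druids.map do |druid|
  Thread.new { SketchRecord.new(druid, client: shared).fetch }
end
threads.each(&:join)

puts shared.requests.sort
```

With a real Faraday connection, whether sharing is safe across threads depends on the adapter; the diff in this commit uses `:net_http_persistent` with a pool sized to the traject thread count.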
1 parent d5e0a2c, commit 905be11
Showing 6 changed files with 80 additions and 13 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
# frozen_string_literal: true | ||
|
||
require 'faraday' | ||
require 'progress_bar' | ||
|
||
module Traject | ||
# A reader that fetches all items released to a target from purl-fetcher | ||
class PurlFetcherReader | ||
attr_reader :input_stream, :settings | ||
|
||
def initialize(input_stream, settings) | ||
@settings = Traject::Indexer::Settings.new settings | ||
@input_stream = input_stream | ||
end | ||
|
||
def each | ||
return to_enum(:each) unless block_given? | ||
|
||
response = client.get("/released/#{target}.json") | ||
records = JSON.parse(response.body) | ||
bar = ProgressBar.new(records.length) | ||
|
||
records.each do |record| | ||
yield PurlRecord.new(record['druid'].gsub('druid:', ''), purl_url: @settings['purl.url'], client:) | ||
bar.increment! | ||
end | ||
end | ||
|
||
private | ||
|
||
def target | ||
@settings['purl_fetcher.target'] || 'Searchworks' | ||
end | ||
|
||
def host | ||
@settings['purl_fetcher.url'] || 'https://purl-fetcher.stanford.edu' | ||
end | ||
|
||
def client | ||
@client ||= Faraday.new(url: host) do |builder| | ||
builder.adapter(:net_http_persistent, pool_size: @settings['processing_thread_pool']) | ||
end | ||
end | ||
end | ||
end |
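The shape of the reader's `each` method in the diff above (return an enumerator when no block is given, otherwise parse the JSON list and yield one item per entry with the `druid:` prefix stripped) can be tried in isolation with a sketch like this. `SketchReader` is a hypothetical class that takes the JSON body directly instead of making an HTTP call, and the payload is made up to match the array-of-hashes shape the diff's code expects; it is not a captured purl-fetcher response.

```ruby
require 'json'

# Hypothetical reader: same iteration pattern as the diff, minus HTTP.
class SketchReader
  def initialize(body)
    @body = body
  end

  def each
    # Without a block, hand back an Enumerator over this same method.
    return to_enum(:each) unless block_given?

    JSON.parse(@body).each do |record|
      # Strip the "druid:" prefix, as the reader in the diff does.
      yield record['druid'].gsub('druid:', '')
    end
  end
end

# Made-up payload for illustration only.
body = '[{"druid":"druid:aa111bb2222"},{"druid":"druid:cc333dd4444"}]'
reader = SketchReader.new(body)

ids = reader.each.to_a
ids.each { |id| puts id }
```

Returning `to_enum(:each)` is what lets callers treat the reader as a plain enumerable, which is useful when the caller wants to chain or batch records rather than pass a block.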