A configurable, generic search UI for the Archives of Nethys, built with Elastic App Search's Reference UI.
The project can be configured via a JSON config file.
You can easily control things like...
- The Engine the UI runs against
- Which fields are displayed
- The filters that are used
If you would like to make configuration changes, there is no need to regenerate this app from your App Search Dashboard!
You can simply open up the `engine.json` file, update the options, and then restart this app.
The following is a complete list of options available for configuration in `engine.json`.
option | value type | required/optional | source
---|---|---|---
engineName | String | required | Found in your App Search Dashboard.
endpointBase | String | required* | (*) Elastic Enterprise Search deployment URL, example: "http://127.0.0.1:3002". Not required if using App Search on swiftype.com.
hostIdentifier | String | required* | (*) Only required if using App Search on swiftype.com.
searchKey | String | required | Found in your App Search Dashboard.
searchFields | Array[String] | required | A list of fields that will be searched with your search term.
resultFields | Array[String] | required | A list of fields that will be displayed within your results.
querySuggestFields | Array[String] | optional | A list of fields that will be searched and displayed as query suggestions.
titleField | String | optional | The field to display as the title in results.
urlField | String | optional | A field with a URL to use as a link in results.
sortFields | Array[String] | optional | A list of fields that will be used for sort options.
facets | Array[String] | optional | A list of fields that will be available as "facet" filters. Read more about facets within the App Search documentation.
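For reference, a minimal `engine.json` might look like the following. The key is a placeholder, and the exact field selection is illustrative (the field names come from the cleaned data described below):

```json
{
  "engineName": "aon-non-crawl",
  "endpointBase": "http://127.0.0.1:3002",
  "searchKey": "search-xxxxxxxxxxxxxxxx",
  "searchFields": ["title", "keywords", "description", "body"],
  "resultFields": ["title", "category", "description", "url"],
  "titleField": "title",
  "urlField": "url",
  "sortFields": [],
  "facets": ["category", "sub_category"]
}
```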
To deploy the app:

```shell
npm run deploy
```
Originally, the data came from a scrape of the Archives of Nethys website, for Pathfinder - 2nd Edition. I scraped it using the Elastic App Search Crawler. When I scraped it, the Crawler was still in "beta", so I did a little extra cleanup to get the data formatted exactly as I wanted it.
First, I ran the crawler in local mode, with a config like:
```yaml
max_indexed_links_count: 1000
max_extracted_links_count: 10000
max_url_length: 70
max_crawl_depth: 10
domain_allowlist:
  - https://2e.aonprd.com
seed_urls:
  - https://2e.aonprd.com/Actions.aspx
  - https://2e.aonprd.com/Afflictions.aspx
  - https://2e.aonprd.com/Ancestries.aspx
  - https://2e.aonprd.com/Archetypes.aspx
  - https://2e.aonprd.com/Backgrounds.aspx
  - https://2e.aonprd.com/Classes.aspx
  - https://2e.aonprd.com/Conditions.aspx
  - https://2e.aonprd.com/Creatures.aspx
  - https://2e.aonprd.com/Equipment.aspx?All=true
  - https://2e.aonprd.com/Feats.aspx
  - https://2e.aonprd.com/Hazards.aspx
  - https://2e.aonprd.com/Rules.aspx
  - https://2e.aonprd.com/Skills.aspx
  - https://2e.aonprd.com/SpellLists.aspx?Tradition=0
  - https://2e.aonprd.com/Traits.aspx
  - https://2e.aonprd.com/Rules.aspx?ID=1483
  - https://2e.aonprd.com/Rules.aspx?ID=748
  - https://2e.aonprd.com/Rules.aspx?ID=1587
  - https://2e.aonprd.com/Rules.aspx?ID=686
output_sink: file
output_dir: examples/output/aon
```
Next, I made a new dir:
```shell
mkdir examples/output/aon-cleaned/
```
Then, I cleaned the raw crawl JSON:
```ruby
require 'json'
require 'nokogiri'

output_dir = 'examples/output/aon-cleaned/'
input_dir = 'examples/output/aon/'

Dir.foreach(input_dir) do |filename|
  next if filename == '.' || filename == '..'

  puts "Parsing #{filename}"
  json = File.read("#{input_dir}#{filename}")
  hsh = JSON.parse(json)

  parsed_data = Nokogiri::HTML.parse(hsh['content'])

  # Derive category/sub-category from the " - "-delimited page title
  title = parsed_data.title.strip.gsub(' - Archives of Nethys: Pathfinder 2nd Edition Database', '')
  categories = title.split(' - ')
  category = categories.size >= 2 ? categories[1] : nil
  sub_category = categories.size >= 3 ? categories[2] : nil

  # Drop boilerplate keywords shared by every page
  common_keywords = ["Archives", "Nethys", "Wiki", "Archives of Nethys", "Pathfinder", "Official", "AoN", "AoNPRD", "PRD", "PFSRD", "2E", "2nd Edition"]
  keywords = parsed_data.at('meta[name=keywords]')['content'].split(', ') - common_keywords

  description = parsed_data.at('meta[name=description]')['content']
  description = Nokogiri::HTML(description)&.text

  # The page body lives in one of two container ids, depending on the page template
  body_content = parsed_data.at_css('[id="ctl00_MainContent_DetailedOutput"]')&.text
  body_content = parsed_data.at_css('[id="ctl00_RadDrawer1_Content_MainContent_DetailedOutput"]')&.text unless body_content

  result = {
    :title => title,
    :category => category,
    :sub_category => sub_category,
    :keywords => keywords,
    :description => description,
    :body => body_content,
    :url => hsh['url']
  }

  output_json = JSON.pretty_generate(result)
  File.open("#{output_dir}#{filename}", 'w') { |file| file.write(output_json) }
  puts "Parsed #{filename}"
rescue StandardError => e
  puts "Failed to parse #{filename} because #{e.class}: #{e.message}"
end
```
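Each cleaned file ends up as one flat JSON document with the fields built above. The values in this example are illustrative, not real crawl output:

```json
{
  "title": "Example Spell - Spells",
  "category": "Spells",
  "sub_category": null,
  "keywords": ["Spell", "Evocation"],
  "description": "A short meta description of the page.",
  "body": "The extracted text of the page's detail section.",
  "url": "https://2e.aonprd.com/Spells.aspx?ID=1"
}
```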
Then, I used the App Search Ruby Client (`gem install elastic-enterprise-search`) to index the cleaned JSON:
```ruby
require 'elastic-enterprise-search'
require 'json'

host = '<host>'        # Enterprise Search deployment URL
key = '<private key>'  # private API key from the App Search Dashboard

ent_client = Elastic::EnterpriseSearch::Client.new(host: host)
ent_client.app_search.http_auth = key
client = ent_client.app_search

engine_name = 'aon-non-crawl'
documents = []
batch = 1
output_dir = 'examples/output/aon-cleaned/'
batch_ids = {}

Dir.foreach(output_dir) do |filename|
  next if filename == '.' || filename == '..'

  document = JSON.parse(File.read("#{output_dir}#{filename}"))
  # Derive a deduplication id from the document's content fields
  document[:id] = document.dup.slice('title', 'category', 'keywords', 'description', 'body').hash.to_s

  unless batch_ids.key?(document[:id])
    documents << document
    batch_ids[document[:id]] = true
  end

  # App Search accepts at most 100 documents per indexing request
  if documents.size == 100
    puts "indexing batch #{batch}"
    client.index_documents(engine_name, documents: documents)
    documents = []
    batch_ids = {}
    batch += 1
  end
end

# Index whatever remains in the final, partial batch
client.index_documents(engine_name, documents: documents)
```
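One caveat with the id generation above: Ruby's `Object#hash` is seeded per process, so the same document can produce a different id on each run, which defeats deduplication across re-indexing runs. A stable alternative (a suggestion, not what the original run used) is a content digest:

```ruby
require 'json'
require 'digest'

# Build a deterministic id from the fields that define a document's identity.
# Unlike Object#hash, Digest::SHA1 yields the same id in every process.
def content_id(document)
  identity = document.slice('title', 'category', 'keywords', 'description', 'body')
  Digest::SHA1.hexdigest(JSON.generate(identity))
end

doc = { 'title' => 'Example Spell', 'category' => 'Spells', 'body' => '...' }
content_id(doc) # same input always yields the same 40-character hex id
```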
If something goes wrong and you want to delete the engine's contents without deleting the engine itself, you can run:
```ruby
batch = 0
loop do
  batch += 1
  # list_documents returns a page of the documents still in the engine
  result = client.list_documents(engine_name)
  ids = result.body['results'].map { |it| it['id'] }
  break if ids.empty?

  puts "Deleting batch #{batch}"
  client.delete_documents(engine_name, document_ids: ids)
end
```