Experimental prototypes based on the dataset produced by the Nineteenth-Century Knowledge Project led by Peter M. Logan.

Documentation

How to reproduce this proof of concept?

To reproduce the proof of concept (POC) from this repository and the corpus, follow these steps.

get the code & data

  1. create a new folder poc
  2. clone this repository into poc/eb-pre
  3. clone the Encyclopedia repository into poc/kp-editions, next to poc/eb-pre
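The three steps above can be wrapped in a small shell function, sketched below. The repository URLs are not given in this README, so the hypothetical helper `get_code_and_data` takes them as arguments. It skips any clone whose folder already exists, so it is safe to re-run.

```shell
# Hypothetical helper: creates poc/ in the current directory and clones both repositories into it.
# $1: clone URL of this repository (eb-pre); $2: clone URL of the Encyclopedia repository.
# Both URLs are placeholders you must supply; they are not listed in this README.
# The body runs in a subshell ( ... ), so the cd does not change the caller's directory.
get_code_and_data() (
  mkdir -p poc || exit 1
  cd poc || exit 1
  # Skip a clone when its target folder already exists.
  [ -d eb-pre ] || git clone "$1" eb-pre || exit 1
  [ -d kp-editions ] || git clone "$2" kp-editions || exit 1
)
```

For example, from the folder that should contain `poc`, run `get_code_and_data "$EB_PRE_URL" "$KP_EDITIONS_URL"` after setting both (hypothetical) variables to the real clone URLs.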

link the data into the code base

  1. cd poc/eb-pre/data
  2. ln -s ../../kp-editions
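A re-runnable version of this step, as a sketch: the hypothetical `link_data` function below creates the relative link only if it is missing, then checks that it actually resolves. That check catches the case where `kp-editions` was cloned somewhere other than next to `eb-pre`.

```shell
# Hypothetical helper: links poc/kp-editions into poc/eb-pre/data, as in the two steps above.
# $1: path to the poc folder. The body runs in a subshell ( ... ), so the cd stays local.
link_data() (
  cd "$1/eb-pre/data" || exit 1
  # Create the relative link only if nothing named kp-editions is there yet.
  [ -e kp-editions ] || [ -L kp-editions ] || ln -s ../../kp-editions kp-editions || exit 1
  # ../../kp-editions is resolved relative to data/, i.e. it points at poc/kp-editions.
  # Fail if it does not lead to a directory (e.g. the clone lives elsewhere).
  [ -d kp-editions ]
)
```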

Then, still in poc/eb-pre/data, remove the superseded copies of the encyclopedia entries:

  1. rm -rf kp-editions/eb07/TXT/ kp-editions/eb07/XML/
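Because `kp-editions` here is a symbolic link, this `rm -rf` deletes the folders from the actual clone in poc/kp-editions, not from a copy. The sketch below (`remove_superseded` is a hypothetical name) performs the same deletion and then confirms that both folders are gone.

```shell
# Hypothetical helper: same deletion as above; $1: path to the poc folder.
# The paths go through the data/kp-editions symlink, so the real files in poc/kp-editions are removed.
remove_superseded() (
  cd "$1/eb-pre/data" || exit 1
  rm -rf kp-editions/eb07/TXT/ kp-editions/eb07/XML/
  # Succeed only if neither folder exists any more.
  [ ! -e kp-editions/eb07/TXT ] && [ ! -e kp-editions/eb07/XML ]
)
```

Assuming those folders are tracked in the Encyclopedia repository, `git -C poc/kp-editions status` should then list them as deleted, and `git -C poc/kp-editions checkout -- eb07/TXT eb07/XML` would restore them.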

create & activate the python environment

  1. cd poc/eb-pre
  2. python3 -m venv venv
  3. source venv/bin/activate
  4. pip install -U pip
  5. pip install -r build/requirements.txt
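The remaining steps call `python` directly, so they rely on the environment being active in the current shell. Below is a minimal guard, assuming only that activation sets `VIRTUAL_ENV` and puts `venv/bin` first on `PATH` (which the standard `venv` activation script does); `require_venv` is a hypothetical name.

```shell
# Hypothetical guard: succeeds only when a virtual environment is active
# and the first "python" found on PATH lives inside it.
require_venv() {
  if [ -z "${VIRTUAL_ENV:-}" ]; then
    echo "no virtual environment active: run 'source venv/bin/activate'" >&2
    return 1
  fi
  case "$(command -v python)" in
    "$VIRTUAL_ENV"/*) return 0 ;;
    *) echo "python on PATH does not come from $VIRTUAL_ENV" >&2; return 1 ;;
  esac
}
```

Calling `require_venv || exit 1` at the top of a script that runs `prep.py` or `classify.py` makes a forgotten `source venv/bin/activate` fail early instead of surfacing later as missing packages.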

(re-)index the entries with linguistic properties

  1. cd poc/eb-pre/tools
  2. rm ../data/index.json
  3. python prep.py
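The same three steps as a re-runnable sketch. It uses `rm -f`, which does not fail when `index.json` is missing (for instance on the very first run). It also assumes, based on the order of the steps rather than on prep.py itself, that prep.py writes `../data/index.json`. `reindex` is a hypothetical name.

```shell
# Hypothetical helper wrapping the (re-)index steps; $1: path to poc/eb-pre.
reindex() (
  cd "$1/tools" || exit 1
  # -f: no error when index.json does not exist yet.
  rm -f ../data/index.json
  python prep.py || exit 1
  # Assumption: prep.py writes ../data/index.json; fail if it is missing or empty.
  [ -s ../data/index.json ]
)
```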

(re-)create the embeddings

  1. cd poc/eb-pre/tools
  2. rm ../data/semantic_search/*
  3. python classify.py
  4. python compress.py ../data/semantic_search/semantic_search-edition_7-doc2vec-learn-mc_40-ng_1-tm_0.5-ch_sentence.tv2.json 2
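A sketch of the same sequence (`rebuild_embeddings` is a hypothetical name). With `rm -f`, an already empty semantic_search folder is not an error, because the unmatched `*` is passed through literally and `-f` ignores missing files. Before calling compress.py, it also checks that the file compress.py is given actually exists; that classify.py is what writes this file is an assumption inferred from the order of the steps.

```shell
# Hypothetical helper wrapping the embedding steps; $1: path to poc/eb-pre.
rebuild_embeddings() (
  cd "$1/tools" || exit 1
  rm -f ../data/semantic_search/*
  python classify.py || exit 1
  input_json=../data/semantic_search/semantic_search-edition_7-doc2vec-learn-mc_40-ng_1-tm_0.5-ch_sentence.tv2.json
  # Assumption: classify.py produces input_json; stop with a clear message if it did not.
  [ -f "$input_json" ] || { echo "missing $input_json" >&2; exit 1; }
  python compress.py "$input_json" 2
)
```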

launch & visit the web application

  1. cd poc/eb-pre
  2. python3 -m http.server 8000
  3. visit the following URL with your browser: http://localhost:8000/docs/
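`http.server` serves the directory it is started from, which is why the command runs in poc/eb-pre and the app appears under /docs/. The hypothetical `serve_poc` function below adds a check that `docs/` exists and, as an optional addition not in the steps above, passes `--bind 127.0.0.1` so the server only listens on the local machine.

```shell
# Hypothetical helper: starts the web application; $1: path to poc/eb-pre, $2: port (default 8000).
serve_poc() (
  cd "$1" || exit 1
  # http.server serves the current directory, so the app must be under ./docs.
  [ -d docs ] || { echo "no docs/ folder in $1" >&2; exit 1; }
  echo "Visit http://localhost:${2:-8000}/docs/ (Ctrl-C stops the server)"
  # --bind 127.0.0.1 is an addition to the original command: listen on the local machine only.
  python3 -m http.server "${2:-8000}" --bind 127.0.0.1
)
```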