kingsdigitallab/eb-pre

Experimental prototypes based on the dataset produced by the Nineteenth-Century Knowledge Project led by Peter M. Logan.

Documentation

How to reproduce this proof of concept?

Follow the steps below to reproduce the POC from this repository and the corpus.

get the code & data

  1. create a new folder poc
  2. clone this repository into poc/eb-pre
  3. clone the Encyclopedia repository into a separate folder poc/kp-editions

link the data into the code base

  1. cd poc/eb-pre/data
  2. ln -s ../../kp-editions

Then, still in poc/eb-pre/data, remove the superseded copies of the encyclopedia entries:

  1. rm -rf kp-editions/eb07/TXT/ kp-editions/eb07/XML/
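The relative link can be confusing, so here is a minimal sketch that rebuilds the folder layout in a scratch directory and runs the same two commands there. It touches no real data. The variable names (sandbox, link_target, link_ok, removed_in_clone) are made up for this illustration and are not part of the repository.

```shell
# Build a throwaway copy of the folder layout (scratch directory, no real data).
start_dir="$PWD"
sandbox="$(mktemp -d)"
mkdir -p "$sandbox/poc/eb-pre/data" "$sandbox/poc/kp-editions/eb07/TXT"

# Same link command as in the steps above, run inside the scratch copy.
cd "$sandbox/poc/eb-pre/data"
ln -s ../../kp-editions

# The link takes the target's last path component as its name and is
# resolved relative to the directory that contains it (data/).
link_target="$(readlink kp-editions)"
if [ -d kp-editions/eb07/TXT ]; then link_ok=yes; else link_ok=no; fi
echo "kp-editions -> $link_target (reachable: $link_ok)"

# Removing a folder through the link removes it from the cloned repository,
# because the link only points at poc/kp-editions.
rm -rf kp-editions/eb07/TXT/
if [ -e "$sandbox/poc/kp-editions/eb07/TXT" ]; then
  removed_in_clone=no
else
  removed_in_clone=yes
fi
echo "TXT removed from poc/kp-editions: $removed_in_clone"

# Leave the scratch directory and delete it.
cd "$start_dir"
rm -rf "$sandbox"
```

In other words, the rm -rf step changes the working copy of kp-editions itself, so a fresh clone is needed to get those folders back.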

create & activate the Python environment

  1. cd poc/eb-pre
  2. python3 -m venv venv
  3. source venv/bin/activate
  4. pip install -U pip
  5. pip install -r build/requirements.txt
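Before running the tools, it may help to confirm that the environment is really active in the current shell. The sketch below relies on the VIRTUAL_ENV variable that the activate script sets. The helper name venv_status is hypothetical and not part of this repository.

```shell
# Hypothetical helper: report whether a virtual environment is active.
# "source venv/bin/activate" exports VIRTUAL_ENV; "deactivate" unsets it.
venv_status() {
  if [ -n "${VIRTUAL_ENV:-}" ]; then
    echo "active: $VIRTUAL_ENV"
  else
    echo "inactive"
  fi
}

venv_status
```

If it prints inactive, run source venv/bin/activate again from poc/eb-pre before the next steps.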

(re-)index the entries with linguistic properties

  1. cd poc/eb-pre/tools
  2. rm ../data/index.json
  3. python prep.py
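A quick sanity check after indexing, assuming that prep.py writes its result to ../data/index.json (the file removed in step 2). The helper name is_valid_json is hypothetical. It only checks that the file exists and parses as JSON, using Python's standard json.tool module, and says nothing about the contents.

```shell
# Hypothetical helper: succeed only if the file exists and parses as JSON.
is_valid_json() {
  [ -f "$1" ] && python3 -m json.tool "$1" > /dev/null 2>&1
}

# Assumption: prep.py regenerates ../data/index.json (run from poc/eb-pre/tools).
if is_valid_json ../data/index.json; then
  echo "index.json exists and is valid JSON"
else
  echo "index.json is missing or not valid JSON"
fi
```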

(re-)create the embeddings

  1. cd poc/eb-pre/tools
  2. rm ../data/semantic_search/*
  3. python classify.py
  4. python compress.py ../data/semantic_search/semantic_search-edition_7-doc2vec-learn-mc_40-ng_1-tm_0.5-ch_sentence.tv2.json 2

launch & visit the web application

  1. cd poc/eb-pre
  2. python3 -m http.server 8000
  3. visit the following URL with your browser: http://localhost:8000/docs/