Original Repo: https://github.com/INK-USC/NExT

Installation Instructions

sh create_data_dirs.sh
pip install -r rqs.txt
- If on Linux, you must download pytorch with CUDA 10.1 compatibility
- For more instructions, check here
python -m spacy download en_core_web_sm
modify nltk source code: in nltk/parse/chart.py, line 680, modify function parse, change for edge in self.select(start=0, end=self._num_leaves,lhs=root): to for edge in self.select(start=0, end=self._num_leaves):
place TACRED's train.json, dev.json, test.json into data folder
run through Prepare Tacred Data.ipynb to prepare TACRED data
cd training
python pre_train_find_module.py --build_data (defaults achieve 90% f1)
python train_next_classifier.py --build_data --experiment_name="insertAnyTextHere" (defaults achieve 42.4% f1)
- there are several available params you can set from the command line
- builds data for both strict and soft labeling, only uses strict data
- data pre-processing will take sometime, due to tuning of parser.
- Note: match_batch_size == batch_size in the bilstm case

For both step 8 and 9, subsequent trials on the same dataset don't need the "--build_data" flag; data that has already been computed does not need to be computed again, it is stored to disk.

Directory Descriptions:

CCG_new : everything to do with parsing of explanations and creation of strict and soft labeling functions

main file: CCG_new/parser.py

models : model files

tests : test files, tests are a good place to understand a lot of the functions. train_util_functions.py doesn't have tests around it yet though. To run a test, run the following: pytest test_file_name.py. Example: pytest ccg_util_test.py. Some tests might not pass due to data not existing, read comments in those tests to understand how to build needed data.

training : all code to do with training of models

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Files

README.md

Latest commit

History

README.md

File metadata and controls