diff --git a/README.md b/README.md
index 25e5772..fdeb47c 100644
--- a/README.md
+++ b/README.md
@@ -85,6 +85,8 @@ After that, using the Brown clustering code:
 
 The resulting file is under `train-terms-c140-p1.out/paths`, which can be renamed to `clusters-train-berk.txt`.
 
+For new datasets (i.e., datasets other than the Penn Treebank), it is highly recommended to generate new clusters by following the steps shown above.
+
 ### Training the generative model
 
     nohup build/nt-parser/nt-parser-gen -x -T [training_oracle_generative] -d [dev_oracle_generative] -t --clusters clusters-train-berk.txt --input_dim 256 --lstm_input_dim 256 --hidden_dim 256 -D 0.3 > log_gen.txt