Clean PTB trees: remove empty nodes, strip functional tags #14
We need to talk about the color of the bike shed.
Also, as I understand it, you're asking me to wait for this to settle a bit? (Or are you happy merging this in right away?)
Please feel free to accept the PR as soon as you feel we reach a satisfying 2014-10-31 17:36 GMT+01:00 Eric Kow [email protected]:
@kowey minor refactoring done, pylint+pep8 too. I think you can safely merge now.
In fact this is not ready for merge yet as
Now that #15 has been merged (with apologies for my sloppiness), is it safe to merge this?
I think so, thanks!
Clean PTB trees: remove empty nodes, strip functional tags
The Penn Treebank contains information that parsers neither use for training nor produce as output, namely (going by the PR title) empty nodes and functional tags.
This PR provides the infrastructure to discard such information so that trees read from the Penn Treebank contain the same information as trees output by any parser, e.g. the Stanford Parser.
Trees without empty nodes are also easier to align with discourse annotations, cf. #13.
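
For readers unfamiliar with the two cleaning steps, here is a minimal standalone sketch of the idea. It is not the code in this PR: the function names (`parse_ptb`, `strip_functional_tags`, `clean_tree`) and the nested-tuple tree representation are hypothetical and chosen for illustration only. It assumes a labelled root node, treats any subtree labelled `-NONE-` as an empty node, and treats everything after the first `-` or `=` in a label as a functional tag or index.

```python
import re


def parse_ptb(text):
    """Parse one bracketed tree into nested (label, children) tuples.

    Leaves are plain strings. Assumes the root node carries a label.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return parse_node()


def strip_functional_tags(label):
    """Reduce e.g. 'NP-SBJ-1' to 'NP'; keep '-NONE-', '-LRB-', '-RRB-' intact."""
    if label.startswith("-"):
        return label
    return re.split(r"[-=]", label, maxsplit=1)[0]


def clean_tree(node):
    """Drop empty nodes, prune ancestors left childless, strip functional tags.

    Returns None when the whole subtree is empty.
    """
    if isinstance(node, str):
        return node
    label, children = node
    if label == "-NONE-":
        return None
    cleaned = (clean_tree(child) for child in children)
    kept = [child for child in cleaned if child is not None]
    if not kept:
        return None
    return (strip_functional_tags(label), kept)


def leaves(node):
    """Return the words of a tree in left-to-right order."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1] for word in leaves(child)]


raw = parse_ptb("(S (NP-SBJ (-NONE- *)) (VP (VB Go) (NP-TMP (NN home))) (. .))")
cleaned = clean_tree(raw)
print(leaves(raw))      # words including the empty element
print(leaves(cleaned))  # words as a parser would see them
```

After cleaning, the leaves of the tree correspond one-to-one with the tokens of the raw text, which is what makes alignment with other annotation layers simpler.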