Clean PTB trees: remove empty nodes, strip functional tags #14

Merged
merged 8 commits into from
Nov 5, 2014

Conversation

moreymat
Contributor

The Penn Treebank contains information that parsers neither use for training nor produce in their output:

  • empty nodes (for traces),
  • grammatical functions.

This PR provides the infrastructure to discard such information, so that trees read from the Penn Treebank carry the same information as trees output by a parser such as the Stanford Parser.

Trees without empty nodes are also easier to align with discourse annotations, cf. #13 .
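
To make the two cleanup steps concrete, here is a minimal, self-contained sketch in Python. It is not the code from this PR: the helper names (`parse_ptb`, `strip_tag`, `clean`, `to_string`) and the tuple-based tree representation are hypothetical, chosen only for illustration. It assumes that empty nodes are exactly the subtrees labelled `-NONE-`, that functional tags and indices follow the first `-` or `=` in a label, and that labels starting with `-` (such as `-NONE-` or `-LRB-`) are left untouched.

```python
import re


def parse_ptb(text):
    """Parse a bracketed PTB string into (label, children) tuples.

    Leaves are plain strings. Assumes every bracket carries a label.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "("
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read()


def strip_tag(label):
    """Drop functional tags and indices: 'NP-SBJ-1' becomes 'NP'."""
    if label.startswith("-"):
        return label  # keep special labels such as -NONE- or -LRB-
    return re.split(r"[-=]", label, maxsplit=1)[0]


def clean(node):
    """Return a copy without empty nodes and functional tags, or None."""
    if isinstance(node, str):
        return node
    label, children = node
    if label == "-NONE-":
        return None  # empty node (trace)
    kept = [c for c in (clean(child) for child in children) if c is not None]
    if not kept:
        return None  # a node that dominated only empty nodes goes too
    return (strip_tag(label), kept)


def to_string(node):
    """Serialize a tree back to bracketed notation."""
    if isinstance(node, str):
        return node
    label, children = node
    return "(" + label + " " + " ".join(to_string(c) for c in children) + ")"


raw = ("(S (NP-SBJ-1 (NNP John)) (VP (VBD tried) "
       "(S (NP-SBJ (-NONE- *-1)) (VP (TO to) (VP (VB leave))))) (. .))")
cleaned = to_string(clean(parse_ptb(raw)))
print(cleaned)
```

Note that the `NP-SBJ` node dominating only the trace is removed along with it; otherwise the cleaned tree would contain a constituent with no tokens, which no parser would output.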

@moreymat
Contributor Author

We need to talk about the color of the bike shed.
Then I will finish the pep8 and pylint cleanup and provide proper tests.

@kowey kowey added the ON HOLD label Oct 31, 2014
@kowey
Contributor

kowey commented Oct 31, 2014

Also, as I understand it, you're asking me to wait for this to settle a bit? (or are you happy merging this in right away?)

@moreymat
Contributor Author

Please feel free to accept the PR as soon as you feel it has reached a state you are satisfied with.

@moreymat
Contributor Author

moreymat commented Nov 3, 2014

@kowey minor refactoring done, pylint+pep8 too. I think you can safely merge now.

@moreymat
Contributor Author

moreymat commented Nov 3, 2014

In fact, this is not ready to merge yet: irit-rst-dt gather just failed. I will let you know once it is debugged.

@kowey
Contributor

kowey commented Nov 5, 2014

Now that #15 has been merged (with apologies for my sloppiness), is it safe to merge this?

@moreymat
Contributor Author

moreymat commented Nov 5, 2014

I think so, thanks!
At least, irit-rst-dt gather now works on my system, on PTB trees free of empty nodes and functional tags.

@kowey kowey removed the ON HOLD label Nov 5, 2014
kowey added a commit that referenced this pull request Nov 5, 2014
Clean PTB trees: remove empty nodes, strip functional tags
@kowey kowey merged commit 8655cfb into irit-melodi:master Nov 5, 2014
@moreymat moreymat deleted the ctree-remove-functional-tags branch November 5, 2014 23:47
2 participants