remove concept of 'word' #47

Open
missinglink opened this issue Jun 11, 2019 · 2 comments

@missinglink
Member

This library has the concepts of word, phrase and section.

I'm not sure the word concept is required, as a word can be represented as a single-token phrase.
In fact, I think there is duplication between words and single-token phrases right now.

If possible, it would be nice to remove the concept of a word, which should help clean up the code.

@missinglink
Member Author

@Joxit 👍 or 👎 on this? I might have a look at doing it at some point when I have some time, but it will be a fairly noisy commit.

@missinglink
Member Author

missinglink commented Apr 25, 2020

I had a think about how this might work. Before attempting it, I think we should first focus on the graph:

  • document existing graph relationships in readme
  • possibly refactor or rename graph relationships for clarity (and to have different verbs for phrase and word relationships)

Once this is done, it should be possible to delete all single-word phrase objects and simply replace them with a pointer to the word span.
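
To make the idea concrete, here is a minimal sketch in Python of what "replace single-word phrases with a pointer to the word span" could look like. It is not the library's code: the `Span` class, the `build_section` helper, and the relationship names `"word"` and `"phrase"` are all hypothetical, and the tokenizer assumes words are separated by single spaces.

```python
from collections import defaultdict


class Span:
    """Hypothetical span: a run of text plus a small graph of named links."""

    def __init__(self, body, start):
        self.body = body
        self.start = start
        self.end = start + len(body)
        self.edges = defaultdict(list)  # relationship name -> linked spans

    def link(self, relationship, span):
        # Links are cheap; skip exact duplicates (identity comparison).
        if span not in self.edges[relationship]:
            self.edges[relationship].append(span)

    def find_all(self, relationship):
        return list(self.edges[relationship])


def build_section(text):
    """Build a section span with 'word' and 'phrase' links (assumed names)."""
    section = Span(text, 0)

    # Assumption: tokens are separated by exactly one space.
    words, offset = [], 0
    for token in text.split(" "):
        words.append(Span(token, offset))
        offset += len(token) + 1
    for word in words:
        section.link("word", word)

    for i in range(len(words)):
        for j in range(i, len(words)):
            if i == j:
                # Single-word phrase: point at the existing word span
                # instead of allocating a duplicate phrase object.
                phrase = words[i]
            else:
                body = " ".join(w.body for w in words[i:j + 1])
                phrase = Span(body, words[i].start)
                for w in words[i:j + 1]:
                    phrase.link("word", w)
            section.link("phrase", phrase)
    return section
```

With this shape, a classifier that iterates over `section.find_all("phrase")` would see single words and multi-word phrases through one code path, which is where the clean-up of classifier and solver logic would come from.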


The main benefit of doing this refactor would be to clean up all the classifier and solver logic, which can get quite verbose and complex.

So if we have an idea of what we'd like the graph calls to look like, we can go ahead and start introducing new graph relationships to support them.

Since relationships in the graph are cheap, we can safely build up a range of links. It's also fairly easy to monitor the use of graph relationships we'd like to deprecate (link child?) and gradually replace them with other relationships until the codebase is 100% migrated.
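
One possible way to monitor a relationship that is on its way out is to count every read and write that touches it. The sketch below is in Python and only illustrates the approach: the `Graph` class, its `add` and `find_all` methods, the `DEPRECATED` set, and the choice of `"child"` as the deprecated relationship are assumptions for the example, not the library's actual API.

```python
from collections import Counter, defaultdict

# Hypothetical: relationship names we would like to migrate away from.
DEPRECATED = {"child"}

# Module-level tally of how often each deprecated relationship is touched.
usage = Counter()


class Graph:
    """Hypothetical graph of named relationships with deprecation tracking."""

    def __init__(self):
        self.edges = defaultdict(list)  # relationship name -> linked nodes

    def _track(self, relationship):
        # Record every access to a deprecated relationship.
        if relationship in DEPRECATED:
            usage[relationship] += 1

    def add(self, relationship, node):
        self._track(relationship)
        self.edges[relationship].append(node)

    def find_all(self, relationship):
        self._track(relationship)
        return list(self.edges[relationship])
```

Running the test suite and then inspecting `usage` would show which deprecated relationships are still in use. Once a counter stays at zero, that relationship can be removed.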

At some point we can remove the WordClassifier completely so it's no longer possible to classify word spans directly.
