This repository has been archived by the owner on Oct 4, 2022. It is now read-only.

Delete the current sentence & word tokenizers/parsers #405

Open
atimmer opened this issue Nov 20, 2019 · 0 comments

Comments

Contributor

atimmer commented Nov 20, 2019

Explanation

The current sentence and word tokenizers/parsers take HTML into account. In #406 we will build parsers for sentences and words that assume there is no longer any HTML in the text.
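
To illustrate what a parser that assumes HTML-free input could look like, here is a minimal sketch. The function names `splitSentences` and `splitWords` and the splitting rules are hypothetical, chosen only for illustration; they are not the parsers planned in #406 and not part of the library's API. The word splitter only strips ASCII punctuation, so it is not suitable for real multilingual text.

```typescript
// Hypothetical sketch: names and rules are assumptions, not the library's API.

/** Splits plain text (no HTML) into sentences that end in ".", "!" or "?". */
function splitSentences(text: string): string[] {
  // Each match is a run of non-terminator characters plus any trailing terminators.
  const matches: string[] = text.match(/[^.!?]+[.!?]*/g) || [];
  return matches
    .map((sentence) => sentence.trim())
    .filter((sentence) => sentence.length > 0);
}

/** Splits a plain-text sentence into words on whitespace. */
function splitWords(sentence: string): string[] {
  return sentence
    .split(/\s+/)
    // Strip leading and trailing characters that are not ASCII letters or digits.
    .map((word) => word.replace(/^[^A-Za-z0-9]+|[^A-Za-z0-9]+$/g, ""))
    .filter((word) => word.length > 0);
}

const sentences = splitSentences("Hello world. How are you?");
const words = splitWords(sentences[0]);
console.log(sentences, words);
```

Because the input is assumed to be plain text, neither function needs to track tags, attributes, or block boundaries, which is the simplification the tree-based approach is meant to allow.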

When all of the text analysis library code relies on the tree instead of the old (flawed) parsers, we can delete the old parsers.

Technical decisions

The files I am talking about are:

If this has not been done yet, we should also make sure that all the tests are implemented for the new parsers. Tests with HTML shouldn't be ported. Old tests:

Feedback?

@atimmer atimmer added this to the StructuredTree milestone Nov 20, 2019
@atimmer atimmer mentioned this issue Nov 20, 2019
7 tasks
@atimmer atimmer changed the title from "Remove HTML specific code from the sentence parser" to "Delete the current sentence & word tokenizers/parsers" Nov 22, 2019