Delete the current sentence & word tokenizers/parsers #405

atimmer · 2019-11-20T16:27:49Z

Explanation

The current sentence and word tokenizers/parsers take HTML into account. In #406 we will build sentence and word parsers that assume there is no HTML in the text anymore.

Once all of the text analysis library code relies on the tree instead of the old (flawed) parsers, we can delete the old parsers.
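To illustrate what "no HTML in the text" buys us, here is a minimal sketch of tokenizers that only ever see plain text. The names `splitSentences` and `splitWords` are hypothetical and not the library's API, and the splitting rules are deliberately naive (ASCII-only word trimming, no abbreviation handling). It only shows that the parsers need no tag handling once the tree supplies clean text.

```typescript
// Hypothetical sketch: tokenizers that assume the input contains no HTML.
// Function names and splitting rules are illustrative assumptions only.

// Split plain text into sentences: a run of non-terminator characters
// followed by any sentence-ending punctuation.
function splitSentences(text: string): string[] {
  const matches = text.match(/[^.!?]+[.!?]*/g) || [];
  return matches
    .map((sentence) => sentence.trim())
    .filter((sentence) => sentence.length > 0);
}

// Split a sentence into words on whitespace, then strip leading and
// trailing punctuation (ASCII letters and digits only, for brevity).
function splitWords(sentence: string): string[] {
  return sentence
    .split(/\s+/)
    .map((word) => word.replace(/^[^A-Za-z0-9]+|[^A-Za-z0-9]+$/g, ""))
    .filter((word) => word.length > 0);
}

// Usage: the input is assumed to be text already extracted from the tree.
const sentences = splitSentences("Hello world. How are you?");
const words = splitWords(sentences[0]);
console.log(sentences, words);
```

Because neither function has to skip or balance tags, tests for them only need plain-text fixtures, which matches the plan of not porting the HTML tests.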

Technical decisions

The files I am talking about are:

If this has not been done yet, we should also make sure that all the tests are implemented for the new parsers. Tests with HTML shouldn't be ported. Old tests:

Feedback?

atimmer added this to the StructuredTree milestone Nov 20, 2019

atimmer mentioned this issue Nov 20, 2019

Create a linguistic parser #406

Closed

7 tasks

atimmer changed the title ~~Remove HTML specific code from the sentence parser~~ Delete the current sentence & word tokenizers/parsers Nov 22, 2019
