This repository has been archived by the owner on Oct 4, 2022. It is now read-only.

Create a linguistic parser #406

Closed
3 of 7 tasks
atimmer opened this issue Nov 20, 2019 · 2 comments · Fixed by #459

Comments

@atimmer
Contributor

atimmer commented Nov 20, 2019

Explanation

A TextContainer object should have a getTree method that returns a tree built from its text. This tree should be generated by a linguistic parser that knows how to split a text into sentences and words. The tree returned by getTree should consist of Sentence objects, each of which contains Word objects.

  • The Sentence object should contain the content of the sentence and the relative indexes within the text container.
  • The Word object should contain the content of the word and the relative indexes within the text container.
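To make the shape concrete, here is a minimal sketch of what such a tree could look like for a short text. The field names (`content`, `startIndex`, `endIndex`, `words`) and the convention that the end index is exclusive are assumptions for illustration, not something decided in this issue.

```typescript
// Hypothetical shapes: the field names below are assumptions for illustration only.
interface Word {
  content: string;
  startIndex: number; // offset of the first character within the container text
  endIndex: number; // offset just past the last character (exclusive, assumed)
}

interface Sentence {
  content: string;
  startIndex: number;
  endIndex: number;
  words: Word[];
}

const text = "Hello world. Bye now.";

// A hand-written tree for the text above, with indexes relative to `text`.
const tree: Sentence[] = [
  {
    content: "Hello world.",
    startIndex: 0,
    endIndex: 12,
    words: [
      { content: "Hello", startIndex: 0, endIndex: 5 },
      { content: "world", startIndex: 6, endIndex: 11 },
    ],
  },
  {
    content: "Bye now.",
    startIndex: 13,
    endIndex: 21,
    words: [
      { content: "Bye", startIndex: 13, endIndex: 16 },
      { content: "now", startIndex: 17, endIndex: 20 },
    ],
  },
];

// Checks that slicing the container text with a node's indexes yields its content.
function spanMatches(
  source: string,
  span: { content: string; startIndex: number; endIndex: number }
): boolean {
  return source.slice(span.startIndex, span.endIndex) === span.content;
}

const allSpansMatch = tree.every(
  (sentence) =>
    spanMatches(text, sentence) &&
    sentence.words.every((word) => spanMatches(text, word))
);
```

With this convention, slicing the container text with a node's indexes always gives back that node's content, which makes for an easy invariant to assert in unit tests.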

Better suggestions for the name of the linguistic parser are welcome. The linguistic parser should reuse code that already exists in the codebase, so the current sentence parser can be reused. I've created an issue to track the removal of the HTML-specific code from the sentence parser.

Tasks

  • Create a getTree method on the TextContainer class
    • Create tokenizer for sentences based on current tokenizer
    • Create tokenizer for words based on current tokenizer
    • Call tokenizers in getTree method.
    • Return Sentence objects; each Sentence object contains Word objects
  • Create unit tests (copy from existing tokenizer)
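The tasks above could fit together roughly as follows. This is only a sketch under stated assumptions: the two regular expressions are naive stand-ins and are not the existing tokenizer, and `findSpans`, `tokenizeSentences`, `tokenizeWords` and the field names are hypothetical.

```typescript
// Hypothetical shapes and helper names; none of these are the project's actual API.
interface Span {
  content: string;
  startIndex: number; // relative to the text container
  endIndex: number; // exclusive (assumed convention)
}

interface Sentence extends Span {
  words: Span[];
}

// Naive stand-ins for the existing tokenizers (assumption: real rules are richer).
// Sentence: starts at a non-space character, ends with optional terminators.
const SENTENCE_PATTERN = /[^\s.!?](?:[^.!?]*[^\s.!?])?[.!?]*/g;
// Word: a run of characters that are neither whitespace nor basic punctuation.
const WORD_PATTERN = /[^\s.,!?;:]+/g;

// Collects every match of a global pattern as a span, shifted by `offset`.
function findSpans(text: string, pattern: RegExp, offset: number): Span[] {
  const spans: Span[] = [];
  pattern.lastIndex = 0; // global regexes keep state between exec calls
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    const startIndex = offset + match.index;
    spans.push({
      content: match[0],
      startIndex: startIndex,
      endIndex: startIndex + match[0].length,
    });
  }
  return spans;
}

function tokenizeSentences(text: string): Span[] {
  return findSpans(text, SENTENCE_PATTERN, 0);
}

// `offset` is the sentence start, so word indexes stay relative to the container.
function tokenizeWords(sentence: string, offset: number): Span[] {
  return findSpans(sentence, WORD_PATTERN, offset);
}

class TextContainer {
  private readonly text: string;

  constructor(text: string) {
    this.text = text;
  }

  // Calls both tokenizers and nests the words inside their sentence.
  getTree(): Sentence[] {
    return tokenizeSentences(this.text).map((sentence) => ({
      content: sentence.content,
      startIndex: sentence.startIndex,
      endIndex: sentence.endIndex,
      words: tokenizeWords(sentence.content, sentence.startIndex),
    }));
  }
}

const exampleTree = new TextContainer("Hello world. Bye now.").getTree();
```

Passing the sentence start as an offset to the word tokenizer keeps every index relative to the text container rather than to the sentence, as the explanation asks for. Unit tests copied from the existing tokenizer could then assert on both the content and the indexes of each node.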

Technical decisions

  • The linguistic parser can use the current sentence tokenizer.

Feedback?

@maartenleenders
Contributor

I've pushed my work to 406-create-linguistic-parser (not much there yet).

Some clarifications I've gotten during my assignment:

  • The issue includes building the linguistic parser helper functions.
  • The issue includes building the Word and Sentence objects.

Remaining tasks added to the issue ☝️

@maartenleenders maartenleenders removed their assignment Dec 12, 2019
@manuelaugustin manuelaugustin self-assigned this Jan 16, 2020
@manuelaugustin manuelaugustin removed their assignment Jan 22, 2020
@igorschoester igorschoester self-assigned this Jan 22, 2020