Compare original tokenization to subsequent modifications #108

Closed
bwaldon wants to merge 8 commits

Conversation

Collaborator

@bwaldon bwaldon commented Nov 18, 2024

  • Addresses #91 ("tokenized string showing up after server restart does not match saved tree"): once an annotation is accepted, the latest tokenization is displayed when the user subsequently views the sentence.
  • Adds two fields to the database: senttok and origtok. senttok is a string reflecting the sentence tokenization after any manual edits; origtok is a string reflecting the tokenization before any manual edits.
  • Introduces a "Revert tokenization" button in annotate mode. If the user clicks this button, the sentence is re-tokenized according to the original tokenization specified in the input .csv file (the file that SENTENCES points to in settings.cfg).
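
For readers who want a concrete picture of the three points above, here is a minimal sketch using Python's standard sqlite3 module. It is not the code from this PR: the table name `entries`, its `id` and `tree` columns, and the function names are hypothetical, and tokenizations are assumed to be stored as space-separated strings. The sketch also reverts from the stored origtok rather than re-reading the .csv, which is a simplifying assumption.

```python
import sqlite3

# Hypothetical schema; the real activedop table layout may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS entries (
    id      TEXT PRIMARY KEY,  -- sentence identifier
    tree    TEXT,              -- accepted annotation
    senttok TEXT,              -- tokenization after any manual edits
    origtok TEXT               -- tokenization prior to any manual edits
)
"""


def accept_annotation(db, sentid, tree, tokens, original_tokens):
    """Store an accepted tree together with both tokenizations."""
    row = db.execute(
        "SELECT origtok FROM entries WHERE id = ?", (sentid,)
    ).fetchone()
    # Keep the first origtok ever recorded so later edits never overwrite it.
    origtok = row[0] if row else " ".join(original_tokens)
    db.execute(
        "INSERT OR REPLACE INTO entries (id, tree, senttok, origtok) "
        "VALUES (?, ?, ?, ?)",
        (sentid, tree, " ".join(tokens), origtok),
    )
    db.commit()


def current_tokens(db, sentid):
    """Return the latest saved tokenization, or None if nothing was saved."""
    row = db.execute(
        "SELECT senttok FROM entries WHERE id = ?", (sentid,)
    ).fetchone()
    return row[0].split() if row else None


def revert_tokenization(db, sentid):
    """Reset senttok to origtok and return the restored tokens."""
    db.execute("UPDATE entries SET senttok = origtok WHERE id = ?", (sentid,))
    db.commit()
    return current_tokens(db, sentid)
```

Because senttok is read back from the database, the tokenization shown after a server restart is the one saved with the tree, which is the behavior #91 asks for.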

@bwaldon bwaldon requested a review from nschneid November 19, 2024 14:18
@nschneid
Owner

Just to make sure I understand the rationale: the idea of storing the original tokenization in the database is that a user might accidentally modify the tokenization, save a tree, and then wish to revert?

@bwaldon
Collaborator Author

bwaldon commented Nov 19, 2024

That's right. The original tokenization is also available in the SENTENCES .csv, so for sentences in that file, it's not strictly necessary to store the original tokenization in the database. But looking ahead to #4 (direct sentence entry), we'll need to store information about annotated sentences that come from somewhere other than that .csv. This change anticipates that future functionality (in progress at https://github.com/bwaldon/activedop/tree/direct_entry).
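
To illustrate the lookup order described in this comment, here is a rough sketch. It assumes a hypothetical .csv layout with `id` and `sentence` columns and uses a plain dict as a stand-in for the database's origtok field; neither is taken from the actual code.

```python
import csv
import io


def load_csv_tokenizations(fileobj):
    """Map sentence id -> token list (hypothetical 'id' and 'sentence' columns)."""
    return {row["id"]: row["sentence"].split() for row in csv.DictReader(fileobj)}


def original_tokens(sentid, csv_tokens, db_origtok):
    """Prefer the .csv; fall back to the stored origtok for other sentences."""
    if sentid in csv_tokens:
        return csv_tokens[sentid]
    stored = db_origtok.get(sentid)  # dict stand-in for a database lookup
    return stored.split() if stored is not None else None


# Example: "s1" comes from the .csv, "d1" was entered directly.
csv_tokens = load_csv_tokenizations(io.StringIO("id,sentence\ns1,the cat sat\n"))
print(original_tokens("s1", csv_tokens, {}))
print(original_tokens("d1", csv_tokens, {"d1": "hello world"}))
```

The fallback branch is the only place the stored origtok is strictly needed, which is why it matters mainly for direct sentence entry.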

@bwaldon
Collaborator Author

bwaldon commented Nov 19, 2024

Following convo 11/19: going to revert changes that reference "original tokenization"

@bwaldon
Collaborator Author

bwaldon commented Nov 21, 2024

I'm actually going to close this PR because I think this is functionally identical to #92!

@bwaldon bwaldon closed this Nov 21, 2024