GH-3488: Support for writing a ColumnCorpus instance to files #3497

Open
wants to merge 10 commits into
base: master
from

chelseagzr

This PR addresses #3488

Current limitations:

  • write_to_directory supports only token-level and span-level sequence tagging labels.
  • The whitespace_after attribute of individual tokens is not preserved when the corpus is saved to files; only the default_whitespace_after attribute of each dataset is preserved.
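
To illustrate the second limitation, here is a minimal, self-contained sketch of a column-format round trip. It does not use flair: the Token class and the write_columns and read_columns helpers are hypothetical stand-ins, and the two-column layout (one token and tag per line, blank line between sentences) is an assumption about a typical column file. Because the file has no column for whitespace, per-token values can only be replaced by a dataset-wide default on reload.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical stand-in for a token; not flair's Token class."""
    text: str
    tag: str
    whitespace_after: int = 1  # number of spaces following the token


def write_columns(sentences):
    """Serialize sentences as 'text tag' lines, with a blank line between sentences."""
    blocks = ["\n".join(f"{t.text} {t.tag}" for t in sentence) for sentence in sentences]
    return "\n\n".join(blocks) + "\n"


def read_columns(data, default_whitespace_after=1):
    """Parse the column text back; every token receives the same default whitespace."""
    sentences = []
    for block in data.strip().split("\n\n"):
        sentence = []
        for line in block.splitlines():
            text, tag = line.split(" ")
            sentence.append(Token(text, tag, default_whitespace_after))
        sentences.append(sentence)
    return sentences


# "York" is directly followed by "." in the original text, so its whitespace_after is 0.
original = [[Token("New", "B-LOC"), Token("York", "I-LOC", 0), Token(".", "O", 0)]]
restored = read_columns(write_columns(original))

# Texts and tags survive the round trip; the per-token whitespace does not.
print([(t.text, t.tag, t.whitespace_after) for t in restored[0]])
```

In this sketch the texts and tags come back unchanged, while the restored "York" token carries the default whitespace value instead of its original 0.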

@chelseagzr chelseagzr force-pushed the gh-3488/save-column-corpus-to-files branch from 1c8f4ad to d1ce766 Compare July 11, 2024 06:38
flair/datasets/sequence_labeling.py: 5 review threads (outdated, resolved)
@chelseagzr chelseagzr requested a review from mattb-zip July 17, 2024 18:24
@chelseagzr chelseagzr force-pushed the gh-3488/save-column-corpus-to-files branch from a8bc603 to 6941510 Compare July 17, 2024 18:24
Collaborator

@alanakbik alanakbik left a comment

Thanks for adding this!

One issue is that this adds many public methods to ColumnCorpus and ColumnDataset, which may be confusing to users since most of them serve only a single piece of functionality. I suggest leaving only the respective "save" and "load" methods public (since these are the ones that users will call), and marking all others as internal.
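
As a rough illustration of this suggestion (not the code in this PR), the usual Python convention is to prefix internal helpers with a single underscore. In the sketch below, ExampleCorpus, its helper methods, the train.txt file name, and the public_methods function are all hypothetical; only the name write_to_directory is taken from the PR description.

```python
import tempfile
from pathlib import Path


class ExampleCorpus:
    """Hypothetical stand-in; not flair's ColumnCorpus."""

    def __init__(self, sentences):
        # Each sentence is a list of (token, tag) pairs.
        self.sentences = sentences

    def write_to_directory(self, directory):
        """Public entry point: the one method users are expected to call."""
        path = Path(directory)
        path.mkdir(parents=True, exist_ok=True)
        target = path / "train.txt"  # assumed file name, for illustration only
        target.write_text(self._format_sentences(), encoding="utf-8")
        return target

    def _format_sentences(self):
        """Internal helper: join all sentences with blank lines between them."""
        return "\n\n".join(self._format_sentence(s) for s in self.sentences) + "\n"

    @staticmethod
    def _format_sentence(sentence):
        """Internal helper: one 'token tag' pair per line."""
        return "\n".join(f"{token} {tag}" for token, tag in sentence)


def public_methods(cls):
    """List the callables defined on cls whose names do not start with an underscore."""
    return sorted(
        name
        for name, value in vars(cls).items()
        if callable(value) and not name.startswith("_")
    )


with tempfile.TemporaryDirectory() as tmp:
    written = ExampleCorpus([[("Berlin", "B-LOC")]]).write_to_directory(tmp)
    content = written.read_text(encoding="utf-8")

print(public_methods(ExampleCorpus))
```

With this layout, only write_to_directory appears in the public surface of the class, while the formatting helpers remain available internally.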

flair/datasets/sequence_labeling.py: 4 review threads (outdated, resolved)
@chelseagzr chelseagzr requested a review from alanakbik July 23, 2024 17:44
@chelseagzr
Author

@alanakbik Could you please review the new changes when you have a chance? Thank you!

@alanakbik
Collaborator

Hello @chelseagzr, very sorry it took so long! I updated your branch to our current master, but one of the imports got lost. Could you add it back in?

4 participants