Create a dataset loader for MoNERo #67

Open
hakunanatasha opened this issue Jan 21, 2022 · 7 comments · May be fixed by #516
Labels: CC BY SA Licence, CoNLL Format, NER Task, Romanian Language

Comments

@hakunanatasha
Collaborator

From https://www.racai.ro/en/tools/text/

@ruisi-su added the CC BY SA Licence, CoNLL Format, NER Task, and Romanian Language labels on Jan 27, 2022
@qanastek removed their assignment on Mar 31, 2022
@napsternxg

#self-assign

@hakunanatasha
Collaborator Author

Hi @napsternxg, can you let us know if you are still working on this so we can update our project board? Please just let us know the status by Friday, April 8. No worries if you are not finished but still intend to work on this. Please either ping me here at @hakunanatasha or ping the Discord admins (with @admins).

@napsternxg

Hi @hakunanatasha yes I plan to work on this over the weekend.

napsternxg added a commit to napsternxg/biomedical that referenced this issue Apr 11, 2022
@jason-fries
Member

Hi @napsternxg
Just a ping on the status of this dataset. Please let us know if you are still working on it and when you plan to submit a PR. Thanks!!

@napsternxg

Hi @jason-fries thanks for the reminder. I have started work on this in my local branch.
Will send a PR early next week.

@napsternxg

Details on the paper:

@inproceedings{mitrofan-etal-2019-monero,
    title = "{M}o{NER}o: a Biomedical Gold Standard Corpus for the {R}omanian Language",
    author = "Mitrofan, Maria  and
      Barbu Mititelu, Verginica  and
      Mitrofan, Grigorina",
    booktitle = "Proceedings of the 18th BioNLP Workshop and Shared Task",
    month = aug,
    year = "2019",
    address = "Florence, Italy",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/W19-5008",
    doi = "10.18653/v1/W19-5008",
    pages = "71--79",
}

The corpus is licensed under the Creative Commons Attribution-ShareAlike 4.0 International license, so I have downloaded it and uploaded it here in tar.gz format for use in the data loader.

MoNERo.tar.gz

The dataset doesn't include any offset information, so I am going to build the text by joining the tokens with single spaces and compute the offsets on the resulting text.
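
A minimal sketch of that idea, not the code from the eventual PR: it assumes each sentence is already available as a list of token strings, and the function name `join_tokens_with_offsets` and the sample sentence are made up for illustration (the sentence is not taken from the corpus).

```python
from typing import List, Tuple


def join_tokens_with_offsets(tokens: List[str]) -> Tuple[str, List[Tuple[int, int]]]:
    """Join tokens with single spaces and return the text plus one
    (start, end) character span per token, with `end` exclusive."""
    offsets = []
    position = 0
    for token in tokens:
        start = position
        end = start + len(token)
        offsets.append((start, end))
        # Advance past the token and the single joining space.
        position = end + 1
    return " ".join(tokens), offsets


# Illustrative tokens only; not an actual MoNERo sentence.
tokens = ["Pacientul", "are", "diabet", "zaharat", "."]
text, offsets = join_tokens_with_offsets(tokens)

for token, (start, end) in zip(tokens, offsets):
    # Each span should slice back to exactly the token it was computed for.
    assert text[start:end] == token
```

Because the text is rebuilt from the tokens, the offsets refer to this space-joined text and not to the original document layout, which the corpus files do not preserve.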

@napsternxg linked a pull request on Apr 25, 2022 that will close this issue
8 tasks
@napsternxg

Added PR: #516

Projects
Status: PR in Progress