The version of FewNERD #42

Open
dongguanting opened this issue Aug 26, 2022 · 9 comments

Comments

@dongguanting

Hi @iofu728. It seems that the open-source “episode-data” dataset is the arXiv version of Few-NERD? The results I reproduced differ substantially from those in the paper. Did you perhaps use the ACL version of Few-NERD in the paper?

@iofu728
Contributor

iofu728 commented Aug 26, 2022

Hi @dongguanting,
We do not hold the copyright to the Few-NERD dataset, so please contact its owners.
We already specify the Few-NERD version in footnote 5 of our paper, and we report results for both the ACL and arXiv versions of Few-NERD in our GitHub repo.

@dongguanting
Author

dongguanting commented Aug 28, 2022

Thanks a lot for your reply. I still have a question about testing the cross-dataset scenario. How should I set up the script to reproduce the setting in your paper (2 datasets for training, 1 for validation, 1 for testing)? Does this mean I need to run two rounds of training, on the spans and types from two different ner_train.json files?

@iofu728
Contributor

iofu728 commented Aug 29, 2022

Hi @dongguanting, not really. For the Cross-Domain dataset, you only need to train once on the training set (Span + Type) and then evaluate directly. During training, the model sees the task data from both domains.
In our scripts, set the dataset to Domain and use different values of N to select the domain.

N=1 # 1 or 2 or 3 or 4
K=1 # 1 or 5

...
--dataset Domain \
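
The settings above span four values of N and two values of K, so a full sweep has eight runs. A minimal sketch that only enumerates those combinations; the helper name `domain_runs` and the dictionary layout are hypothetical, and nothing here calls the repo's actual scripts, whose remaining arguments are elided above:

```python
from itertools import product

# Values listed in the reply above: N selects the domain, K is the shot count.
DOMAIN_IDS = (1, 2, 3, 4)
SHOTS = (1, 5)


def domain_runs():
    """Return one settings dict per (N, K) pair for the Domain dataset.

    Hypothetical helper for illustration; it does not launch any training.
    """
    return [
        {"N": n, "K": k, "dataset": "Domain"}
        for n, k in product(DOMAIN_IDS, SHOTS)
    ]


# Print each combination in the same shape as the snippet above.
for run in domain_runs():
    print(f"N={run['N']} K={run['K']} --dataset {run['dataset']}")
```

Each printed line corresponds to one choice of the two variables at the top of the script; per the reply above, each such run is a single training pass followed by evaluation.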

@liyongqi2002

Maybe you accidentally swapped the results for the ACL version and the arXiv version in this repo? (F1 on the Few-NERD arXiv version is higher, but in your repo the ACL version result is higher.)
Also, I previously downloaded the arXiv version of episode-data (568MB; that link is no longer available). The only version of episode-data that can now be downloaded from the Few-NERD website (500MB) is probably the ACL version.

@iofu728
Contributor

iofu728 commented Sep 10, 2022

Hi @liyongqi2002, thanks for the reminder. Our presentation of the Few-NERD dataset versions has some problems; I will fix it as soon as possible.
In fact, the first table shows results on the Few-NERD arXiv v5 version (542MB, using the URL in thunlp/Few-NERD@cb16dc4#diff-5f38df55c9cead1d256aa12d97c6ed0244fcbda7e36124fa9ec53014750d1283R23), which is also used in CONTAINER and ESD.
The second table shows results on the Few-NERD arXiv v6 version (500MB, using the URL in thunlp/Few-NERD@e329079#diff-5f38df55c9cead1d256aa12d97c6ed0244fcbda7e36124fa9ec53014750d1283R20), which is also used in ESD.

@liyongqi2002

liyongqi2002 commented Sep 10, 2022

Thanks for your reply. So the results that can be compared now are those in the second table (using the 500MB episode data, which is also presented at https://paperswithcode.com/sota/few-shot-ner-on-few-nerd-inter). Is my understanding correct?

@iofu728
Contributor

iofu728 commented Sep 10, 2022

Yeah, you can compare the results in the second table by using the 500MB episodes data.
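
Because the two releases discussed above are told apart in this thread by archive size (about 542MB for arXiv v5, about 500MB for arXiv v6), a quick size check may help confirm which copy is on disk before comparing numbers. A minimal sketch under stated assumptions: the function names and the 15MB tolerance are hypothetical, the sizes are only the approximate figures quoted in this thread, and a size match is a rough hint rather than proof of the version.

```python
import os

# Approximate archive sizes reported in this thread, in megabytes.
REPORTED_SIZES_MB = {"arXiv v5": 542, "arXiv v6": 500}


def archive_size_mb(path):
    """Size of the file at `path` in megabytes (10**6 bytes)."""
    return os.path.getsize(path) / 1e6


def guess_version(size_mb, tolerance_mb=15):
    """Return the label whose reported size is closest to `size_mb`.

    Returns None when no reported size lies within `tolerance_mb`
    (a hypothetical threshold chosen only for illustration).
    """
    label, reported = min(
        REPORTED_SIZES_MB.items(), key=lambda item: abs(item[1] - size_mb)
    )
    return label if abs(reported - size_mb) <= tolerance_mb else None
```

For example, `guess_version(archive_size_mb(path))` on a downloaded archive returns a label only when the size is close to one of the two figures. A 568MB archive, as mentioned earlier in the thread, would match neither.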

@GenVr

GenVr commented Sep 21, 2022

@dongguanting I'm also trying to run the code, but it reports that episode-data/inter/... is missing.
Where can I obtain this dataset? I downloaded the Few-NERD dataset, but it contains only .txt files.
Thanks

@iofu728
Contributor

iofu728 commented Sep 23, 2022

Hi @GenVr, you can download the arXiv v6 version of the Few-NERD dataset by following the script in their repo: https://github.com/thunlp/Few-NERD/blob/main/data/download.sh#L20-L22.
