
Totally Can NOT Re-Produce!!! #13

Closed
A11en0 opened this issue Sep 26, 2023 · 4 comments

Comments


A11en0 commented Sep 26, 2023

No description provided.

A11en0 (Author) commented Sep 26, 2023

  1. Your code is incomplete, so I supplemented it myself.

  2. Your paper doesn't mention the seed or hyperparameters for any dataset, making it impossible to reproduce the results as reported in the paper.

  3. I attempted to reproduce the results by manually adjusting the parameters, but I was unable to reach the results reported in the paper.

Overall, I have significant doubts about the validity of this paper. Could you please give me an answer?

stefanhgm (Contributor) commented Sep 27, 2023

Hello @A11en0,

thanks for reaching out and using our project.

  1. I also answered this request in your other issue. As stated in the readme, we use the t-few project and provide all the files necessary to run it for our setup in t-few. If anything else is missing, please let us know which exact file you need. We have added additional files to the project on user request in the past.

  2. The five different seeds are only given in the code; they are 42, 1024, 0, 1, and 32 (line 66 of few-shot-pretrained-100k.sh for the LLM and line 73 of evaluate_external_dataset.py). As stated in the paper, we did no parameter tuning for the LLM. You can find the parameters we used for T0 in few-shot-pretrained-100k.sh and in the config files in configs. All hyperparameters of the baselines are listed in the appendix of the paper in section "3 PARAMETER TUNING FOR BASELINES", and you can also find them in the code in evaluate_external_dataset.py. We did an exhaustive search over all parameters. (See the sketch after this list for how the seeds might be plugged into a re-run.)

  3. It is probably very hard to find the right parameters by hand. Please redo the experiments with the seeds and parameters pointed out above (they are the defaults in our code) and check whether you get the correct results. We reproduced the results from scratch using the code in this repository on a different machine, so we are very confident that it is possible.
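
For reference, here is a minimal sketch (not part of the repository) of how the five seeds could be looped over when re-running an evaluation. `run_evaluation` and its arguments are hypothetical placeholders for the actual logic in evaluate_external_dataset.py, not functions from our code.

```python
import random

import numpy as np

SEEDS = [42, 1024, 0, 1, 32]  # the five seeds used in our experiments


def run_evaluation(dataset: str, num_shots: int, seed: int) -> float:
    """Hypothetical stand-in for one run of evaluate_external_dataset.py."""
    random.seed(seed)
    np.random.seed(seed)
    # ... train and evaluate with the repository's default hyperparameters ...
    return 0.0  # placeholder accuracy


accuracies = [run_evaluation("blood", 32, seed) for seed in SEEDS]
print(f"Mean accuracy over {len(SEEDS)} seeds: {np.mean(accuracies):.3f}")
```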

If you have any further questions, please let us know!


YasHGoyaL27 commented Oct 3, 2023

I am using the same code as yours with seed = 42, 32 shots, a batch size of 4, and the blood dataset with text-template serializations. I got 37.2% accuracy compared to the 67% reported in the paper.
Could you suggest parameters that I can change to get the desired results?

stefanhgm (Contributor) commented

Hello @A11en0,

I hope I answered all of your questions. Please open a new issue if you have further problems.

@YasHGoyaL27: I copied your question into a new issue, since it is a more specific problem.
