
Totally Can NOT Re-Produce!!! #13

Closed
A11en0 opened this issue Sep 26, 2023 · 4 comments

Comments


A11en0 commented Sep 26, 2023

No description provided.

A11en0 (Author) commented Sep 26, 2023

  1. Your code is incomplete, so I supplemented it myself.

  2. Your paper doesn't mention the seed or hyperparameters for any dataset, making it impossible to reproduce the results as reported in the paper.

  3. I attempted to reproduce the results by manually adjusting the parameters, but I was unable to reach the results reported in the paper.

Overall, I have significant doubts about the validity of this paper. Could you please give me an answer?

stefanhgm (Contributor) commented Sep 27, 2023

Hello @A11en0,

thanks for reaching out and using our project.

  1. I also answered this request in your other issue. As stated in the readme, we use the t-few project and provide all the files necessary to run it for our setup in t-few. If anything else is missing, please let us know which exact file you need. We have added additional files to the project on user request in the past.

  2. The five different seeds are only given in the code; they are 42, 1024, 0, 1, and 32 (line 66 of few-shot-pretrained-100k.sh for the LLM and line 73 of evaluate_external_dataset.py). As stated in the paper, we did no parameter tuning for the LLM. You can find the parameters we used for T0 in few-shot-pretrained-100k.sh and in the config files in configs. All hyperparameters of the baselines are listed in the appendix of the paper in section "3 PARAMETER TUNING FOR BASELINES", and you can also find them in the code in evaluate_external_dataset.py. We did an exhaustive search over all parameters. (See the sketch after this list for how the seeds might be plugged into a re-run.)

  3. It is probably very hard to find the right parameters by hand. Please redo the experiments with the seeds and parameters pointed out above (they are the defaults in our code) and check whether you get the correct results. We reproduced the results from scratch using the code in this repository on a different machine, so we are very confident that it is possible.
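
For reference, here is a minimal sketch (not part of the repository) of how the five seeds could be looped over when re-running an evaluation. `run_evaluation` and its arguments are hypothetical placeholders for the actual logic in evaluate_external_dataset.py, not functions from our code.

```python
import random

import numpy as np

SEEDS = [42, 1024, 0, 1, 32]  # the five seeds used in our experiments


def run_evaluation(dataset: str, num_shots: int, seed: int) -> float:
    """Hypothetical stand-in for one run of evaluate_external_dataset.py."""
    random.seed(seed)
    np.random.seed(seed)
    # ... train and evaluate with the repository's default hyperparameters ...
    return 0.0  # placeholder accuracy


accuracies = [run_evaluation("blood", 32, seed) for seed in SEEDS]
print(f"Mean accuracy over {len(SEEDS)} seeds: {np.mean(accuracies):.3f}")
```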

If you have any further questions, please let us know!


YasHGoyaL27 commented Oct 3, 2023

I am using the same code as yours with seed = 42, 32 shots, a batch size of 4, and the blood dataset with text-template serializations. I got 37.2% accuracy compared to the 67% reported in the paper.
Could you suggest parameters that I can change to get the desired results?

stefanhgm (Contributor) commented

Hello @A11en0,

I hope I answered all of your questions. Please open a new issue if you have further problems.

@YasHGoyaL27: I copied your question into a new issue, since it is a more specific problem.
