Replit Code: Twitter results vs code-eval results #6
Unanswered
matthiasgeihs asked this question in Q&A
I know that this repository tries to reproduce results as faithfully as possible, and that this is not easy. Since I am specifically interested in the Replit Code model: do we have any information or guesses about where the difference of almost 5 percentage points between the performance claimed on Twitter (21.9%) and the performance evaluated here (17.1%) might come from?

Replies: 1 comment
-
It could be a few things, but I'm not sure which of them would be the issue. Many of the other base models follow a similar configuration, and their scores are very close to what was published (XGen, MPT, StarCoder, etc.).
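
For context on how small configuration differences translate into a gap of this size, here is a minimal Python sketch. The `GenerationConfig` fields and their default values are illustrative assumptions, not the repository's actual settings; `pass_at_k` is the standard unbiased estimator from the Codex paper (Chen et al., 2021) that HumanEval-style harnesses typically report:

```python
from dataclasses import dataclass, field
from math import comb


@dataclass
class GenerationConfig:
    """Hypothetical knobs; small differences here routinely move pass@1."""
    temperature: float = 0.2      # greedy (0.0) vs. sampled decoding
    top_p: float = 0.95
    max_new_tokens: int = 512     # too small truncates otherwise-correct solutions
    num_samples: int = 20         # n completions generated per problem
    # Wrong or missing stop sequences let generations run into extra code
    # that breaks execution.
    stop_sequences: list[str] = field(
        default_factory=lambda: ["\nclass", "\ndef", "\n#", "\nif", "\nprint"]
    )


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the Codex paper (Chen et al., 2021):
    1 - C(n - c, k) / C(n, k), with n samples drawn and c of them passing."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: with 20 samples per problem and 4 passing, pass@1 is 0.2.
# At 21.9% vs. 17.1%, roughly one extra passing sample per problem
# out of 20 would account for the whole gap.
print(pass_at_k(n=20, c=4, k=1))   # 0.2
```

Greedy versus temperature-0.2 sampling, differing stop sequences, or a different prompt format can each shift pass@1 by a few points on their own.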