Replit Code: Twitter results vs code-eval results #6
Unanswered
matthiasgeihs asked this question in Q&A
I know that this repository tries to reproduce results as faithfully as possible, and that this is not easy. Since I am specifically interested in the Replit Code model: do we have any information or guesses about where the difference of almost 5 percentage points between the performance claimed on Twitter (21.9%) and the performance evaluated here (17.1%) might come from?

Replies: 1 comment
-
It could be a few things, but I'm not sure which of them would be the issue. Many of the other base models follow a similar configuration, and their scores are very close to what was published (XGen, MPT, StarCoder, etc.).
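
For context on how small configuration differences translate into a gap of this size, here is a minimal Python sketch. The `GenerationConfig` fields and their default values are illustrative assumptions, not the repository's actual settings; `pass_at_k` is the standard unbiased estimator from the Codex paper (Chen et al., 2021) that HumanEval-style harnesses typically report:

```python
from dataclasses import dataclass, field
from math import comb


@dataclass
class GenerationConfig:
    """Hypothetical knobs; small differences here routinely move pass@1."""
    temperature: float = 0.2      # greedy (0.0) vs. sampled decoding
    top_p: float = 0.95
    max_new_tokens: int = 512     # too small truncates otherwise-correct solutions
    num_samples: int = 20         # n completions generated per problem
    # Wrong or missing stop sequences let generations run into extra code
    # that breaks execution.
    stop_sequences: list[str] = field(
        default_factory=lambda: ["\nclass", "\ndef", "\n#", "\nif", "\nprint"]
    )


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the Codex paper (Chen et al., 2021):
    1 - C(n - c, k) / C(n, k), with n samples drawn and c of them passing."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: with 20 samples per problem and 4 passing, pass@1 is 0.2.
# At 21.9% vs. 17.1%, roughly one extra passing sample per problem
# out of 20 would account for the whole gap.
print(pass_at_k(n=20, c=4, k=1))   # 0.2
```

Greedy versus temperature-0.2 sampling, differing stop sequences, or a different prompt format can each shift pass@1 by a few points on their own.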