Has anyone measured gpt-4-1106-preview's score on the test set? #68

Open
theblackcat102 opened this issue Nov 21, 2023 · 3 comments

Comments

@theblackcat102

Using 5-shot prompting on the val split, the scores I get are abnormally low (even worse than turbo):

gpt-4-1106-preview	computer_network	15.78947
gpt-4-1106-preview	operating_system	10.52632
gpt-4-1106-preview	computer_architecture	0.00000
gpt-4-1106-preview	college_programming	32.43243
gpt-4-1106-preview	college_physics	36.84211
gpt-4-1106-preview	college_chemistry	0.00000

@HYZ17 Have you tested gpt-4-turbo on the test split internally? On my side, at least, running 3.5-turbo on val gives results fairly close to its test scores.

@HYZ17
Collaborator

HYZ17 commented Nov 22, 2023

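We ran zero-shot tests on a small number of subjects and found that gpt-4-turbo's output format varies a lot, and that it sometimes refuses to give an answer. This may be one of the reasons for the low accuracy.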
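To illustrate how varied output formats and refusals can drag the score down, here is a minimal sketch of a more tolerant answer extractor. It is not the C-Eval evaluation code: the regex patterns, the function names `extract_choice` and `accuracy`, and the sample outputs are all assumptions made up for this example. Any reply from which no option letter can be recovered (a refusal, for instance) is counted as wrong.

```python
import re
from typing import List, Optional

# Hypothetical patterns for this sketch; not taken from the C-Eval code base.
# Tried in order: a Chinese "答案是 X" phrase, an English "answer is X" phrase,
# then a bare option letter at the very start of the reply.
_PATTERNS = [
    re.compile(r"答案(?:是|为|选|:|:)?\s*[\((]?([ABCD])(?![A-Za-z])"),
    re.compile(r"[Aa]nswer\s*(?:is|:)?\s*[\((]?([ABCD])(?![A-Za-z])"),
    re.compile(r"^\s*[\((]?([ABCD])[\))]?(?![A-Za-z])"),
]


def extract_choice(text: str) -> Optional[str]:
    """Return the option letter found in a model reply, or None if none is found."""
    for pattern in _PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group(1)
    return None


def accuracy(outputs: List[str], gold: List[str]) -> float:
    """Percentage of replies whose extracted letter equals the gold letter.

    Replies with no extractable letter (e.g. refusals) count as incorrect.
    """
    correct = sum(extract_choice(o) == g for o, g in zip(outputs, gold))
    return 100.0 * correct / len(gold)


if __name__ == "__main__":
    # Made-up replies showing the kinds of formats described above.
    replies = ["答案是B", "The answer is (C).", "抱歉,我无法回答这个问题。", "D"]
    labels = ["B", "C", "A", "A"]
    print([extract_choice(r) for r in replies])
    print(accuracy(replies, labels))
```

An extractor that only accepts a bare letter would mark the first two replies above as wrong even though the model chose correctly, so logging the replies that return `None` is a quick way to tell formatting failures apart from genuine mistakes.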

@houxiang676

The test split doesn't come with answers. What should I do?

@HYZ17
Collaborator

HYZ17 commented Mar 1, 2024

Submit your predictions at https://cevalbenchmark.com/ to get a score.
