gpt-4-1106-preview: has anyone measured its score on the test set? #68

theblackcat102 · 2023-11-21T11:01:46Z

With 5-shot prompting on val, the scores I get are abnormally low (even worse than turbo):

gpt-4-1106-preview	computer_network	15.78947
gpt-4-1106-preview	operating_system	10.52632
gpt-4-1106-preview	computer_architecture	0.00000
gpt-4-1106-preview	college_programming	32.43243
gpt-4-1106-preview	college_physics	36.84211
gpt-4-1106-preview	college_chemistry	0.00000

@HYZ17 Have you tested gpt-4-turbo on the test set internally? On my side, at least, the 3.5-turbo results on val are quite close to its test results.


HYZ17 · 2023-11-22T10:31:45Z

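We ran zero-shot tests on a small subset of subjects and found that gpt-4-turbo's output format varies widely and that it sometimes refuses to give an answer. This may be one reason for the low accuracy.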
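If varied output formats and refusals are the cause, a more tolerant answer extractor may recover some of the lost accuracy. The following is only a minimal sketch: the function name `extract_choice` and the regular expressions are hypothetical and are not taken from the C-Eval evaluation code. It assumes four-option questions labelled A to D and treats any response without a recognizable choice (for example, a refusal) as unanswered.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration; not the C-Eval extraction logic.
# They are tried in order, from most explicit to least explicit.
_PATTERNS = [
    # Chinese phrasing such as "答案是 B" or "答案：B"
    re.compile(r"答案(?:是|为|选)?\s*[:：]?\s*([ABCD])(?![A-Za-z])"),
    # English phrasing such as "Answer: D" or "The answer is (C)"
    re.compile(r"[Aa]nswer\s*(?:is)?\s*[:：]?\s*\(?([ABCD])(?![A-Za-z])"),
    # A bare letter at the start of the response, e.g. "A." or "(B)"
    re.compile(r"^\s*\(?([ABCD])\)?\s*(?:[.．、:：)]|$)"),
]


def extract_choice(response: str) -> Optional[str]:
    """Return the chosen option letter, or None if no choice is found."""
    for pattern in _PATTERNS:
        match = pattern.search(response)
        if match:
            return match.group(1)
    # Refusals and free-form answers end up here and count as unanswered.
    return None
```

Counting how many responses return `None` separately from how many are wrong would show whether the low scores come from parsing failures or from genuinely incorrect answers.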

houxiang676 · 2024-03-01T06:26:59Z

The test set has no answers. What should I do?

HYZ17 · 2024-03-01T12:58:43Z

Submit your predictions at https://cevalbenchmark.com/ to get a score.
