Improve docs
Signed-off-by: Igor Gitman <[email protected]>
Kipok committed Dec 12, 2024
1 parent 66d2899 commit 58f3b90
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion docs/pipelines/llm-as-a-judge.md
@@ -26,8 +26,10 @@ ns generate \
++input_dir=/workspace/test-eval/eval-results/math
```

-This will run the judge pipeline on all data inside `eval-results/math` folder and judge solutions from `output.jsonl` file.
+This will run the judge pipeline on the data inside the `eval-results/math` folder and judge solutions from the `output.jsonl` file.
+If you ran the benchmark with N samples (e.g. using `math:8`) and want to judge all of them, add `--num_random_seeds=8`.
+Note that if you want to judge both greedy generations and samples, you'd need to run the command twice.

In this example we use gpt-4o from OpenAI, but you can use Llama-405B (which you can host on a cluster yourself) or any
other model. If you have multiple benchmarks, you would need to run the command multiple times.
After the judge pipeline has finished, you can see the results by running
