-
-
Notifications
You must be signed in to change notification settings - Fork 6.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bench config #1126
Bench config #1126
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Changes requested.
- Reviewed the entire pull request up to f4aa2d1
- Looked at
542
lines of code in12
files - Took 1 minute and 48 seconds to review
More info
- Skipped
0
files when reviewing. - Skipped posting
5
additional comments because they didn't meet confidence threshold of50%
.
1. /gpt_engineer/benchmark/__main__.py:112
:
- Assessed confidence :
80%
- Grade:
0%
- Comment:
Theget_benchmark
function is now expected to receive aconfig
parameter, but theget_benchmark
function in thebenchmarks.load
module does not seem to handle the case where this parameter is not provided. This could lead to errors if the function is called without aconfig
parameter elsewhere in the codebase. Consider providing a default value for theconfig
parameter in theget_benchmark
function definition. - Reasoning:
The PR introduces a new configuration system for benchmarks, which allows users to specify which benchmarks to run and their parameters. The changes seem to be well implemented, with the new configuration system being integrated into the existing benchmarking system. However, there are a few potential issues that need to be addressed.
2. /gpt_engineer/benchmark/benchmarks/apps/load.py:78
:
- Assessed confidence :
80%
- Grade:
0%
- Comment:
Theload_apps
function now expects aconfig
parameter, but the function does not seem to handle the case where this parameter is not provided. This could lead to errors if the function is called without aconfig
parameter elsewhere in the codebase. Consider providing a default value for theconfig
parameter in theload_apps
function definition. - Reasoning:
The PR introduces a new configuration system for benchmarks, which allows users to specify which benchmarks to run and their parameters. The changes seem to be well implemented, with the new configuration system being integrated into the existing benchmarking system. However, there are a few potential issues that need to be addressed.
3. /gpt_engineer/benchmark/benchmarks/gpteng/load.py:206
:
- Assessed confidence :
80%
- Grade:
30%
- Comment:
Theload_gpteng
function now expects aconfig
parameter, but the function does not seem to handle the case where this parameter is not provided. This could lead to errors if the function is called without aconfig
parameter elsewhere in the codebase. Consider providing a default value for theconfig
parameter in theload_gpteng
function definition. - Reasoning:
The PR introduces a new configuration system for benchmarks, which allows users to specify which benchmarks to run and their parameters. The changes seem to be well implemented, with the new configuration system being integrated into the existing benchmarking system. However, there are a few potential issues that need to be addressed.
4. /gpt_engineer/benchmark/benchmarks/gptme/load.py:19
:
- Assessed confidence :
80%
- Grade:
0%
- Comment:
Theload_gptme
function now expects aconfig
parameter, but the function does not seem to handle the case where this parameter is not provided. This could lead to errors if the function is called without aconfig
parameter elsewhere in the codebase. Consider providing a default value for theconfig
parameter in theload_gptme
function definition. - Reasoning:
The PR introduces a new configuration system for benchmarks, which allows users to specify which benchmarks to run and their parameters. The changes seem to be well implemented, with the new configuration system being integrated into the existing benchmarking system. However, there are a few potential issues that need to be addressed.
5. /gpt_engineer/benchmark/benchmarks/mbpp/load.py:65
:
- Assessed confidence :
80%
- Grade:
0%
- Comment:
Theload_mbpp
function now expects aconfig
parameter, but the function does not seem to handle the case where this parameter is not provided. This could lead to errors if the function is called without aconfig
parameter elsewhere in the codebase. Consider providing a default value for theconfig
parameter in theload_mbpp
function definition. - Reasoning:
The PR introduces a new configuration system for benchmarks, which allows users to specify which benchmarks to run and their parameters. The changes seem to be well implemented, with the new configuration system being integrated into the existing benchmarking system. However, there are a few potential issues that need to be addressed.
Workflow ID: wflow_E5aNaWWb4Zd6TX0y
Want Ellipsis to fix these issues? Tag @ellipsis-dev
in a comment. We'll respond in a few minutes. Learn more here.
⌛ You have 5 hours remaining in your free trial. Upgrade at https://app.ellipsis.dev for $20/developer/month. If you have any questions, reach us at [email protected]
gpteng: GptengConfig = field(default_factory=GptengConfig) | ||
|
||
@classmethod | ||
def from_toml(cls, config_file: Path | str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The from_toml
method in the BenchConfig
class does not seem to handle the case where the provided config_file
does not exist or is not a valid TOML file. This could lead to unhandled exceptions if the method is called with an invalid file path. Consider adding error handling to this method to provide a more informative error message in such cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me!
- Performed an incremental review on dca2dc0
- Looked at
11
lines of code in1
files - Took 1 minute and 1 seconds to review
More info
- Skipped
0
files when reviewing. - Skipped posting
1
additional comments because they didn't meet confidence threshold of50%
.
1. pyproject.toml:103
:
- Assessed confidence :
0%
- Comment:
Good addition of a newline at the end of the file. This adheres to POSIX standards and ensures the file can be easily concatenated with other files. - Reasoning:
The change in the PR is adding a newline at the end of the file. This is a good practice as it ensures that the file can be easily concatenated with other files. It also adheres to the POSIX standards.
Workflow ID: wflow_I5JIqbvh7ESkjAWV
Not what you expected? You can customize the content of the reviews using rules. Learn more here.
⌛ You have 5 hours remaining in your free trial. Upgrade at https://app.ellipsis.dev for $20/developer/month. If you have any questions, reach us at [email protected]
Added a .toml for configuration of the benchmarks for increased customization and better UX. Style leans heavily on project_config.py @ErikBjare.
CHANGES COMMAND LINE INTERFACE TO THE BENCH APPLICATION.
Would be great if we can get this integrated relatively soon, since I think future work should use this config #913
Opinions? @ErikBjare @azrv @Mohit-Dhawan98 @AntonOsika
Summary:
This PR introduces a new configuration system for the benchmarking tool, allowing for greater customization of the benchmarks to run, and includes minor changes to
pyproject.toml
.Key points:
BenchConfig
class inbench_config.py
for handling benchmark configurations.__main__.py
in the benchmark directory to use the new configuration system.task_name
argument from therun
function inrun.py
.default_bench_config.toml
.test_BenchConfig.py
to test the new configuration system.pyproject.toml
.Generated with ❤️ by ellipsis.dev