feat(eval): SWE-Bench stability improvement and add utils #6177

Closed
wants to merge 6 commits into from

Conversation

Collaborator

@xingyaoww xingyaoww commented Jan 9, 2025

End-user friendly description of the problem this fixes or functionality that this introduces

  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Give a summary of what the PR does, explaining any non-trivial design decisions

  • Move SWE-Bench eval_infer.py runtime creation inside the try-except block so we can properly clean up if an error occurs
  • Add completion-combining utility scripts useful for SWE-Bench evaluation and SWE-Gym rollouts
  • Improve the memory efficiency of the SWE-Bench update-output script so it can load larger files
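
For readers unfamiliar with the first point, here is a minimal sketch of the general pattern, not the actual diff from this PR. `FakeRuntime`, `process_instance`, `create_runtime`, and `evaluate` are hypothetical names used only for illustration. The sketch uses try/finally, one common way to get the same cleanup guarantee. Because the runtime is created inside the `try`, an error raised during or after creation still reaches the cleanup code.

```python
from typing import Callable, Optional


class FakeRuntime:
    """Hypothetical stand-in for a sandbox runtime; not the OpenHands class."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def process_instance(
    create_runtime: Callable[[], FakeRuntime],
    evaluate: Callable[[FakeRuntime], dict],
) -> dict:
    """Create a runtime, run the evaluation, and always release the runtime."""
    runtime: Optional[FakeRuntime] = None
    try:
        # Creation happens inside the try block, so a failure here
        # still passes through the finally clause below.
        runtime = create_runtime()
        return evaluate(runtime)
    finally:
        # Only close the runtime if creation actually succeeded.
        if runtime is not None:
            runtime.close()
```

If `create_runtime` itself raises, `runtime` is still `None`, so the cleanup step is skipped and the original exception propagates unchanged.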

Link of any specific issues this addresses


To run this PR locally, use the following command:

docker run -it --rm \
  -p 3000:3000 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --add-host host.docker.internal:host-gateway \
  -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:ffac44a-nikolaik \
  --name openhands-app-ffac44a \
  docker.all-hands.dev/all-hands-ai/openhands:ffac44a

@xingyaoww xingyaoww requested review from enyst and neubig January 9, 2025 19:48
@xingyaoww xingyaoww marked this pull request as ready for review January 9, 2025 19:52
Contributor

@neubig neubig left a comment

Basically LGTM

evaluation/benchmarks/swe_bench/run_infer.py (outdated)
@xingyaoww
Collaborator Author

@neubig Looks like most of these changes were already merged into main in another PR 😅, so I'm just leaving a one-line bug fix.

@xingyaoww xingyaoww enabled auto-merge (squash) January 17, 2025 17:46
@xingyaoww xingyaoww disabled auto-merge January 17, 2025 21:06
@xingyaoww
Collaborator Author

We merged changes from other PRs 😓 Let's close this one!

@xingyaoww xingyaoww closed this Jan 17, 2025
3 participants