Need for resetting in run som scripts #60

Open
sanjari-orb opened this issue Aug 16, 2024 · 2 comments

sanjari-orb commented Aug 16, 2024

Hi, I am looking at the scripts run_reddit_som.sh, run_shopping_som.sh, and run_classifieds_som.sh. IIUC, they all split the example indices into batches, and the Docker environment is reset between batches:

bash scripts/reset_reddit.sh

However, I think more than one example with require_reset: True can occur within a single batch, based on the raw config JSON files (e.g., https://github.com/web-arena-x/visualwebarena/blob/main/config_files/vwa/test_classifieds.raw.json).

If that is the case, what is the point of resetting and how are we ensuring correctness of the run scripts?
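One way to check the claim above is to count how many examples in each batch set require_reset. The following is a minimal sketch, assuming the raw config file is a JSON list of objects and that batches are consecutive index ranges; the helper names and the batch size are hypothetical and are not taken from the run scripts.

```python
import json
from typing import Any


def load_configs(path: str) -> list[dict[str, Any]]:
    """Load a raw config file, assumed to be a JSON list of example objects."""
    with open(path) as f:
        return json.load(f)


def count_resets_per_batch(configs: list[dict[str, Any]], batch_size: int) -> list[int]:
    """Return, for each consecutive batch, how many examples set require_reset."""
    counts = []
    for start in range(0, len(configs), batch_size):
        batch = configs[start:start + batch_size]
        # A missing key is treated as "no reset required".
        counts.append(sum(1 for cfg in batch if cfg.get("require_reset", False)))
    return counts


# Hypothetical usage (batch_size=50 is an assumed value, not the scripts' setting):
#   configs = load_configs("config_files/vwa/test_classifieds.raw.json")
#   print(count_resets_per_batch(configs, batch_size=50))
```

Any count above 1 would mean that a batch contains several examples that ask for a reset while the environment is only reset once for that batch.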

@kohjingyu
Collaborator

For Classifieds we implemented a per-example reset, but at present there is no way to do it fast enough for Shopping/Reddit (a full reset takes about 2 minutes). Resetting the environment after every batch is a compromise between resetting only at the end of each run (as WebArena recommends, which can sometimes cause intermediate examples to fail, e.g., if the cart is full by example 50) and resetting after every single example (which takes too long).
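To make the trade-off concrete, here is a rough sketch of the reset overhead under the three policies. It assumes the 2-minute figure above and one reset per batch; the function name, the example count of 200, and the batch size of 50 are hypothetical values chosen for illustration.

```python
import math


def reset_overhead_minutes(n_examples: int, batch_size: int, reset_minutes: float = 2.0) -> float:
    """Total time spent resetting if the environment is reset once per batch."""
    n_batches = math.ceil(n_examples / batch_size)
    return n_batches * reset_minutes


n = 200  # hypothetical number of examples in a run
per_example = reset_overhead_minutes(n, batch_size=1)   # reset after every example
per_batch = reset_overhead_minutes(n, batch_size=50)    # reset after every batch
end_of_run = reset_overhead_minutes(n, batch_size=n)    # single reset for the whole run
```

With these assumed numbers, per-example resets cost 400 minutes, per-batch resets cost 8 minutes, and a single reset costs 2 minutes.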

Hope that helps!

@sanjari-orb
Author

I am not sure I follow. Why are we resetting when require_reset=False instead of True? I am referring to this check:

if instance_config.get("require_reset", False):
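For reference on how that check evaluates: in Python, dict.get(key, default) returns the stored value when the key is present and falls back to the default only when the key is missing. A minimal sketch follows; needs_reset is a hypothetical helper for illustration and is not the repository's function, and what the surrounding code does with the result is not shown here.

```python
def needs_reset(instance_config: dict) -> bool:
    """Mirror the quoted check: truthy only when require_reset is present and true."""
    # False here is the fallback for a missing key, not the value being matched.
    return bool(instance_config.get("require_reset", False))


flag_true = needs_reset({"require_reset": True})     # key present and True
flag_false = needs_reset({"require_reset": False})   # key present and False
flag_missing = needs_reset({})                       # key absent, default applies
```

Under this reading the branch is taken only for configs that set require_reset to True; configs that set it to False or omit it skip the branch.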

Is there some measure of how much variance in the benchmark can accumulate in this batched reset setting? Does the batch size you selected ensure that all the examples will always work (as opposed to not resetting at all)?
