Add Headful browsing to Agent Web Tooling #1080
base: main
Thank you! This looks like an excellent addition. Some questions:
Take the suggestions above as me probing at what the workflow will be for end users. I'm open to anything that makes things as transparent and straightforward as possible. cc'ing @epatey as well, who is working on implementing a full desktop computer tool (as discussed w/ @jmsdao back in August/September). @epatey The Harmony Intelligence folks have also been working on desktop computer tools (although I believe possibly using GCP rather than in a container?). It would be good to compare notes on how we are setting up and running VNC so it's consistent across our images (@cmathw our current work is based on the Anthropic example, so definitely fungible if there is a better way!).
Thank you for the comments and feedback on this PR @jjallaire! With regards to each point:
Note: After pushing the new image to DockerHub, we could also update the browser example's compose.yaml back to:
This PR contains:
What is the current behavior? (You can also link to an open issue here)
The web_browser tooling only uses a headless browser; it is not possible to view the browser graphically in real time.
What is the new behavior?
This PR implements headful browsing, allowing users to view an agent's interactions with the browser in real time via a VNC viewer.
Tests have been added for this new behaviour in `test_playwright_crawler`. These essentially just parameterise Playwright's headless flag to test both `True` and `False`, instead of only `True` as before.
Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)
This PR should be accompanied by an update to the image (aisiuk/inspect-web-browser-tool) hosted at Dockerhub for web_browser tooling. This PR's implementation defaults to headless mode. For users making use of web_browser tooling who are not aware of this PR, there should be no downstream impacts.
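To illustrate the "defaults to headless" behaviour, here is a minimal sketch of how a "HEADLESS" environment variable could be interpreted. The function name `headless_from_env` and the set of accepted false-like strings are assumptions made for illustration, not the PR's actual code.

```python
import os


def headless_from_env(env=None) -> bool:
    """Return True (headless) unless HEADLESS is explicitly false-like.

    Hypothetical helper: the parsing rules here are an assumption.
    """
    env = os.environ if env is None else env
    raw = env.get("HEADLESS", "true")  # unset means headless
    return raw.strip().lower() not in {"false", "0", "no"}
```

With this shape, a user who never sets the variable keeps the existing headless behaviour, which matches the "no downstream impacts" claim above.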
Other information:
This PR's implementation makes use of Docker build targets: the user selects a build target in their task's docker-compose file. For example, in the `examples/browser/compose.yaml` file, the user can choose headful browsing with:
If a target is not specified, or it is specified as `headless`, the Dockerfile will build the headless image. In this case, it is not necessary to specify the 127.0.0.1:5900:5900 port mapping either (this is only needed for VNC viewing).
This PR achieves this broadly by:
`True` (i.e. the "HEADLESS" environment variable).
Note: When running the test file, a Chromium browser will open/close for the headful tests. If necessary, I'm open to adding decorators to these tests so they are only run when specified. These tests are not located in the `tests/` dir though, and won't run when calling `make tests`.
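As a rough sketch of the "decorators so they are only run when specified" idea, the headful case could be gated behind an opt-in switch. Everything here is illustrative: the `RUN_HEADFUL_TESTS` variable is hypothetical, `launch_browser` is a stand-in for the real Playwright launch, and the standard library's `unittest` is used only to keep the example self-contained.

```python
import os
import unittest

# Hypothetical opt-in switch; not part of the PR.
RUN_HEADFUL = os.environ.get("RUN_HEADFUL_TESTS", "") == "1"


def launch_browser(headless: bool) -> dict:
    """Stand-in for the real browser launch; it only records the flag."""
    return {"headless": headless}


class TestCrawlerModes(unittest.TestCase):
    def test_headless(self):
        # Always runs: no window is opened in headless mode.
        self.assertTrue(launch_browser(headless=True)["headless"])

    @unittest.skipUnless(RUN_HEADFUL, "set RUN_HEADFUL_TESTS=1 to run headful tests")
    def test_headful(self):
        # Runs only when explicitly requested, since it opens a browser window.
        self.assertFalse(launch_browser(headless=False)["headless"])
```

By default the headful test is reported as skipped rather than silently omitted, so it stays visible in test output.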