Browser management. #172

divol89 · 2024-10-02T11:38:04Z

Agent-Zero could reach its full potential if it could manage a
browser (for example, Google Chrome) visually, such as opening
YouTube when asked. Say you want to automate interaction with a
website: instead of building a full bot, you could have Agent Zero
do it automatically from prompts alone. It would need to read or
snapshot the page to understand what to do there.

Right now, it's not possible.

alexey2baranov · 2024-10-03T10:14:14Z

It would be great!
As I understand it, this requires extra tooling: say, a Browser tool that can navigate, analyze, and type into a webpage.
I have limited experience with such systems. Playwright is, IMO, a good and popular choice for driving a browser in headless mode.

At every loop step, such a tool might create a PDF or screenshot of the page and give it to a multimodal model as input to interpret.
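
A minimal sketch of that loop, assuming Playwright's sync Python API. `Action`, `decide`, `run_browser_loop`, and `open_headless_page` are hypothetical names, not part of Agent-Zero, and `decide` stands in for whatever multimodal model call would be used. The sketch sends a PNG screenshot rather than a PDF.

```python
import base64
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One step chosen by the model (hypothetical schema)."""
    kind: str            # "click", "fill", or "done"
    selector: str = ""   # CSS selector the action targets
    value: str = ""      # text to type, only used by "fill"


def run_browser_loop(
    page,
    goal: str,
    url: str,
    decide: Callable[[str, str], Action],
    max_steps: int = 5,
) -> List[Action]:
    """Screenshot the page, ask the model for an action, apply it, repeat.

    `page` is expected to behave like a Playwright sync `Page`
    (goto, screenshot, click, fill). `decide(goal, image_b64)` is an
    assumed wrapper around a multimodal model.
    """
    page.goto(url)
    history: List[Action] = []
    for _ in range(max_steps):
        png = page.screenshot()  # Playwright returns the image as bytes
        image_b64 = base64.b64encode(png).decode("ascii")
        action = decide(goal, image_b64)
        history.append(action)
        if action.kind == "done":
            break
        if action.kind == "click":
            page.click(action.selector)
        elif action.kind == "fill":
            page.fill(action.selector, action.value)
        else:
            raise ValueError(f"unknown action kind: {action.kind!r}")
    return history


def open_headless_page():
    """Start headless Chromium; the caller must close browser and stop pw."""
    # Imported lazily so the loop above can be tried without Playwright installed.
    from playwright.sync_api import sync_playwright

    pw = sync_playwright().start()
    browser = pw.chromium.launch(headless=True)
    return pw, browser, browser.new_page()
```

Keeping the model call behind `decide` means the loop can be exercised with a scripted stand-in before any real model or browser is wired in. `max_steps` is there so a confused model cannot loop forever.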

divol89 · 2024-10-03T11:16:07Z

Yes, exactly. Imagine configuring your Agent-Zero for a trading website and letting it run automatically, without the stress. Insane.

TerminallyLazy · 2024-10-05T10:06:38Z

I mentioned this in the Discord a couple of weeks or so ago: basically some OSS agent-q type functionality, or possibly an integration of agent-q that could be called like a tool. It's on my to-do list. I've just been super busy, but I will work on something like this when I get a chance.

divol89 · 2024-10-05T12:11:39Z

Great, I'll be waiting to collaborate if that's possible.

MaximPro · 2024-10-20T02:25:29Z

Please add this!

Hielkio · 2024-10-20T07:32:22Z

This would be great! +1 👍

idealley · 2024-12-19T10:39:45Z

Could Agent-Zero pilot other frameworks, for example Open Interpreter or Anthropic's computer use, to achieve this?

This would mean that the Docker container needs to be a full "computer" with the ability to launch a browser.
