Skip to content

Commit

Permalink
Add new tasks and agents
Browse files Browse the repository at this point in the history
  • Loading branch information
dandansamax committed Oct 8, 2024
1 parent f582817 commit f71a975
Show file tree
Hide file tree
Showing 70 changed files with 1,501 additions and 147 deletions.
4 changes: 4 additions & 0 deletions crab-benchmark-v0/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,3 +29,7 @@ After setting up the environment, you can start the experiment. A brief overview
2. Start the CRAB server in the Ubuntu environment and get its IP address and port. Let's say they are `192.168.122.72` and `8000`.
3. Choose a task. As an example, we take the task with ID `a3476778-e512-40ca-b1c0-d7aab0c7f18b` from [handmade_tasks](./dataset/handmade_tasks.py). The task is: "Open the 'Tasks' app on Android, check the first incomplete task, then perform the task according to its description."
4. Run [main.py](./main.py) with the command `poetry run python -m crab-benchmark-v0.main --model gpt4o --policy single --remote-url http://192.168.122.72:8000 --task-id a3476778-e512-40ca-b1c0-d7aab0c7f18b`. In this command, `--model gpt4o` and `--policy single` determine the agent system, `--remote-url` specifies the Ubuntu environment interface, and `--task-id` indicates the task to be performed.

#### Model

For open source models, we use [VLLM](https://github.com/vllm-project/vllm) to host Pixtral model, check [here](https://docs.vllm.ai/en/latest/models/vlm.html#online-inference) for the setup commands; [SGLang](https://github.com/sgl-project/sglang) to host LLaVa-OneVision model, check [here](https://github.com/sgl-project/sglang?tab=readme-ov-file#supported-models) for the setup commands.
3 changes: 2 additions & 1 deletion crab-benchmark-v0/android_env.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@
from crab import EnvironmentConfig
from crab.actions.android_actions import (
key_press,
long_tap,
open_app_drawer,
screenshot,
setup,
Expand All @@ -24,7 +25,7 @@

ANDROID_ENV = EnvironmentConfig(
name="android",
action_space=[tap, key_press, write_text, swipe, open_app_drawer],
action_space=[tap, key_press, long_tap, write_text, swipe, open_app_drawer],
observation_space=[screenshot],
description="""A Google Pixel smartphone runs on the Android operating system. \
The interface displays a current screenshot at each step and primarily \
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
{
"description": "In the Android operating system, use the \"Google Map\" app to find the city name corresponding to the postal code \"63002\" in South Korea, then use the \"Calendar\" app to add a new all-day event for 1 January 2025 with the text of the found city name.",
"tasks": [
{
"task": "51b2463c-9904-4a32-81ba-507bfb89d61f",
"attribute": {
"number": "63002",
"country": "South Korea"
},
"output": "Jeju"
},
{
"task": "a3d11574-2acf-4b26-a569-a5dbc9d548ac",
"attribute": {
"content": "Jeju",
"date": "1 January 2025"
},
"output": null
}
],
"adjlist": "0 1\n1",
"id": "1005c437-50d1-465a-b3fc-833098b22bfc"
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
{
"description": "In Android, use the \"Google Map\" app to find the city name for the postal code \"2770885\" in Japan, and then, using the \"Keep Notes\" app, create a new note without a title to record the city name you found.",
"tasks": [
{
"task": "51b2463c-9904-4a32-81ba-507bfb89d61f",
"attribute": {
"number": "2770885",
"country": "Japan"
},
"output": "Chiba"
},
{
"task": "eb92a1e6-4c86-4d56-baac-95fc8397732e",
"attribute": {
"content": "Chiba"
},
"output": null
}
],
"adjlist": "0 1\n1",
"id": "12333aa0-e76d-4a5c-8657-9f897f62f62d"
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
{
"description": "In Android, using the \"Contacts\" app, find the email of the contact named John Lauphin, then using the \"Gmail\" app, send an email to that contact with the subject \"Hello John.\"",
"tasks": [
{
"task": "a3d11574-2acf-4b26-a569-a5dbc9d548ap",
"attribute": {
"name": "John Lauphin"
},
"output": "[email protected]"
},
{
"task": "0090f116-e02b-4562-a20d-b5df38be963a",
"attribute": {
"content": "Hello John",
"mail": "[email protected]"
},
"output": null
}
],
"adjlist": "0 1\n1",
"id": "2ade6a13-c7a6-4df7-8c62-77382687369e"
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
{
"description": "In Android, Using Google Map app, Find the city name of corresponding post code \"1010021\" in the country \"Japan\".",
"tasks": [
{
"task": "51b2463c-9904-4a32-81ba-507bfb89d61f",
"attribute": {
"country": "Japan",
"number": "101-0021"
},
"output": "Tokyo"
}
],
"adjlist": "0",
"id": "4190c90c-b28c-4bb3-ab5c-af3c4fde0a3d"
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
{
"description": "Open the calendar app in the Android system and find the title of an event on the date \"17 August 2024,\" then using the \"Google Drive\" app on the same Android device, create a new folder with the founded name",
"tasks": [
{
"task": "2394b768-2ca7-45e9-b41e-2aa4e9573192",
"attribute": {
"date": "17 August 2024"
},
"output": "Travel to Paris"
},
{
"task": "a3d11574-2acf-4b26-a569-a5dbc9d548ar",
"attribute": {
"content": "Travel to Paris"
},
"output": null
}
],
"adjlist": "0 1\n1",
"id": "483fbf9c-dc78-4ac2-9264-53c4f617f6cc"
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
{
"description": "In the Android system, use the calendar app to find the title of an event on the date \"16 July 2024,\".",
"tasks": [
{
"task": "2394b768-2ca7-45e9-b41e-2aa4e9573192",
"attribute": {
"date": "16 July 2024"
},
"output": "Japan"
}
],
"adjlist": "0",
"id": "4893a9b0-6477-495d-a73c-32503326e24a"
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
{
"description": "In the Android system, use the calendar app to find the title of an event on the date \"16 July 2024,\" then, using the Google Map app, find the city name of the corresponding post code \"113-8654\" in the country with same name as title.",
"tasks": [
{
"task": "2394b768-2ca7-45e9-b41e-2aa4e9573192",
"attribute": {
"date": "16 July 2024"
},
"output": "Japan"
},
{
"task": "51b2463c-9904-4a32-81ba-507bfb89d61f",
"attribute": {
"number": "113-8654",
"country": "Japan"
},
"output": null
}
],
"adjlist": "0 1\n1",
"id": "53010c40-dce4-4d72-a856-842c21059e2b"
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
{
"description": "Using the \"Google Map\" app on Android, find the distance of the shortest route from \"National University of Singapore\" to \"Nanyang Technology University,\" then using the \"Calendar\" app, add a new event with the text representing the found distance on the date 21 June 2024 as an all-day event.",
"tasks": [
{
"task": "1a1b72d7-78c9-4027-8278-86083ae01045",
"attribute": {
"place_name_1": "National University of Singapore",
"place_name_2": "Nanyang Technology University"
},
"output": "13km"
},
{
"task": "a3d11574-2acf-4b26-a569-a5dbc9d548ac",
"attribute": {
"content": "13km",
"date": "21 June 2024"
},
"output": null
}
],
"adjlist": "0 1\n1",
"id": "71ef7fd2-0ae3-49c8-8238-06b7aa985d25"
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
{
"description": "In Android, Using \"Google Map\" app, find the city name of corresponding post code \"560049\" in the country \"India\". Creat a folder with the city name in \"Google Drive \" app",
"tasks": [
{
"task": "51b2463c-9904-4a32-81ba-507bfb89d61f",
"attribute": {
"country": "India",
"number": "560049"
},
"output": "Bengaluru"
},
{
"task": "a3d11574-2acf-4b26-a569-a5dbc9d548ar",
"attribute": {
"content": "Bengaluru"
},
"output": null
}
],
"adjlist": "0 1\n1",
"id": "7891ceab-7965-4ddb-a0fc-15740c9a4e44"
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
{
"description": "In Android, use the \"Google Map\" app to find the address of the University of Sydney, then using the \"Gmail\" app, send a message to [email protected] with the found address.",
"tasks": [
{
"task": "a3d11574-2acf-4b26-a569-a5dbc9d548aw",
"attribute": {
"content": "The University of Sydney"
},
"output": "Camperdown NSW 2050 Australia"
},
{
"task": "0090f116-e02b-4562-a20d-b5df38be963a",
"attribute": {
"content": "Camperdown NSW 2050 Australia",
"mail": "[email protected]"
},
"output": null
}
],
"adjlist": "0 1\n1",
"id": "8bd51440-f959-4edc-baa5-cd03d32a5b0f"
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
{
"description": "In an Android system, use the calendar app to find the title of an event on the date \"9 August 2024\", and then, using the Gmail app, send an email to [email protected] with the event title as message.",
"tasks": [
{
"task": "2394b768-2ca7-45e9-b41e-2aa4e9573192",
"attribute": {
"date": "9 August 2024"
},
"output": "National Day of Singapore would be a public holiday"
},
{
"task": "0090f116-e02b-4562-a20d-b5df38be963a",
"attribute": {
"content": "National Day of Singapore would be a public holiday",
"mail": "[email protected]"
},
"output": null
}
],
"adjlist": "0 1\n1",
"id": "94b1836b-3111-40ad-8d07-b8a57efe7438"
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
{
"description": "In Android, Using \"Google Map\" app, Find the address of \"University of Oxford\" and send \"98801234\" the address using \"message\" App. ",
"tasks": [
{
"task": "a3d11574-2acf-4b26-a569-a5dbc9d548aw",
"attribute": {
"content": "University of Oxford"
},
"output": "Wellington Square, Oxford OX1 2JD, United Kingdom"
},
{
"task": "caa29623-1811-402d-963a-19f7eecc63d8",
"attribute": {
"content": "Wellington Square, Oxford OX1 2JD, United Kingdom",
"number": "98801234"
},
"output": null
}
],
"adjlist": "0 1\n1",
"id": "a225f7f8-6d03-4619-b57d-7a08610030d8"
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
{
"description": "In Android, Using \"Google Map\" app, Find the address of \"University of Oxford\" and send \"[email protected]\" the address using \"Gmail\" App. ",
"tasks": [
{
"task": "a3d11574-2acf-4b26-a569-a5dbc9d548aw",
"attribute": {
"content": "University of Oxford"
},
"output": "Wellington Square, Oxford OX1 2JD, United Kingdom"
},
{
"task": "0090f116-e02b-4562-a20d-b5df38be963a",
"attribute": {
"content": "Wellington Square, Oxford OX1 2JD, United Kingdom",
"mail": "[email protected]"
},
"output": null
}
],
"adjlist": "0 1\n1",
"id": "b3965b07-4683-4445-9de1-a1dedf6c73ad"
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
{
"description": "In Android, use the \"Google Map\" app to find the city name corresponding to the postcode \"110151\" in Colombia, then use the \"Clock\" app to set the time of that city in the clock and check the time gap between that city and your current city.",
"tasks": [
{
"task": "51b2463c-9904-4a32-81ba-507bfb89d61f",
"attribute": {
"number": "110151",
"country": "Columbia"
},
"output": "Bogota"
},
{
"task": "a3d11574-2acf-4b26-a569-a5dbc9d548ah",
"attribute": {
"place_name": "Bogota"
},
"output": "-5h"
}
],
"adjlist": "0 1\n1",
"id": "cf4c496b-fbbd-4701-91ea-4590fe6a66e1"
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
{
"description": "In Android, first use the \"Files\" app to find the creation date of the file /Movies/movie_list.txt, then use the \"Calendar\" app to add a new event titled \"Public Talking\" scheduled for all day on the founded day.",
"tasks": [
{
"task": "a3d11574-2acf-4b26-a569-a5dbc9d548ak",
"attribute": {
"file_path": "/Movies/movie_list.txt"
},
"output": "4 June 2024"
},
{
"task": "a3d11574-2acf-4b26-a569-a5dbc9d548ac",
"attribute": {
"content": "Public Talking",
"date": "4 June 2024"
},
"output": null
}
],
"adjlist": "0 1\n1",
"id": "d0811e47-d75f-40ce-b34b-e1ee3c8bed3f"
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
{
"description": "In Android, open the \"Contacts\" app to find the email address of the contact named Karoon Wei, then use the \"Tasks\" app to add a new task with the email address.",
"tasks": [
{
"task": "a3d11574-2acf-4b26-a569-a5dbc9d548ap",
"attribute": {
"name": "Karoon Wei"
},
"output": "[email protected]"
},
{
"task": "a3d11574-2acf-4b26-a569-a5dbc9d548af",
"attribute": {
"content": "[email protected]"
},
"output": null
}
],
"adjlist": "0 1\n1",
"id": "d7489d00-0046-4fb1-af5b-1fde7d87312c"
}
Loading

0 comments on commit f71a975

Please sign in to comment.