A toolkit for collecting datasets for Agents and Planning models, and for running evaluation pipelines.
To install the dependencies, run:

```bash
pip install -r requirements.txt
```
We use the Hydra library for the evaluation pipeline. Each configuration is specified in `eval.yaml` in the following format:
```yaml
# @package _global_
hydra:
  job:
    name: ${agent.name}_${agent.model_name}_[YOUR_ADDITIONAL_TOKEN_OR_NOTHING]
  run:
    dir: [YOUR_PATH_TO_OUTPUT_DIR]/${hydra:job.name}
  job_logging:
    root:
      handlers: [console, file]

defaults:
  - _self_
  - data_source: hf
  - env: code_engine
  - agent: planning
```
Here you define the data source, environment, and agent you want to evaluate. We provide several implementations of each, defined in sub-YAML files:
field | options |
---|---|
`data_source` | `hf.yaml` |
`env` | `code_engine.yaml`, `http.yaml`, `few_shot.yaml` |
`agent` | `few_shot.yaml`, `planning.yaml`, `vanilla.yaml`, `reflexion.yaml`, `tree_of_thoughts.yaml`, `adapt.yaml` |
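Each option corresponds to a YAML file in the matching Hydra config group. As an illustration, a sub-config for the `planning` agent might look like the sketch below (the field names here are assumptions for illustration, not the actual schema):

```yaml
# Hypothetical agent/planning.yaml; the actual fields in the repo may differ.
name: planning
model_name: gpt-4-1106-preview
temperature: 0.0
```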
The challenge is to generate a project template: a small compilable project, describable in 1-5 sentences, that contains small examples of all the mentioned libraries, technologies, and functionality.
The dataset of template-related repositories collected from GitHub is published on HuggingFace 🤗. Details about the dataset collection, along with the source code, can be found in the `template_generation` directory.
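Once published, the dataset can be inspected locally with the 🤗 `datasets` library. A minimal sketch, assuming a placeholder dataset identifier (see `template_generation` for the actual name):

```python
from datasets import load_dataset

# "YOUR_ORG/template-generation" is a placeholder identifier; substitute the
# actual HuggingFace dataset name referenced in the template_generation directory.
dataset = load_dataset("YOUR_ORG/template-generation", split="train")

# Each record describes a template-related repository collected from GitHub.
print(dataset[0])
```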
To run the evaluation pipeline, please execute the following command in your console:
```bash
python3 -m src.template_generation.run_eval --multirun agent=planning agent.model_name=gpt-3.5-turbo-1106,gpt-4-1106-preview
```
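The `--multirun` flag makes Hydra launch one job per value of `agent.model_name`. For reference, the entry point follows the standard Hydra pattern; a minimal sketch, assuming the config layout of `eval.yaml` above (the `config_path` value is an assumption):

```python
import hydra
from omegaconf import DictConfig


@hydra.main(version_base=None, config_path="configs", config_name="eval")
def main(config: DictConfig) -> None:
    # Hydra resolves the data_source/env/agent sub-configs from the defaults
    # list; with --multirun it calls this function once per swept value.
    print(f"Evaluating {config.agent.name} with {config.agent.model_name}")


if __name__ == "__main__":
    main()
```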
Model | Metrics |
---|---|