
bridge() solver for integrating with external agent frameworks #1181

Draft · wants to merge 38 commits into main

Conversation

jjallaire (Collaborator)

This PR introduces the ability to integrate an external agent that has no Inspect dependencies by converting it to a Solver. The only requirements are that the agent use the standard OpenAI API and that it consume and produce dict values as described below. While the agent function calls the standard OpenAI API, these calls are intercepted by Inspect and sent to the requisite Inspect model provider.

Protocol

Here is the type contract for bridged solvers (you don't need to use or import these types in your agent; your dicts just need to conform to the protocol):

from typing import Any, NotRequired, TypedDict  # NotRequired: Python 3.11+ (or typing_extensions)

from openai.types.chat import ChatCompletionMessageParam

class SampleDict(TypedDict):
    model: str
    sample_id: str
    epoch: int
    messages: list[ChatCompletionMessageParam]
    metadata: dict[str, Any]
    target: list[str]

class ScoreDict(TypedDict):
    value: (
        str
        | int
        | float
        | bool
        | list[str | int | float | bool]
        | dict[str, str | int | float | bool | None]
    )
    answer: NotRequired[str]
    explanation: NotRequired[str]
    metadata: NotRequired[dict[str, Any]]

class ResultDict(TypedDict):
    output: str
    messages: NotRequired[list[ChatCompletionMessageParam]]
    scores: NotRequired[dict[str, ScoreDict]]

async def agent(sample: SampleDict) -> ResultDict: 
    ...

The agent function must be async and should accept and return dict values as per the type declarations above. You aren't required to use these types exactly (they merely document the requirements) so long as you consume and produce dict values that match their declarations.

Returning messages is not required but is highly recommended so that people running the agent can see the full message history in the Inspect log viewer.

Returning scores is entirely optional (most agents will in fact rely on Inspect-native scorers; this is here as an escape hatch for agents that want to do their own scoring).
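
For illustration, here is a minimal sketch of an agent that returns all three fields (the agent logic is elided and the literal values are placeholders):

from typing import Any

async def self_scoring_agent(sample: dict[str, Any]) -> dict[str, Any]:
    # ... run the agent, producing an assistant reply ...
    messages = sample["messages"] + [{"role": "assistant", "content": "hello"}]
    return {
        "output": "hello",
        # recommended: full message history for the log viewer
        "messages": messages,
        # optional: agent-computed scores (escape hatch)
        "scores": {
            "correct": {
                "value": True,
                "answer": "hello",
                "explanation": "Output matched the target.",
            }
        },
    }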

Example

Here is the simplest possible agent definition:

from typing import Any

from openai import AsyncOpenAI

async def my_agent(sample: dict[str, Any]) -> dict[str, Any]:
    client = AsyncOpenAI()
    completion = await client.chat.completions.create(
        messages=sample["messages"],
        model=sample["model"]
    )
    return {
        "output": completion.choices[0].message.content
    }

Note that you should always pass the "model" along to OpenAI exactly as passed in the sample. While you are calling the standard OpenAI API, these calls are intercepted by Inspect and sent to the requisite Inspect model provider.

Here is how you can use the bridge() function to run this agent as a solver:

from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import includes
from inspect_ai.solver import bridge

from agents import my_agent

@task
def hello():
    return Task(
        dataset=[Sample(input="Please print the word 'hello'?", target="hello")],
        solver=bridge(my_agent),
        scorer=includes(),
    )
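
Assuming the task above is saved in a file like hello.py, it then runs like any other Inspect task (the model name here is illustrative):

inspect eval hello.py --model openai/gpt-4o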

"""

Inspect Features

The bridge() function enables you to create an agent with zero Inspect dependencies and still get the benefit of most of Inspect's infrastructure.

Limits

Sample-level token and time limits are enforced as normal with bridged solvers. Message limits are enforced when calling the main model being evaluated (the number of messages sent to the model is counted and compared against the limit).
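
For example, limits are specified on the task as usual. A minimal sketch, assuming the message_limit and token_limit parameters of Task (values are illustrative):

from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import includes
from inspect_ai.solver import bridge

from agents import my_agent

@task
def hello_limited():
    return Task(
        dataset=[Sample(input="Please print the word 'hello'?", target="hello")],
        solver=bridge(my_agent),
        scorer=includes(),
        message_limit=10,     # cap on messages sent to the main model
        token_limit=50_000,   # cap on total tokens for the sample
    )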

Observability

Agents incorporated using the bridge() function still benefit from most of Inspect's core observability features: all model calls go through the Inspect model interface, so they appear in the transcript as normal. If you return messages in your result, the messages are also populated for the log viewer. Standard Python logger calls also continue to be routed into the Inspect sample log.

If you want to take advantage of additional observability features, you can optionally import the Inspect transcript() function and use it as normal.
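
A minimal sketch, assuming transcript() is imported from inspect_ai.log:

from typing import Any

from inspect_ai.log import transcript

async def my_agent(sample: dict[str, Any]) -> dict[str, Any]:
    # record a custom event in the sample transcript
    transcript().info("agent: starting generation loop")
    ...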

Sandboxes

If you need to execute arbitrary model-generated code, you can use the Inspect sandbox() functions directly. If you need your agent to run both inside and outside of Inspect, you can abstract code execution into an interface and only call sandbox() when running inside the Inspect agent wrapper.
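
Here is a minimal sketch of that pattern, assuming sandbox() from inspect_ai.util and a task configured with a sandbox (e.g. sandbox="docker"); the CodeExecutor protocol and both executor classes are hypothetical names, not part of Inspect:

from typing import Protocol

class CodeExecutor(Protocol):
    async def exec(self, code: str) -> str: ...

class SandboxExecutor:
    """Executor used when running inside Inspect."""
    async def exec(self, code: str) -> str:
        from inspect_ai.util import sandbox
        result = await sandbox().exec(["python", "-c", code])
        return result.stdout

class LocalExecutor:
    """Executor used when running outside Inspect (no sandboxing!)."""
    async def exec(self, code: str) -> str:
        import asyncio, sys
        proc = await asyncio.create_subprocess_exec(
            sys.executable, "-c", code,
            stdout=asyncio.subprocess.PIPE,
        )
        stdout, _ = await proc.communicate()
        return stdout.decode()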

@jjallaire jjallaire requested a review from dragonstyle January 23, 2025 17:09
@jjallaire jjallaire marked this pull request as draft January 23, 2025 17:09
@jjallaire jjallaire changed the title bridge() function for converting agents with no inspect dependencies into solvers bridge() solver for integrating with external agent frameworks Jan 23, 2025