Add docs for dagster-openai integration (#20013)
## Summary & Motivation

This PR adds the docs for the `dagster-openai` integration added in PR
#19697

## How I Tested These Changes

BK
maximearmstrong authored Mar 6, 2024
1 parent c82bfda commit 2f2b1ac
Showing 17 changed files with 253 additions and 2 deletions.
4 changes: 4 additions & 0 deletions docs/content/_navigation.json
@@ -836,6 +836,10 @@
}
]
},
{
"title": "OpenAI",
"path": "/integrations/openai"
},
{
"title": "Pandas",
"path": "/integrations/pandas"
Binary file modified docs/content/api/modules.json.gz
Binary file not shown.
Binary file modified docs/content/api/searchindex.json.gz
Binary file not shown.
Binary file modified docs/content/api/sections.json.gz
Binary file not shown.
5 changes: 5 additions & 0 deletions docs/content/integrations.mdx
@@ -54,6 +54,7 @@ Using our integration guides and libraries, you can extend Dagster to interopera
title="Google BigQuery"
href="/integrations/bigquery"
></ArticleListItem>
<ArticleListItem title="OpenAI" href="/integrations/openai"></ArticleListItem>
<ArticleListItem title="Pandas" href="/integrations/pandas"></ArticleListItem>
<ArticleListItem
title="Pandera"
@@ -181,6 +182,10 @@ Explore libraries that are maintained by the Dagster core team.
title="MySQL"
href="/_apidocs/libraries/dagster-mysql"
></ArticleListItem>
<ArticleListItem
title="OpenAI"
href="/_apidocs/libraries/dagster-openai"
></ArticleListItem>
<ArticleListItem
title="PagerDuty"
href="/_apidocs/libraries/dagster-pagerduty"
147 changes: 147 additions & 0 deletions docs/content/integrations/openai.mdx
@@ -0,0 +1,147 @@
---
title: "OpenAI + Dagster"
description: The `dagster-openai` library provides the ability to build OpenAI pipelines with Dagster and log OpenAI API usage metadata in Dagster Insights.
---

# OpenAI + Dagster (Experimental)

<Note>
This feature is considered <strong>experimental</strong>.
</Note>

The `dagster-openai` library allows you to build OpenAI pipelines with Dagster and log OpenAI API usage metadata in [Dagster Insights](/dagster-cloud/insights).

Using this library's <PyObject module="dagster_openai" object="OpenAIResource" />, you can easily interact with the [OpenAI REST API](https://platform.openai.com/docs/introduction) via the [OpenAI Python API](https://github.com/openai/openai-python).

When used with Dagster's [Software-defined Assets](/concepts/assets/software-defined-assets), the resource automatically logs OpenAI usage metadata in asset metadata. See the [Relevant APIs](#relevant-apis) section for more information.

---

## Getting started

Before you get started with the `dagster-openai` library, we recommend familiarizing yourself with the [OpenAI Python API library](https://github.com/openai/openai-python), which this integration uses to interact with the [OpenAI REST API](https://platform.openai.com/docs/introduction).

---

## Prerequisites

To get started, install the `dagster` and `dagster-openai` Python packages:

```bash
pip install dagster dagster-openai
```

You will also need an OpenAI [API key](https://platform.openai.com/api-keys), which you can generate in your OpenAI account, to use the resource.

---

## Connecting to OpenAI

The first step in using OpenAI with Dagster is to tell Dagster how to connect to an OpenAI client using an OpenAI [resource](/concepts/resources). This resource contains the credentials needed to interact with the OpenAI API.

We will supply our credentials as environment variables by adding them to a `.env` file. For more information on setting environment variables in a production setting, see [Using environment variables and secrets](/guides/dagster/using-environment-variables-and-secrets).

```bash
# .env

OPENAI_API_KEY=...
```

Then, we can instruct Dagster to authorize the OpenAI resource using the environment variables:

```python startafter=start_example endbefore=end_example file=/integrations/openai/resource.py
from dagster_openai import OpenAIResource

from dagster import EnvVar

# Pull API key from environment variables
openai = OpenAIResource(
    api_key=EnvVar("OPENAI_API_KEY"),
)
```
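
If `OPENAI_API_KEY` is missing or empty, the problem may only surface once a run starts. As a rough sketch, a pre-flight check could be written with the standard library alone. The `require_env` helper below is hypothetical and is not part of `dagster` or `dagster-openai`:

```python
import os


def require_env(name: str, environ=None) -> str:
    """Return the value of an environment variable, or raise a clear error.

    Hypothetical helper for illustration; not part of dagster or dagster-openai.
    """
    # Allow a mapping to be passed in so the check can be exercised in tests.
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set or is empty")
    return value
```

Calling `require_env("OPENAI_API_KEY")` before starting Dagster would fail fast with a readable message instead of an authentication error in the middle of a run.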

---

## Using the OpenAI resource with assets

The OpenAI resource can be used in assets in order to interact with the OpenAI API. Note that in this example, we supply our credentials as environment variables directly when instantiating the <PyObject object="Definitions" /> object.

```python startafter=start_example endbefore=end_example file=/integrations/openai/assets.py
from dagster_openai import OpenAIResource

from dagster import (
    AssetExecutionContext,
    Definitions,
    EnvVar,
    asset,
    define_asset_job,
)


@asset(compute_kind="OpenAI")
def openai_asset(context: AssetExecutionContext, openai: OpenAIResource):
    with openai.get_client(context) as client:
        client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Say this is a test."}],
        )


openai_asset_job = define_asset_job(name="openai_asset_job", selection="openai_asset")

defs = Definitions(
    assets=[openai_asset],
    jobs=[openai_asset_job],
    resources={
        "openai": OpenAIResource(api_key=EnvVar("OPENAI_API_KEY")),
    },
)
```

After materializing your asset, your OpenAI API usage metadata will be available in the **Events** and **Plots** tabs of your asset in the Dagster UI. If you are using [Dagster Cloud](/dagster-cloud), your usage metadata will also be available in [Dagster Insights](/dagster-cloud/insights). Refer to the [Viewing and materializing assets in the UI guide](https://docs.dagster.io/concepts/assets/software-defined-assets#viewing-and-materializing-assets-in-the-ui) for more information.
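
The asset example discards the API response. As a minimal sketch, the reply text can be read from the object returned by `client.chat.completions.create`. The `reply_text` helper is hypothetical, and the `SimpleNamespace` object is only a stand-in for a real response so that the snippet runs without network access:

```python
from types import SimpleNamespace


def reply_text(response) -> str:
    """Return the text of the first choice in a chat completion response.

    Hypothetical helper for illustration; not part of dagster-openai.
    """
    content = response.choices[0].message.content
    # `content` can be None, for example when the model returns a tool call.
    return content or ""


# Stand-in for the object returned by client.chat.completions.create(...).
fake_response = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="This is a test."))]
)

print(reply_text(fake_response))
```

Inside an asset, the returned string could be used as the asset's value instead of discarding the response.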

---

## Using the OpenAI resource with ops

The OpenAI resource can also be used in ops. **Note**: Currently, the OpenAI resource does not log OpenAI usage metadata out of the box when used in ops.

```python startafter=start_example endbefore=end_example file=/integrations/openai/ops.py
from dagster_openai import OpenAIResource

from dagster import (
    Definitions,
    EnvVar,
    GraphDefinition,
    OpExecutionContext,
    op,
)


@op
def openai_op(context: OpExecutionContext, openai: OpenAIResource):
    with openai.get_client(context) as client:
        client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Say this is a test"}],
        )


openai_op_job = GraphDefinition(name="openai_op_job", node_defs=[openai_op]).to_job()

defs = Definitions(
    jobs=[openai_op_job],
    resources={
        "openai": OpenAIResource(api_key=EnvVar("OPENAI_API_KEY")),
    },
)
```
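
Because usage metadata is not logged automatically in ops, one possible workaround is to write the token counts to the run log yourself. The sketch below assumes the response exposes `usage.prompt_tokens`, `usage.completion_tokens`, and `usage.total_tokens`, as chat completion responses in the OpenAI Python library do. `log_usage` is a hypothetical helper, and a standard `logging.Logger` stands in for `context.log`:

```python
import logging
from types import SimpleNamespace


def log_usage(log: logging.Logger, response) -> dict:
    """Log token counts from a response and return them as a dict.

    Hypothetical helper for illustration; not part of dagster-openai.
    """
    usage = getattr(response, "usage", None)
    if usage is None:
        # Some responses (for example, streamed ones) may carry no usage data.
        return {}
    counts = {
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
    }
    log.info(f"OpenAI usage: {counts}")
    return counts


# Stand-in for a real chat completion response.
fake_response = SimpleNamespace(
    usage=SimpleNamespace(prompt_tokens=5, completion_tokens=7, total_tokens=12)
)

log_usage(logging.getLogger("openai_op"), fake_response)
```

In an op, the same helper could be called as `log_usage(context.log, response)` after the `create` call.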

---

## Relevant APIs

| Name | Description |
| ----------------------------------------------------------------- | ------------------------------------------------------------------------------------- |
| <PyObject module="dagster_openai" object="OpenAIResource" /> | The OpenAI resource used for handling the client |
| <PyObject module="dagster_openai" object="with_usage_metadata" /> | The function wrapper used on OpenAI API endpoint methods to log OpenAI usage metadata |
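
To illustrate the general idea behind a usage-logging wrapper, here is a simplified sketch that wraps an endpoint method and accumulates token counts from each response. It is not the implementation of `with_usage_metadata`: `count_usage`, `fake_create`, and the counter key names are all hypothetical, and the counts are kept in a plain `Counter` rather than attached to asset metadata:

```python
from collections import Counter
from functools import wraps
from types import SimpleNamespace


def count_usage(counters: Counter, func):
    """Wrap `func` so that token usage from each response is added to `counters`.

    Hypothetical helper for illustration; not the dagster-openai implementation.
    """

    @wraps(func)
    def wrapper(*args, **kwargs):
        response = func(*args, **kwargs)
        usage = getattr(response, "usage", None)
        if usage is not None:
            counters["openai.calls"] += 1
            counters["openai.prompt_tokens"] += usage.prompt_tokens
            counters["openai.completion_tokens"] += usage.completion_tokens
            counters["openai.total_tokens"] += usage.total_tokens
        # The response is passed through unchanged.
        return response

    return wrapper


def fake_create(**kwargs):
    """Stand-in for an endpoint method such as client.chat.completions.create."""
    return SimpleNamespace(
        usage=SimpleNamespace(prompt_tokens=5, completion_tokens=7, total_tokens=12)
    )


counters = Counter()
create = count_usage(counters, fake_create)
create(model="gpt-3.5-turbo", messages=[])
create(model="gpt-3.5-turbo", messages=[])
print(dict(counters))
```

Wrapping the method rather than the client keeps the call site unchanged: callers invoke `create(...)` as before and receive the same response object.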
Binary file modified docs/next/public/objects.inv
Binary file not shown.
1 change: 1 addition & 0 deletions docs/sphinx/conf.py
@@ -41,6 +41,7 @@
"../../python_modules/libraries/dagster-mlflow",
"../../python_modules/libraries/dagster-msteams",
"../../python_modules/libraries/dagster-mysql",
"../../python_modules/libraries/dagster-openai",
"../../python_modules/libraries/dagster-pagerduty",
"../../python_modules/libraries/dagster-pandas",
"../../python_modules/libraries/dagster-pandera",
1 change: 1 addition & 0 deletions docs/sphinx/index.rst
@@ -59,6 +59,7 @@
sections/api/apidocs/libraries/dagster-mlflow
sections/api/apidocs/libraries/dagster-msteams
sections/api/apidocs/libraries/dagster-mysql
sections/api/apidocs/libraries/dagster-openai
sections/api/apidocs/libraries/dagster-pagerduty
sections/api/apidocs/libraries/dagster-pandas
sections/api/apidocs/libraries/dagster-pandera
14 changes: 14 additions & 0 deletions docs/sphinx/sections/api/apidocs/libraries/dagster-openai.rst
@@ -0,0 +1,14 @@
OpenAI (dagster-openai)
------------------------

The `dagster_openai` library provides utilities for using OpenAI with Dagster.
A good place to start with `dagster_openai` is `the guide </integrations/openai>`_.


.. currentmodule:: dagster_openai

.. autofunction:: with_usage_metadata

.. autoclass:: OpenAIResource
    :members: get_client, get_client_for_asset

1 change: 1 addition & 0 deletions docs/tox.ini
@@ -38,6 +38,7 @@ deps =
-e ../python_modules/libraries/dagster-deltalake
-e ../python_modules/libraries/dagster-deltalake-pandas
-e ../python_modules/libraries/dagster-deltalake-polars
-e ../python_modules/libraries/dagster-openai

commands =
make --directory=sphinx clean
Empty file.
31 changes: 31 additions & 0 deletions examples/docs_snippets/docs_snippets/integrations/openai/assets.py
@@ -0,0 +1,31 @@
# start_example
from dagster_openai import OpenAIResource

from dagster import (
    AssetExecutionContext,
    Definitions,
    EnvVar,
    asset,
    define_asset_job,
)


@asset(compute_kind="OpenAI")
def openai_asset(context: AssetExecutionContext, openai: OpenAIResource):
    with openai.get_client(context) as client:
        client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Say this is a test."}],
        )


openai_asset_job = define_asset_job(name="openai_asset_job", selection="openai_asset")

defs = Definitions(
    assets=[openai_asset],
    jobs=[openai_asset_job],
    resources={
        "openai": OpenAIResource(api_key=EnvVar("OPENAI_API_KEY")),
    },
)
# end_example
30 changes: 30 additions & 0 deletions examples/docs_snippets/docs_snippets/integrations/openai/ops.py
@@ -0,0 +1,30 @@
# start_example
from dagster_openai import OpenAIResource

from dagster import (
    Definitions,
    EnvVar,
    GraphDefinition,
    OpExecutionContext,
    op,
)


@op
def openai_op(context: OpExecutionContext, openai: OpenAIResource):
    with openai.get_client(context) as client:
        client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": "Say this is a test"}],
        )


openai_op_job = GraphDefinition(name="openai_op_job", node_defs=[openai_op]).to_job()

defs = Definitions(
    jobs=[openai_op_job],
    resources={
        "openai": OpenAIResource(api_key=EnvVar("OPENAI_API_KEY")),
    },
)
# end_example
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
# start_example
from dagster_openai import OpenAIResource

from dagster import EnvVar

# Pull API key from environment variables
openai = OpenAIResource(
    api_key=EnvVar("OPENAI_API_KEY"),
)
# end_example
3 changes: 3 additions & 0 deletions python_modules/libraries/dagster-openai/README.md
@@ -1 +1,4 @@
# dagster-openai

The docs for `dagster-openai` can be found
[here](https://docs.dagster.io/_apidocs/libraries/dagster-openai).
Original file line number Diff line number Diff line change
@@ -12,7 +12,7 @@
InitResourceContext,
OpExecutionContext,
)
-from dagster._annotations import experimental
+from dagster._annotations import experimental, public
from dagster._core.errors import (
DagsterInvariantViolationError,
)
@@ -49,6 +49,7 @@ def _add_to_asset_metadata(
context.add_output_metadata(dict(counters), output_name)


@public
@experimental
def with_usage_metadata(context: AssetExecutionContext, output_name: Optional[str], func):
"""This wrapper can be used on any endpoint of the
@@ -141,6 +142,7 @@ def wrapper(*args, **kwargs):
return wrapper


@public
@experimental
class OpenAIResource(ConfigurableResource):
"""This resource is a wrapper over the
@@ -212,6 +214,7 @@ def setup_for_execution(self, context: InitResourceContext) -> None:
# Set up an OpenAI client based on the API key.
self._client = Client(api_key=self.api_key)

@public
@contextmanager
def get_client(
self, context: Union[AssetExecutionContext, OpExecutionContext]
@@ -274,6 +277,7 @@ def openai_asset(context: AssetExecutionContext, openai: OpenAIResource):
"""
yield from self._get_client(context=context, asset_key=None)

@public
@contextmanager
def get_client_for_asset(
self, context: AssetExecutionContext, asset_key: AssetKey
@@ -288,7 +292,7 @@ def get_client_for_asset(
allowing to log the API usage metadata in the asset metadata.
This method can only be called when working with assets,
-i.e. the provided ``context`` must be of type ``AssetExecutionContext.
+i.e. the provided ``context`` must be of type ``AssetExecutionContext``.
:param context: The ``context`` object for computing the asset in which ``get_client`` is called.
:param asset_key: the ``asset_key`` of the asset for which a materialization should include the metadata.

1 comment on commit 2f2b1ac

@github-actions

Deploy preview for dagster-docs ready!

✅ Preview
https://dagster-docs-gjxvuv1gx-elementl.vercel.app
https://master.dagster.dagster-docs.io

Built with commit 2f2b1ac.
This pull request is being automatically deployed with vercel-action
