Add docs for insights
shalabhc committed Oct 5, 2023
1 parent 66c675a commit 8a85c7e
Showing 3 changed files with 105 additions and 0 deletions.
105 changes: 105 additions & 0 deletions docs/content/dagster-cloud/insights.mdx
@@ -0,0 +1,105 @@
---
title: Dagster Cloud Insights
description: "Visibility into historical usage and cost metrics."

platform_type: "cloud"
---

# Dagster Cloud Insights

<Note>
This feature is considered <strong>experimental</strong>. To get access to
Insights, request it in the [#dagster-insights](https://dagster.slack.com/archives/C05V7GETFSQ) channel in Slack.
</Note>

Insights is a Dagster Cloud feature that provides visibility into historical usage and cost metrics such as run duration, credit usage, and failures. This feature is available as a top-level tab in the Dagster Cloud UI:

<Image
alt="Viewing the Insights tab in the Dagster UI"
src="/images/dagster-cloud/insights/insights-tab.png"
width={771}
height={536}
/>

The Insights page shows a list of metrics in the left panel. For each metric, the daily, weekly, or monthly aggregated values are shown in a graph in the main panel. As of October 2023, the metrics are updated once a day.

## External metrics

External metrics, such as Snowflake credits spent, can be integrated into the Dagster Insights UI. The [`dagster-cloud`](https://pypi.org/project/dagster-cloud/) package contains utilities for capturing external metrics about data operations and submitting them to Dagster Cloud via an API.

### How to enable Snowflake and dbt with Insights

If you use dbt to materialize tables in Snowflake, you can use these instructions to integrate Snowflake metrics into the Insights UI.

#### Step 1 - Instrument your dbt asset definition

You need `dagster-cloud` version 1.5.1 or newer. Instrument the Dagster `@dbt_assets` function with `dbt_with_snowflake_insights`.

This passes through all the underlying events and, in addition, emits an `AssetObservation` for each materialization. Each observation contains the dbt invocation id and unique id, which are recorded in the Dagster event log.

```python
from dagster import AssetExecutionContext
from dagster_dbt import dbt_assets
from dagster_cloud.dagster_insights import dbt_with_snowflake_insights


@dbt_assets(...)
def my_asset(context: AssetExecutionContext):
    # `dbt_resource` refers to your existing dbt CLI resource.
    # Typically you have a `yield from dbt_resource.cli(...)`.
    # Wrap the original call with `dbt_with_snowflake_insights` as below.
    dbt_cli_invocation = dbt_resource.cli(["build"], context=context)
    yield from dbt_with_snowflake_insights(context, dbt_cli_invocation)
```

#### Step 2 - Update your dbt_project.yml

Add the following to your `dbt_project.yml`:

```yaml
query-comment:
  comment: "snowflake_dagster_dbt_v1_opaque_id[[[{{ node.unique_id }}:{{ invocation_id }}]]]"
  append: true
```

This adds a comment to each query recorded in the `query_history` table in Snowflake. The comment contains the dbt unique id and invocation id. Setting `append: true` is important because Snowflake strips leading comments, so the comment must be placed at the end of the query.
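
To illustrate why this marker format is convenient, here is a minimal sketch of how the two ids could be recovered from a query's text. The regular expression, the `extract_dbt_ids` helper, and the sample query are all hypothetical and written for illustration only; they are not part of `dagster-cloud`, whose actual matching logic may differ.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern matching the marker configured in dbt_project.yml:
# snowflake_dagster_dbt_v1_opaque_id[[[<unique_id>:<invocation_id>]]]
OPAQUE_ID_PATTERN = re.compile(
    r"snowflake_dagster_dbt_v1_opaque_id"
    r"\[\[\[(?P<unique_id>.+?):(?P<invocation_id>[^:\]]+)\]\]\]"
)


def extract_dbt_ids(query_text: str) -> Optional[Tuple[str, str]]:
    """Return (unique_id, invocation_id) if the marker is present, else None."""
    match = OPAQUE_ID_PATTERN.search(query_text)
    if match is None:
        return None
    return match.group("unique_id"), match.group("invocation_id")


# Made-up query text with the comment appended at the end of the statement.
example_query = (
    "create or replace table analytics.orders as (select 1)\n"
    "/* snowflake_dagster_dbt_v1_opaque_id[[[model.my_project.orders:"
    "5f1c2c0e-8f5b-4c52-9a55-0d1b2e3f4a5b]]] */"
)

ids = extract_dbt_ids(example_query)
```

Because the marker carries both ids, a query can be tied back to one specific dbt node in one specific invocation, which is the same pair of ids recorded in the `AssetObservation` from Step 1.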

#### Step 3 - Create a metrics ingestion pipeline

Create a Dagster pipeline that joins asset observation events with the Snowflake query history and calls the Dagster Cloud ingestion API. This needs a Snowflake resource that can query `query_history`. You can use a pre-defined pipeline as below:

```python
from datetime import date

from dagster import Definitions, EnvVar
from dagster_cloud.dagster_insights import (
    create_snowflake_insights_asset_and_schedule,
)
from dagster_snowflake import SnowflakeResource

snowflake_insights_definitions = create_snowflake_insights_asset_and_schedule(
    date(2023, 10, 5),
    allow_partial_partitions=True,
    dry_run=False,
    snowflake_resource_key="snowflake_insights",
)

defs = Definitions(
    assets=[..., *snowflake_insights_definitions.assets],
    schedules=[..., snowflake_insights_definitions.schedule],
    resources={
        # ... your existing resources go here
        "snowflake_insights": SnowflakeResource(
            account=EnvVar("SNOWFLAKE_PURINA_ACCOUNT"),
            user=EnvVar("SNOWFLAKE_PURINA_USER"),
            password=EnvVar("SNOWFLAKE_PURINA_PASSWORD"),
        ),
    },
)
```

The `snowflake_resource_key` argument is the resource key of a `SnowflakeResource` that has access to the `query_history` table. Once the pipeline runs, Snowflake credits should be visible in the Insights tab:

<Image
alt="Snowflake credits in the Dagster UI"
src="/images/dagster-cloud/insights/insights-snowflake.png"
width={383}
height={349}
/>
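
Conceptually, the join described in Step 3 matches each asset observation to the Snowflake queries that carry the same invocation id and unique id, then totals the cost per asset. The sketch below illustrates that idea in plain Python with made-up data. The `credits_per_asset` function, the tuple layouts, and the per-query credit figures are assumptions for illustration; this is not the implementation used by `create_snowflake_insights_asset_and_schedule`.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record shapes, chosen only for this illustration.
Observation = Tuple[str, str, str]   # (asset_key, invocation_id, unique_id)
QueryCost = Tuple[str, str, float]   # (invocation_id, unique_id, credits)


def credits_per_asset(
    observations: Iterable[Observation],
    queries: Iterable[QueryCost],
) -> Dict[str, float]:
    """Sum query credits per asset by joining on (invocation_id, unique_id)."""
    asset_by_ids = {
        (invocation_id, unique_id): asset_key
        for asset_key, invocation_id, unique_id in observations
    }
    totals: Dict[str, float] = defaultdict(float)
    for invocation_id, unique_id, credits in queries:
        asset_key = asset_by_ids.get((invocation_id, unique_id))
        if asset_key is not None:  # skip queries not issued by an observed asset
            totals[asset_key] += credits
    return dict(totals)


observations = [
    ("orders", "inv-1", "model.my_project.orders"),
    ("customers", "inv-1", "model.my_project.customers"),
]
queries = [
    ("inv-1", "model.my_project.orders", 0.5),
    ("inv-1", "model.my_project.orders", 0.25),
    ("inv-1", "model.my_project.customers", 1.0),
    ("inv-9", "model.other.unrelated", 4.0),  # no matching observation
]

totals = credits_per_asset(observations, queries)
```

Queries without a matching observation are ignored in this sketch, so only work attributable to an instrumented dbt asset contributes to the totals.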

---