From 5b6088bfe610635d46877e2490687092c0276496 Mon Sep 17 00:00:00 2001
From: Erin Cochran
Date: Thu, 12 Oct 2023 20:49:12 -0400
Subject: [PATCH] [docs] [external-assets] - Round 2 (#17177)

## Summary & Motivation

This PR does another round of edits on the External Assets concept page.

## How I Tested These Changes
---
 .../concepts/assets/external-assets.mdx | 136 ++++++++++++------
 1 file changed, 91 insertions(+), 45 deletions(-)

diff --git a/docs/content/concepts/assets/external-assets.mdx b/docs/content/concepts/assets/external-assets.mdx
index 393f8643e6787..014e902055299 100644
--- a/docs/content/concepts/assets/external-assets.mdx
+++ b/docs/content/concepts/assets/external-assets.mdx
@@ -5,47 +5,42 @@ description: External assets model assets in Dagster that are not scheduled or m

 # External Assets (Experimental)

-An **external asset** is an asset that is not materialized by Dagster, but is tracked in the asset graph and asset catalog. This allows you to model assets in Dagster, attach metadata and events to those assets, but without scheduling their materialization with Dagster.
+An **external asset** is an asset that is visible in Dagster but executed by an external process. For example, you have a process that loads data from Kafka into Amazon S3 every day. You want the S3 asset to be visible alongside your other data assets, but not triggered by Dagster.

-**External assets are a good fit when data is**:
+In this case, you could use an external asset to leverage Dagster's event log and tooling without using the orchestrator. This allows you to maintain data lineage, observability, and data quality without unnecessary migrations.

-- Landed by an external source (e.g. an external file landing daily; Kafka landing data into Amazon S3)
-- Created and processed using manual processes
-- Materialized by existing pipelines with their own scheduling and infrastructure that you do not want to or need to migrate en masse
+### What about Source Assets?
-**With an external asset, you can:**
+[Source Assets](/concepts/assets/software-defined-assets#defining-external-asset-dependencies) can be used to model data that's produced by a process Dagster doesn't control, such as a daily file drop into Amazon S3.

-- Attach metadata to its definition for documentation, tracking ownership, and so on
-- Track its data quality and version in Dagster
+External assets can accomplish this, and more. As a result, Source Assets will be replaced with external assets in the near future.
+
+---
+
+## Uses and limitations
+
+Using external assets, you can:
+
+- Attach metadata to asset definitions for documentation, tracking ownership, and so on
+- Track the assets' [data quality](/concepts/assets/asset-checks) and [version](/guides/dagster/asset-versioning-and-caching) in Dagster
 - Use [asset sensors](/concepts/partitions-schedules-sensors/asset-sensors) or auto-materialize policies to update downstream assets based on updates to external assets

-**You cannot, however:**
-
-- Schedule an external asset's materialization
-- Backfill an external asset using Dagster
-- Use the [Dagster UI](/concepts/webserver/ui) or [GraphQL API](/concepts/webserver/graphql) to instigate ad hoc materializations
-
-
- What about Source Assets? A common use case for external
- assets is modeling data produced by a process not under Dagster's control. For
- example, a daily file drop from a third party into Amazon S3. In most systems,
- these are described as sources. This includes Dagster, which
- includes . As
- external assets are a superset of Source Asset functionality,{" "}
-
- source assets will be supplanted by external assets in the near future
-
- .
+
-
+### Limitations
+
+The following aren't currently supported when using external assets:
+
+- Scheduling the execution of an external asset
+- Backfilling an external asset using Dagster
+- Using the [Dagster UI](/concepts/webserver/ui) or [GraphQL API](/concepts/webserver/graphql) to instigate ad hoc executions

 ---

 ## Relevant APIs

-| Name | Description |
-| ------------------------------------------------ | ------------------------------------------------------------------------------------------- |
-| | Create list of objects that represent external assets |
-| | An object that represents the metadata of a particular asset |
+| Name | Description |
+| ------------------------------- | ---------------------------------------------------------------------------------------------- |
+| `external_assets_from_specs` | Creates a list of objects that represent external assets |
+| | An object that represents the metadata of a particular asset |

 ---

@@ -71,6 +66,8 @@ defs = Definitions(assets=[external_asset_from_spec(AssetSpec("file_in_s3"))])

 Click the **Asset definition** tab to view how this asset is defined.

+Note that the **Materialize** button is disabled, as external assets can't be executed by Dagster.
+
 The files_in_s3 external asset in the Asset Graph of the Dagster UI
-### Fully-managed assets with external asset dependencies
+### Dagster-native assets with external asset dependencies

 Fully-managed assets can depend on external assets.
 In this example, the `aggregated_logs` asset depends on `processed_logs`, which is an external asset:
@@ -176,21 +175,44 @@ To keep your external assets updated, you can use any of the following approache

 - [A REST API](#using-the-rest-api)
 - [Sensors](#using-sensors)
-- [Using the Python API](#using-the-python-api)
-- [Logging events in ops](#logging-events-in-unrelated-ops)
+- [A Python API](#using-the-python-api)
+- [Logging events using ops](#logging-events-using-ops)

 ### Using the REST API

-Dagster OSS exposes a REST endpoint for reporting asset materializations. Refer to the following tabs for examples using a `curl` command, and for invoking the API in Python.
+Whether you're using Dagster OSS or Dagster Cloud, you can use a REST endpoint for reporting asset materializations. The API also has endpoints for reporting [asset observations](/concepts/assets/asset-observations) and [asset check evaluations](/concepts/assets/asset-checks).
+
+Refer to the following tabs for examples using `curl` and Python to communicate with the API.
+
+#### Using curl
-
+
+
+##### Dagster Cloud
+
+```bash
+curl --request POST \
+  --url https://{organization}.dagster.cloud/{deployment}/report_asset_materialization/{asset_key} \
+  --header 'Content-Type: application/json' \
+  --header 'Dagster-Cloud-Api-Token: {token}' \
+  --data '{
+  "metadata" : {
+    "source": "From curl command"
+  }
+}'
+```
+
+---
-The following demonstrates how to use a `curl` command in a shell script to communicate with the API:
+
+
+
+##### Dagster OSS

 ```bash
 curl --request POST \
-  --url https://path/to/instance/report_asset_materialization/{asset_key}\
+  --url https://{dagster_webserver_host}/report_asset_materialization/{asset_key} \
   --header 'Content-Type: application/json' \
   --data '{
   "metadata" : {
@@ -199,26 +221,50 @@ curl --request POST \
 }'
 ```
+---
+
+
+
+
+#### Using Python
+
+
+
+
+##### Dagster Cloud
+
+```python
+import requests
+
+url = f"https://{organization}.dagster.cloud/{deployment}/report_asset_materialization/{asset_key}"
+payload = { "metadata": { "source": "From python script" } }
+headers = { "Content-Type": "application/json", "Dagster-Cloud-Api-Token": "{token}" }
+
+response = requests.request("POST", url, json=payload, headers=headers)
+```
+
+---
+
-
+
-The following demonstrates how to invoke the API in Python using the `requests` library:
+##### Dagster OSS

 ```python
 import requests

-url = f"https://path/to/instance/report_asset_materialization/{asset_key}"
+url = f"https://{dagster_webserver_host}/report_asset_materialization/{asset_key}"
 payload = { "metadata": { "source": "From python script" } }
 headers = { "Content-Type": "application/json" }

 response = requests.request("POST", url, json=payload, headers=headers)
 ```
+---
+
-The API also has endpoints for reporting [asset observations](/concepts/assets/asset-observations) and [asset check evaluations](/concepts/assets/asset-checks).
-
 ### Using sensors

 By using the `asset_events` parameter of , you can generate events to attach to external assets and then provide them directly to sensors. For example:
@@ -266,7 +312,7 @@ defs = Definitions(

 You can insert events to attach to external assets directly from Dagster's Python API. Specifically, the API is `report_runless_asset_event` on .

-For example, this would be useful when writing a hand-rolled Python script to backfill metadata:
+For example, this would be useful when writing a Python script to backfill metadata:

 ```python file=/concepts/assets/external_assets/external_asset_events_using_python_api.py startafter=start_python_api_marker endbefore=end_python_api_marker dedent=4
 from dagster import AssetMaterialization
@@ -279,9 +325,9 @@ instance.report_runless_asset_event(
 )
 ```

-### Logging events in unrelated ops
+### Logging events using ops

-You can log an from a bare op. In this case, use the `log_event` method of to report an asset materialization of an external asset. For example:
+You can log an from an [op](/concepts/ops-jobs-graphs/ops). In this case, use the `log_event` method of to report an asset materialization of an external asset. For example:

 ```python file=/concepts/assets/external_assets/update_external_asset_via_op.py
 from dagster import (