From 797f120fd4d13675e6094ccfbe61bdb492517649 Mon Sep 17 00:00:00 2001 From: Erin Cochran Date: Thu, 12 Oct 2023 11:59:34 -0400 Subject: [PATCH 1/5] Trigger a build --- docs/content/guides/dagster-pipes.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/content/guides/dagster-pipes.mdx b/docs/content/guides/dagster-pipes.mdx index bacc8aabe489b..209ac1ea78504 100644 --- a/docs/content/guides/dagster-pipes.mdx +++ b/docs/content/guides/dagster-pipes.mdx @@ -62,7 +62,7 @@ height={393} The process starts and loads the context info provided by Dagster. While the process runs, execution data, logs, and any specified metadata are streamed back to Dagster. -After Dagster receives the data from the external process, it’ll be visible in the [Dagster UI](/concepts/webserver/ui). +After Dagster receives the data from the external process, it’ll be visible in the [Dagster UI](/concepts/webserver/ui). --- From e924d0b831e8b14d7e619ede7916e192ea3578e3 Mon Sep 17 00:00:00 2001 From: Erin Cochran Date: Thu, 12 Oct 2023 14:06:12 -0400 Subject: [PATCH 2/5] Revert "Trigger a build" This reverts commit 797f120fd4d13675e6094ccfbe61bdb492517649. --- docs/content/guides/dagster-pipes.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/content/guides/dagster-pipes.mdx b/docs/content/guides/dagster-pipes.mdx index 209ac1ea78504..bacc8aabe489b 100644 --- a/docs/content/guides/dagster-pipes.mdx +++ b/docs/content/guides/dagster-pipes.mdx @@ -62,7 +62,7 @@ height={393} The process starts and loads the context info provided by Dagster. While the process runs, execution data, logs, and any specified metadata are streamed back to Dagster. -After Dagster receives the data from the external process, it’ll be visible in the [Dagster UI](/concepts/webserver/ui). +After Dagster receives the data from the external process, it’ll be visible in the [Dagster UI](/concepts/webserver/ui). 
--- From bd6fb656235f333bffaffa1055134bc97a3f6aff Mon Sep 17 00:00:00 2001 From: Erin Cochran Date: Thu, 12 Oct 2023 15:43:34 -0400 Subject: [PATCH 3/5] Review comments --- .../concepts/assets/external-assets.mdx | 105 +++++++++++------- 1 file changed, 67 insertions(+), 38 deletions(-) diff --git a/docs/content/concepts/assets/external-assets.mdx b/docs/content/concepts/assets/external-assets.mdx index 393f8643e6787..53c197598df80 100644 --- a/docs/content/concepts/assets/external-assets.mdx +++ b/docs/content/concepts/assets/external-assets.mdx @@ -7,45 +7,44 @@ description: External assets model assets in Dagster that are not scheduled or m An **external asset** is an asset that is not materialized by Dagster, but is tracked in the asset graph and asset catalog. This allows you to model assets in Dagster, attach metadata and events to those assets, but without scheduling their materialization with Dagster. -**External assets are a good fit when data is**: +External assets are a good fit if: -- Landed by an external source (e.g. an external file landing daily; Kafka landing data into Amazon S3) -- Created and processed using manual processes -- Materialized by existing pipelines with their own scheduling and infrastructure that you do not want to or need to migrate en masse +- **An external source creates the data**. For example: An external file created every day, or Kafka loading data into Amazon S3 +- **Dagster isn't required for materialization**. For example: An existing pipeline materializes data using its own scheduling mechanism +- **Data is created and processed manually.** For example: Manually entering data into a spreadsheet -**With an external asset, you can:** +### What about Source Assets? + +[Source Assets](/concepts/assets/software-defined-assets#defining-external-asset-dependencies) can be used to model data that's produced by a process Dagster doesn't control, such as a daily file drop into Amazon S3. 
+ +External assets can accomplish this, and more. As a result, Source Assets will be replaced with external assets in the near future. + +--- + +## Uses and limitations + +With an external asset, you can: - Attach metadata to its definition for documentation, tracking ownership, and so on -- Track its data quality and version in Dagster +- Track its [data quality](/concepts/assets/asset-checks) and [version](/guides/dagster/asset-versioning-and-caching) in Dagster - Use [asset sensors](/concepts/partitions-schedules-sensors/asset-sensors) or auto-materialize policies to update downstream assets based on updates to external assets -**You cannot, however:** - -- Schedule an external asset's materialization -- Backfill an external asset using Dagster -- Use the [Dagster UI](/concepts/webserver/ui) or [GraphQL API](/concepts/webserver/graphql) to instigate ad hoc materializations - - - What about Source Assets? A common use case for external - assets is modeling data produced by a process not under Dagster's control. For - example, a daily file drop from a third party into Amazon S3. In most systems, - these are described as sources. This includes Dagster, which - includes . As - external assets are a superset of Source Asset functionality,{" "} - - source assets will be supplanted by external assets in the near future - - . 
- +### Limitations + +The following aren't currently supported when using external assets: + +- Scheduling the execution of an external asset +- Backfilling an external asset using Dagster +- Using the [Dagster UI](/concepts/webserver/ui) or [GraphQL API](/concepts/webserver/graphql) to instigate ad hoc executions --- ## Relevant APIs -| Name | Description | -| ------------------------------------------------ | ------------------------------------------------------------------------------------------- | -| | Create list of objects that represent external assets | -| | An object that represents the metadata of a particular asset | +| Name | Description | +| ------------------------------- | ---------------------------------------------------------------------------------------------- | +| `external_assets_from_specs` | Creates a list of objects that represent external assets | +| | An object that represents the metadata of a particular asset | --- @@ -71,6 +70,8 @@ defs = Definitions(assets=[external_asset_from_spec(AssetSpec("file_in_s3"))]) Click the **Asset definition** tab to view how this asset is defined. +Note that the **Materialize** button is disabled, as external assets can't be executed by Dagster. + The files_in_s3 external asset in the Asset Graph of the Dagster UI -### Fully-managed assets with external asset dependencies +### Dagster-native assets with external asset dependencies Fully-managed assets can depend on external assets. In this example, the `aggregated_logs` asset depends on `processed_logs`, which is an external asset: @@ -177,20 +180,34 @@ To keep your external assets updated, you can use any of the following approache - [A REST API](#using-the-rest-api) - [Sensors](#using-sensors) - [Using the Python API](#using-the-python-api) -- [Logging events in ops](#logging-events-in-unrelated-ops) +- [Logging events in ops](#logging-events-in-ops) ### Using the REST API -Dagster OSS exposes a REST endpoint for reporting asset materializations. 
Refer to the following tabs for examples using a `curl` command, and for invoking the API in Python. +Whether you're using Dagster OSS or Dagster Cloud, you can use a REST endpoint for reporting asset materializations. Refer to the following tabs for examples using `curl` and Python to communicate with the API. -The following demonstrates how to use a `curl` command in a shell script to communicate with the API: +#### Dagster Cloud + +```bash +curl --request POST \ + --url https://{organization}.dagster.cloud/{deployment}/report_asset_materialization/{asset_key} \ + --header 'Content-Type: application/json' \ + --header 'Dagster-Cloud-Api-Token: {token}' \ + --data '{ + "metadata" : { + "source": "From curl command" + } +}' +``` + +#### Dagster OSS ```bash curl --request POST \ - --url https://path/to/instance/report_asset_materialization/{asset_key}\ + --url https://{dagster_webserver_host}/report_asset_materialization/{asset_key} \ --header 'Content-Type: application/json' \ --data '{ "metadata" : { @@ -202,12 +219,24 @@ curl --request POST \ -The following demonstrates how to invoke the API in Python using the `requests` library: +#### Dagster Cloud + +```python +import requests + +url = f"https://{organization}.dagster.cloud/{deployment}/report_asset_materialization/{asset_key}" +payload = { "metadata": { "source": "From python script" } } +headers = { "Content-Type": "application/json", "Dagster-Cloud-Api-Token": "{token}" } + +response = requests.request("POST", url, json=payload, headers=headers) +``` + +#### Dagster OSS ```python import requests -url = f"https://path/to/instance/report_asset_materialization/{asset_key}" +url = f"https://{dagster_webserver_host}/report_asset_materialization/{asset_key}" payload = { "metadata": { "source": "From python script" } } headers = { "Content-Type": "application/json" } @@ -266,7 +295,7 @@ defs = Definitions( You can insert events to attach to external assets directly from Dagster's Python API. 
Specifically, the API is `report_runless_asset_event` on . -For example, this would be useful when writing a hand-rolled Python script to backfill metadata: +For example, this would be useful when writing a Python script to backfill metadata: ```python file=/concepts/assets/external_assets/external_asset_events_using_python_api.py startafter=start_python_api_marker endbefore=end_python_api_marker dedent=4 from dagster import AssetMaterialization @@ -279,9 +308,9 @@ instance.report_runless_asset_event( ) ``` -### Logging events in unrelated ops +### Logging events using ops -You can log an from a bare op. In this case, use the `log_event` method of to report an asset materialization of an external asset. For example: +You can log an from an [op](/concepts/ops-jobs-graphs/ops). In this case, use the `log_event` method of to report an asset materialization of an external asset. For example: ```python file=/concepts/assets/external_assets/update_external_asset_via_op.py from dagster import ( From f38f0adbb66430a8056730dbbce5b949357624eb Mon Sep 17 00:00:00 2001 From: Erin Cochran Date: Thu, 12 Oct 2023 16:54:52 -0400 Subject: [PATCH 4/5] Another pass --- .../concepts/assets/external-assets.mdx | 55 ++++++++++++------- 1 file changed, 36 insertions(+), 19 deletions(-) diff --git a/docs/content/concepts/assets/external-assets.mdx b/docs/content/concepts/assets/external-assets.mdx index 53c197598df80..961eb4905db5e 100644 --- a/docs/content/concepts/assets/external-assets.mdx +++ b/docs/content/concepts/assets/external-assets.mdx @@ -5,13 +5,9 @@ description: External assets model assets in Dagster that are not scheduled or m # External Assets (Experimental) -An **external asset** is an asset that is not materialized by Dagster, but is tracked in the asset graph and asset catalog. This allows you to model assets in Dagster, attach metadata and events to those assets, but without scheduling their materialization with Dagster. 
+An **external asset** is an asset that is visible in Dagster but executed by an external process. For example, you have a process that loads data from Kafka into Amazon S3 every day. You want the S3 asset to be visible alongside your other data assets, but not triggered by Dagster. -External assets are a good fit if: - -- **An external source creates the data**. For example: An external file created every day, or Kafka loading data into Amazon S3 -- **Dagster isn't required for materialization**. For example: An existing pipeline materializes data using its own scheduling mechanism -- **Data is created and processed manually.** For example: Manually entering data into a spreadsheet +In this case, you could use an external asset to leverage Dagster's event log and tooling without using the orchestrator. This allows you to maintain data lineage, observability, and data quality without unnecessary migrations. ### What about Source Assets? @@ -23,10 +19,10 @@ External assets can accomplish this, and more. 
As a result, Source Assets will b ## Uses and limitations -With an external asset, you can: +Using external assets, you can: -- Attach metadata to its definition for documentation, tracking ownership, and so on -- Track its [data quality](/concepts/assets/asset-checks) and [version](/guides/dagster/asset-versioning-and-caching) in Dagster +- Attach metadata to asset definitions for documentation, tracking ownership, and so on +- Track the assets' [data quality](/concepts/assets/asset-checks) and [version](/guides/dagster/asset-versioning-and-caching) in Dagster - Use [asset sensors](/concepts/partitions-schedules-sensors/asset-sensors) or auto-materialize policies to update downstream assets based on updates to external assets ### Limitations @@ -179,17 +175,21 @@ To keep your external assets updated, you can use any of the following approache - [A REST API](#using-the-rest-api) - [Sensors](#using-sensors) -- [Using the Python API](#using-the-python-api) +- [A Python API](#using-the-python-api) - [Logging events in ops](#logging-events-in-ops) ### Using the REST API -Whether you're using Dagster OSS or Dagster Cloud, you can use a REST endpoint for reporting asset materializations. Refer to the following tabs for examples using `curl` and Python to communicate with the API. +Whether you're using Dagster OSS or Dagster Cloud, you can use a REST endpoint for reporting asset materializations. The API also has endpoints for reporting [asset observations](/concepts/assets/asset-observations) and [asset check evaluations](/concepts/assets/asset-checks). + +Refer to the following tabs for examples using `curl` and Python to communicate with the API. 
+ +#### Using curl - + -#### Dagster Cloud +##### Dagster Cloud ```bash curl --request POST \ @@ -203,7 +203,12 @@ curl --request POST \ }' ``` -#### Dagster OSS +--- + + + + +##### Dagster OSS ```bash curl --request POST \ @@ -216,10 +221,17 @@ curl --request POST \ }' ``` +--- + - + -#### Dagster Cloud +#### Using Python + + + + +##### Dagster Cloud ```python import requests @@ -231,7 +243,12 @@ headers = { "Content-Type": "application/json", "Dagster-Cloud-Api-Token": "{tok response = requests.request("POST", url, json=payload, headers=headers) ``` -#### Dagster OSS +--- + + + + +##### Dagster OSS ```python import requests @@ -243,11 +260,11 @@ headers = { "Content-Type": "application/json" } response = requests.request("POST", url, json=payload, headers=headers) ``` +--- + -The API also has endpoints for reporting [asset observations](/concepts/assets/asset-observations) and [asset check evaluations](/concepts/assets/asset-checks). - ### Using sensors By using the `asset_events` parameter of , you can generate events to attach to external assets and then provide them directly to sensors. For example: From 4f502024c7aed8bd0da8983a89027b44d1ec6d81 Mon Sep 17 00:00:00 2001 From: Erin Cochran Date: Thu, 12 Oct 2023 17:04:25 -0400 Subject: [PATCH 5/5] Fix link --- docs/content/concepts/assets/external-assets.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/content/concepts/assets/external-assets.mdx b/docs/content/concepts/assets/external-assets.mdx index 961eb4905db5e..014e902055299 100644 --- a/docs/content/concepts/assets/external-assets.mdx +++ b/docs/content/concepts/assets/external-assets.mdx @@ -176,7 +176,7 @@ To keep your external assets updated, you can use any of the following approache - [A REST API](#using-the-rest-api) - [Sensors](#using-sensors) - [A Python API](#using-the-python-api) -- [Logging events in ops](#logging-events-in-ops) +- [Logging events using ops](#logging-events-using-ops) ### Using the REST API
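
Review note on the REST examples in patches 3 and 4: the Dagster Cloud and Dagster OSS requests differ only in the base URL and in the extra `Dagster-Cloud-Api-Token` header. The sketch below is not part of the patch series. It restates the two request shapes from the docs as one helper so the difference is easy to check. The helper name `build_report_request`, its parameters, and the sample host are hypothetical and chosen for illustration; the URL patterns and header names are copied from the patched docs, and nothing is sent over the network.

```python
from typing import Optional, Tuple


def build_report_request(
    asset_key: str,
    metadata: dict,
    *,
    webserver_host: Optional[str] = None,
    organization: Optional[str] = None,
    deployment: Optional[str] = None,
    token: Optional[str] = None,
) -> Tuple[str, dict, dict]:
    """Return (url, headers, payload) for a report_asset_materialization call.

    Hypothetical helper: it only assembles the request pieces shown in the
    patched docs and does not contact any server.
    """
    headers = {"Content-Type": "application/json"}

    if organization is not None:
        # Dagster Cloud shape: organization subdomain, deployment path segment,
        # and an API token header, as in the "Dagster Cloud" examples.
        if deployment is None or token is None:
            raise ValueError("Dagster Cloud requests need a deployment and a token")
        base = f"https://{organization}.dagster.cloud/{deployment}"
        headers["Dagster-Cloud-Api-Token"] = token
    elif webserver_host is not None:
        # Dagster OSS shape: the webserver host, with no token header.
        base = f"https://{webserver_host}"
    else:
        raise ValueError("Provide either webserver_host or organization")

    url = f"{base}/report_asset_materialization/{asset_key}"
    payload = {"metadata": metadata}
    return url, headers, payload
```

The returned values can be passed to `requests.request("POST", url, json=payload, headers=headers)`, which is the call the patched Python examples already use.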