Commit
Showing 13 changed files with 305 additions and 5 deletions.
192 changes: 192 additions & 0 deletions
docs/content/concepts/metadata-tags/asset-metadata/table-metadata.mdx
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,192 @@ | ||
--- | ||
title: "Table metadata | Dagster Docs" | ||
description: "Table metadata can be used to provide additional context about a tabular asset, such as its schema, row count, and more." | ||
--- | ||
|
||
# Table metadata | ||
|
||
Table metadata provides additional context about a tabular asset, such as its schema, row count, and more. This metadata can be used to improve collaboration, debugging, and data quality in your data platform. | ||
|
||
Dagster supports attaching different types of table metadata to assets, including: | ||
|
||
- [**Column schema**](#attaching-column-schema): Describes the structure of the table, including column names and types | ||
- [**Row count**](#attaching-row-count): Describes the number of rows in a materialized table | ||
- [**Column-level lineage**](#attaching-column-level-lineage): Describes how a column is created and used by other assets | ||
|
||
--- | ||
|
||
## Attaching column schema | ||
|
||
### For assets defined in Dagster | ||
|
||
<Image | ||
alt="Column schema for an asset in the Dagster UI" | ||
src="/images/concepts/metadata-tags/metadata-table-schema.png" | ||
width={1793} | ||
height={652} | ||
/> | ||
|
||
You may attach column schema metadata to Dagster assets either as [definition metadata](/concepts/metadata-tags/asset-metadata#attaching-definition-metadata) or [materialization metadata](/concepts/metadata-tags/asset-metadata#attaching-materialization-metadata). If the schema of your asset is pre-defined, you can attach it as definition metadata. If the schema is only known when an asset is materialized, you can attach it as metadata to that materialization. | ||
|
||
To attach schema metadata to an asset, you will need to: | ||
|
||
1. Construct a <PyObject object="TableSchema"/> object with <PyObject object="TableColumn" /> entries describing each column in the table | ||
2. Attach the `TableSchema` object to the asset as part of the `metadata` parameter under the `dagster/column_schema` key. This can be attached to your asset definition, or to the <PyObject object="MaterializeResult" /> object returned by the asset function | ||
|
||
Below are two examples of how to attach column schema metadata to an asset, one as definition metadata and one as materialization metadata: | ||
|
||
```python file=/concepts/metadata-tags/asset_column_schema.py | ||
from dagster import AssetKey, MaterializeResult, TableColumn, TableSchema, asset | ||
|
||
|
||
# Here, we know the schema of the asset, so we can attach it to the asset decorator | ||
@asset( | ||
deps=[AssetKey("source_bar"), AssetKey("source_baz")], | ||
metadata={ | ||
"dagster/column_schema": TableSchema( | ||
columns=[ | ||
TableColumn( | ||
"name", | ||
"string", | ||
description="The name of the person", | ||
), | ||
TableColumn( | ||
"age", | ||
"int", | ||
description="The age of the person", | ||
), | ||
] | ||
) | ||
}, | ||
) | ||
def my_asset(): ... | ||
|
||
|
||
# Here, the schema isn't known until runtime | ||
@asset(deps=[AssetKey("source_bar"), AssetKey("source_baz")]) | ||
def my_other_asset(): | ||
column_names = ... | ||
column_types = ... | ||
|
||
columns = [ | ||
TableColumn(name, column_type) | ||
for name, column_type in zip(column_names, column_types) | ||
] | ||
|
||
yield MaterializeResult( | ||
metadata={"dagster/column_schema": TableSchema(columns=columns)} | ||
) | ||
``` | ||
|
||
The schema for `my_asset` will be visible in the Dagster UI. You may optionally attach <PyObject object="TableColumnConstraints"/> to each column to provide additional context about the values in the column: | ||
|
||
```python file=/concepts/metadata-tags/asset_column_schema_constraints.py | ||
from dagster import ( | ||
AssetKey, | ||
MaterializeResult, | ||
TableColumn, | ||
TableColumnConstraints, | ||
TableSchema, | ||
asset, | ||
) | ||
|
||
|
||
@asset( | ||
deps=[AssetKey("source_bar"), AssetKey("source_baz")], | ||
metadata={ | ||
"dagster/column_schema": TableSchema( | ||
columns=[ | ||
TableColumn( | ||
"name", | ||
"string", | ||
description="The name of the person", | ||
), | ||
TableColumn( | ||
"age", | ||
"int", | ||
description="The age of the person", | ||
constraints=TableColumnConstraints(nullable=False, other=[">0"]), | ||
), | ||
] | ||
) | ||
}, | ||
) | ||
def my_asset(): ... | ||
``` | ||
|
||
### For assets loaded from integrations | ||
|
||
Column schemas are currently supported in the dbt integration. Refer to the [dbt documentation](/integrations/dbt/reference) for more information. | ||
|
||
--- | ||
|
||
## Attaching row count | ||
|
||
### For assets defined in Dagster | ||
|
||
<Image | ||
alt="Row count for an asset in the Dagster UI" | ||
src="/images/concepts/metadata-tags/metadata-row-count.png" | ||
width={1921} | ||
height={559} | ||
/> | ||
|
||
You may attach row count metadata to Dagster assets as [materialization metadata](/concepts/metadata-tags/asset-metadata#attaching-materialization-metadata) to provide additional context about the number of rows in a materialized table. Dagster lets you track changes in the row count over time, and you can use this information to monitor data quality. | ||
|
||
To attach row count metadata to an asset, attach a numerical value to the `dagster/row_count` key in the `metadata` parameter of the <PyObject object="MaterializeResult" /> object returned by the asset function. | ||
|
||
Below is an example of how to attach row count metadata to an asset: | ||
|
||
```python file=/concepts/metadata-tags/asset_row_count.py | ||
import pandas as pd | ||
|
||
from dagster import AssetKey, MaterializeResult, asset | ||
|
||
|
||
@asset(deps=[AssetKey("source_bar"), AssetKey("source_baz")]) | ||
def my_asset(): | ||
my_df: pd.DataFrame = ... | ||
|
||
yield MaterializeResult(metadata={"dagster/row_count": 374}) | ||
``` | ||
|
||
--- | ||
|
||
## Attaching column-level lineage | ||
|
||
Column lineage enables data and analytics engineers alike to understand how a column is created and used in your data platform. Refer to the [Column-level lineage documentation](/concepts/metadata-tags/asset-metadata/column-level-lineage) for more information. | ||
|
||
--- | ||
|
||
## APIs in this guide | ||
|
||
| Name | Description | | ||
| -------------------------------------------- | ---------------------------------------------------------------- | | ||
| <PyObject object="asset" decorator /> | A decorator used to define assets. | | ||
| <PyObject object="MaterializeResult" /> | An object representing a successful materialization of an asset. | | ||
| <PyObject object="TableSchema" /> | An object representing the schema of a tabular asset. | | ||
| <PyObject object="TableColumn" /> | Class that defines column information for a tabular asset. | | ||
| <PyObject object="TableColumnConstraints" /> | Class that defines constraints for a column in a tabular asset. | | ||
|
||
--- | ||
|
||
## Related | ||
|
||
<ArticleList> | ||
<ArticleListItem | ||
title="Asset metadata" | ||
href="/concepts/metadata-tags/asset-metadata" | ||
></ArticleListItem> | ||
<ArticleListItem | ||
title="Column-level lineage" | ||
href="/concepts/metadata-tags/asset-metadata/column-level-lineage" | ||
></ArticleListItem> | ||
<ArticleListItem | ||
title="Metadata & tags" | ||
href="/concepts/metadata-tags" | ||
></ArticleListItem> | ||
<ArticleListItem | ||
title="Asset definitions" | ||
href="/concepts/assets/software-defined-assets" | ||
></ArticleListItem> | ||
</ArticleList> |
Binary file added (BIN, +92.1 KB): docs/next/public/images/concepts/metadata-tags/metadata-table-schema.png
40 changes: 40 additions & 0 deletions
examples/docs_snippets/docs_snippets/concepts/metadata-tags/asset_column_schema.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
from dagster import AssetKey, MaterializeResult, TableColumn, TableSchema, asset | ||
|
||
|
||
# Here, we know the schema of the asset, so we can attach it to the asset decorator | ||
@asset( | ||
deps=[AssetKey("source_bar"), AssetKey("source_baz")], | ||
metadata={ | ||
"dagster/column_schema": TableSchema( | ||
columns=[ | ||
TableColumn( | ||
"name", | ||
"string", | ||
description="The name of the person", | ||
), | ||
TableColumn( | ||
"age", | ||
"int", | ||
description="The age of the person", | ||
), | ||
] | ||
) | ||
}, | ||
) | ||
def my_asset(): ... | ||
|
||
|
||
# Here, the schema isn't known until runtime | ||
@asset(deps=[AssetKey("source_bar"), AssetKey("source_baz")]) | ||
def my_other_asset(): | ||
column_names = ... | ||
column_types = ... | ||
|
||
columns = [ | ||
TableColumn(name, column_type) | ||
for name, column_type in zip(column_names, column_types) | ||
] | ||
|
||
yield MaterializeResult( | ||
metadata={"dagster/column_schema": TableSchema(columns=columns)} | ||
) |
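Review note: `my_other_asset` leaves `column_names` and `column_types` elided. One way they might be obtained at runtime is sketched below, assuming the table lives in SQLite. The helper names `describe_columns` and `to_table_schema` and the `people` table are hypothetical and not part of this commit; only `TableColumn` and `TableSchema` come from the snippet above.

```python
import sqlite3


def describe_columns(conn: sqlite3.Connection, table: str) -> list[tuple[str, str]]:
    """Return (column name, declared type) pairs for a SQLite table."""
    # pragma_table_info is SQLite's table-valued form of PRAGMA table_info.
    cursor = conn.execute(
        "SELECT name, type FROM pragma_table_info(?) ORDER BY cid", (table,)
    )
    return [(name, declared_type) for name, declared_type in cursor.fetchall()]


def to_table_schema(columns: list[tuple[str, str]]):
    """Wrap (name, type) pairs in a TableSchema, as the snippet above does."""
    # Imported lazily so describe_columns can be used without Dagster installed.
    from dagster import TableColumn, TableSchema

    return TableSchema(
        columns=[TableColumn(name, column_type) for name, column_type in columns]
    )


# Hypothetical in-memory table, used only to exercise the helper.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
pairs = describe_columns(conn, "people")
```

Inside the asset, the result of `to_table_schema(pairs)` would take the place of `TableSchema(columns=columns)` in the `MaterializeResult` metadata.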
31 changes: 31 additions & 0 deletions
...les/docs_snippets/docs_snippets/concepts/metadata-tags/asset_column_schema_constraints.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
from dagster import ( | ||
AssetKey, | ||
MaterializeResult, | ||
TableColumn, | ||
TableColumnConstraints, | ||
TableSchema, | ||
asset, | ||
) | ||
|
||
|
||
@asset( | ||
deps=[AssetKey("source_bar"), AssetKey("source_baz")], | ||
metadata={ | ||
"dagster/column_schema": TableSchema( | ||
columns=[ | ||
TableColumn( | ||
"name", | ||
"string", | ||
description="The name of the person", | ||
), | ||
TableColumn( | ||
"age", | ||
"int", | ||
description="The age of the person", | ||
constraints=TableColumnConstraints(nullable=False, other=[">0"]), | ||
), | ||
] | ||
) | ||
}, | ||
) | ||
def my_asset(): ... |
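Review note: the docs page describes `TableColumnConstraints` as additional context about a column's values, and this diff does not show them being checked against data. If a check is wanted, a plain-Python sketch might look like the following. The function `find_violations` and the sample rows are hypothetical; the `nullable=False` and `>0` rules mirror the `age` column above.

```python
from typing import Any, Callable, Mapping, Optional, Sequence


def find_violations(
    rows: Sequence[Mapping[str, Any]],
    column: str,
    *,
    nullable: bool,
    check: Optional[Callable[[Any], bool]] = None,
) -> list[int]:
    """Return the indexes of rows whose value for `column` breaks the rules."""
    bad = []
    for index, row in enumerate(rows):
        value = row.get(column)
        if value is None:
            # A missing or None value only counts when the column is non-nullable.
            if not nullable:
                bad.append(index)
        elif check is not None and not check(value):
            bad.append(index)
    return bad


# Hypothetical sample data mirroring the "age" column: nullable=False, other=[">0"].
rows = [
    {"name": "Ada", "age": 36},
    {"name": "Bob", "age": None},
    {"name": "Cy", "age": 0},
]
violations = find_violations(rows, "age", nullable=False, check=lambda v: v > 0)
```

Keeping the rule as a callable alongside the free-text `other=[">0"]` entry means the documented constraint and the executed check can drift apart, so it may be worth defining both in one place.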
10 changes: 10 additions & 0 deletions
examples/docs_snippets/docs_snippets/concepts/metadata-tags/asset_row_count.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
import pandas as pd | ||
|
||
from dagster import AssetKey, MaterializeResult, asset | ||
|
||
|
||
@asset(deps=[AssetKey("source_bar"), AssetKey("source_baz")]) | ||
def my_asset(): | ||
my_df: pd.DataFrame = ... | ||
|
||
yield MaterializeResult(metadata={"dagster/row_count": 374}) |
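Review note: the snippet declares `my_df` but reports a hard-coded `374`. A small sketch of deriving the value from the data instead is shown below. The helper `row_count_metadata` and the sample rows are hypothetical; the `dagster/row_count` key is the one used in the snippet.

```python
from typing import Sized


def row_count_metadata(table: Sized) -> dict[str, int]:
    """Build the metadata dict from anything that supports len()."""
    # len() of a pandas DataFrame is its number of rows; a list of records works too.
    return {"dagster/row_count": len(table)}


# Hypothetical stand-in for the DataFrame in the snippet.
rows = [{"name": "Ada", "age": 36}, {"name": "Linus", "age": 54}]
metadata = row_count_metadata(rows)
```

In the asset above, the last line could then read `yield MaterializeResult(metadata=row_count_metadata(my_df))`, so the reported count follows the materialized data.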