Update entities.md #5216

Merged 23 commits on Aug 29, 2024. Changes from 21 commits.
88 changes: 76 additions & 12 deletions website/docs/docs/build/entities.md
@@ -12,20 +12,86 @@ Within a semantic graph, the required parameters for an entity are `name` and `t

Entities can be specified with a single column or multiple columns. Entities (join keys) in a semantic model are identified by their name. Each entity name must be unique within a semantic model, but it doesn't have to be unique across different semantic models.

There are four entity types: primary, foreign, unique, or natural.
There are four entity types:
- [Primary](#primary) — Has only one record for each row in the table and includes every record in the data platform. This key uniquely identifies each record in the table.
- [Unique](#unique) — Contains only one record per row in the table but may have a subset of records in the data warehouse.
- [Foreign](#foreign) — A field (or a set of fields) in one table that uniquely identifies a row in another table. This key links to a primary key in another table, establishing relationships between tables.
- [Natural](#natural) — Columns or combinations of columns in a table that uniquely identify a record based on real-world data. This key is derived from actual data attributes.

:::tip Use entities as dimensions
You can also use entities as dimensions, which allows you to aggregate a metric to the granularity of that entity.
:::

## Entity types

MetricFlow's join logic depends on the entity `type` you use, and it also determines how to join semantic models. Refer to [Joins](/docs/build/join-logic) for more info on how to construct joins.
MetricFlow's join logic depends on the entity `type` you use, which determines how semantic models are joined. Refer to [Joins](/docs/build/join-logic) for more info on how to construct joins.

* **Primary —** A primary key has **only one** record for each row in the table, and it includes every record in the data platform.
* **Unique —** A unique key contains **only one** record per row in the table, but it may have a subset of records in the data warehouse. It can also include nulls.
* **Foreign —** A foreign key can include zero, one, or multiple instances of the same record. Null values may also be present.
* **Natural —** Natural keys are columns or combinations of columns in a table that uniquely identify a record based on real-world data. For instance, in a sales_person_department dimension table, the sales_person_id can serve as a natural key. You can only use natural keys for [SCD type II dimensions](/docs/build/dimensions#scd-type-ii).
### Primary
A primary key has _only one_ record for each row in the table and includes every record in the data platform. It must contain unique values and can't contain null values. Use the primary key to ensure that each record in the table is distinct and identifiable.

<Expandable alt_header="Primary key example">

For example, consider a table of employees with the following columns:

```sql
employee_id (primary key)
first_name
last_name
```
In this case, `employee_id` is the primary key. Each `employee_id` is unique and represents one specific employee. No two rows can share an `employee_id`, and the value can't be null.

</Expandable>

### Unique
A unique key contains _only one_ record per row in the table but may have a subset of records in the data warehouse. However, unlike the primary key, a unique key allows for null values. The unique key ensures that the column's values are distinct, except for null values.

<Expandable alt_header="Unique key example">

For example, consider a table of students with the following columns:

```sql
student_id (primary key)
email (unique key)
first_name
last_name
```

In this example, `email` is defined as a unique key. Non-null email addresses must be unique, but multiple students can have a null email address, because the unique key constraint permits one or more null values. The students with non-null emails therefore form a set of uniquely identified records that may be only a subset of the full table, which includes all students.

</Expandable>
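
As a rough sketch, the students table above might be declared in a semantic model along these lines. The semantic model name `students`, the `ref('students')` target, and the entity names are hypothetical and chosen only for illustration; the `name`, `type`, and `expr` keys follow the entity spec described on this page.

```yaml
semantic_models:
  - name: students                # Hypothetical semantic model name
    model: ref('students')        # Assumes a dbt model named `students` exists
    entities:
      - name: student
        type: primary             # One row per student; never null
        expr: student_id
      - name: student_email
        type: unique              # Distinct when present, but may be null
        expr: email
```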

### Foreign
A foreign key is a field (or a set of fields) in one table that uniquely identifies a row in another table. The foreign key establishes a link between the data in two tables.
It can include zero, one, or multiple instances of the same record. It can also contain null values.

<Expandable alt_header="Foreign key example">

For example, consider two tables, `customers` and `orders`:

customers table:

```sql
customer_id (primary key)
customer_name
```

orders table:

```sql
order_id (primary key)
order_date
customer_id (foreign key)
```

In this example, the `customer_id` in the `orders` table is a foreign key that references the `customer_id` in the `customers` table. This link associates an order with a specific customer. However, not every order must have a customer: the `customer_id` in the `orders` table can be null, and the same `customer_id` can appear on multiple orders.

</Expandable>
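
One way the `customers` and `orders` tables above could be expressed as semantic models is sketched below. The model names and `ref()` targets are assumptions for illustration. The sketch relies on the entity name `customer` being the same in both models, since entities are identified by name and that shared name is what would let MetricFlow treat them as the same join key.

```yaml
semantic_models:
  - name: customers               # Hypothetical semantic model name
    model: ref('customers')       # Assumes a dbt model named `customers`
    entities:
      - name: customer
        type: primary             # Exactly one row per customer
        expr: customer_id

  - name: orders                  # Hypothetical semantic model name
    model: ref('orders')          # Assumes a dbt model named `orders`
    entities:
      - name: order
        type: primary
        expr: order_id
      - name: customer            # Same entity name as in `customers`
        type: foreign             # May repeat across orders or be null
        expr: customer_id
```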

### Natural

Natural keys are columns or combinations of columns in a table that uniquely identify a record based on real-world data. For instance, if you have a `sales_person_department` dimension table, the `sales_person_id` can serve as a natural key. You can only use natural keys for [SCD type II dimensions](/docs/build/dimensions#scd-type-ii).
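
The following is a minimal sketch of a natural entity on an SCD type II table, not a definitive configuration. The column names `valid_from`, `valid_to`, and `department`, the dimension names, and the `ref()` target are hypothetical. The validity window is marked on the time dimensions through `validity_params`, as described in the SCD type II dimensions page linked above.

```yaml
semantic_models:
  - name: sales_person_department              # Hypothetical semantic model name
    model: ref('sales_person_department')      # Assumes a dbt model with this name
    entities:
      - name: sales_person
        type: natural             # Identifies a salesperson, but repeats across validity windows
        expr: sales_person_id
    dimensions:
      - name: department_start    # Hypothetical: start of the validity window
        type: time
        expr: valid_from
        type_params:
          time_granularity: day
          validity_params:
            is_start: true
      - name: department_end      # Hypothetical: end of the validity window
        type: time
        expr: valid_to
        type_params:
          time_granularity: day
          validity_params:
            is_end: true
      - name: department
        type: categorical
```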

## Entities configuration

The following is the complete spec for entities:

@@ -36,12 +102,11 @@ entities:
    description: A description of the field or role the entity takes in this table ## Optional
    expr: The field that denotes that entity (transaction_id). ## Optional
          Defaults to name if unspecified.

```

Here's an example of how to define entities in a semantic model:

``` yaml
```yaml
entities:
  - name: transaction
    type: primary
@@ -54,15 +119,14 @@ entities:
    expr: substring(id_order from 2)
```

### Combine columns with a key
## Combine columns with a key

If a table doesn't have any key (like a primary key), use _surrogate combination_ to form a key that will help you identify a record by combining two columns. This applies to any [entity type](/docs/build/entities#entity-types). For example, you can combine `date_key` and `brand_code` from the `raw_brand_target_weekly` table to form a _surrogate key_. The following example creates a surrogate key by joining `date_key` and `brand_code` using a pipe (`|`) as a separator.

```yaml

entities:
  - name: brand_target_key # Entity name or identifier.
    type: foreign # This can be any entity type key.
    expr: date_key || '|' || brand_code # Defines the expression for linking fields to form the surrogate key.
```

