update time spine and dimensions docs
Jstein77 committed Aug 13, 2024
1 parent e517b12 commit 9028011
Showing 3 changed files with 54 additions and 26 deletions.
16 changes: 6 additions & 10 deletions website/docs/docs/build/dimensions.md
@@ -104,10 +104,6 @@ dimensions:

## Time

:::tip use datetime data type if using BigQuery
To use BigQuery as your data platform, time dimensions columns need to be in the datetime data type. If they are stored in another type, you can cast them to datetime using the `expr` property. Time dimensions are used to group metrics by different levels of time, such as sub-daily like hour, or week, month, quarter, year, and so on. MetricFlow supports these granularities, which can be specified using the `time_granularity` parameter.
:::

Time has additional parameters specified under the `type_params` section. When you query one or more metrics in MetricFlow using the CLI, the default time dimension for a single metric is the aggregation time dimension, which you can refer to as `metric_time` or by the dimension's name.

You can use multiple time groups in separate metrics. For example, the `users_created` metric uses `created_at`, and the `users_deleted` metric uses `deleted_at`:
@@ -120,7 +116,7 @@ dbt sl query --metrics users_created,users_deleted --group-by metric_time__year
mf query --metrics users_created,users_deleted --group-by metric_time__year --order-by metric_time__year
```

You can set `is_partition` for time or categorical dimensions to define specific time spans. Additionally, use the `type_params` section to set `time_granularity` to adjust aggregation detail (like sub-daily (hourly), daily, weekly, and so on). For more sub-daily configuration details, refer to [sub-daily granularity](/docs/build/granularity).
You can set `is_partition` for time or categorical dimensions to define specific time spans. Additionally, use the `type_params` section to set `time_granularity` to adjust aggregation detail (hourly, daily, weekly, and so on).

<Tabs>

@@ -171,11 +167,11 @@ measures:

<TabItem value="time_gran" label="time_granularity">

`time_granularity` specifies the smallest level of detail that a measure or metric should be reported at, such as [sub-daily](/docs/build/granularity), daily, weekly, monthly, quarterly, or yearly. Different granularity options are available, and each metric must have a specified granularity. For example, a metric specified with weekly granularity couldn't be aggregated to a daily grain.
`time_granularity` specifies the grain of a time dimension. MetricFlow transforms the underlying column to the specified granularity. For example, if you add hourly granularity to a time dimension column, MetricFlow runs a `date_trunc` function to convert the timestamp to hourly. You can easily change the time grain at query time and aggregate to a coarser grain, for example from hourly to monthly. You can't go from a coarser grain to a finer grain, for example from monthly to hourly.

The current options for time granularity are day, week, month, quarter, and year.
Any granularity supported by your engine's `date_trunc` function is supported, with the most common granularities being hour, day, week, month, quarter, and year.

Aggregation between metrics with different granularities is possible, with the Semantic Layer returning results at the highest granularity by default. For example, when querying two metrics with daily and monthly granularity, the resulting aggregation will be at the monthly level.
Aggregation between metrics with different granularities is possible, with the Semantic Layer returning results at the coarser granularity by default. For example, when querying two metrics with daily and monthly granularity, the resulting aggregation will be at the monthly level.
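
The granularity rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MetricFlow's implementation: `truncate`, `can_query_at`, and `coarsest` are hypothetical helpers, the list of grains is an illustrative subset, and the week is assumed to start on Monday (engines differ on this).

```python
from datetime import datetime, timedelta

# Illustrative subset of grains, ordered from finest to coarsest.
GRAINS = ["hour", "day", "week", "month", "quarter", "year"]


def truncate(ts: datetime, grain: str) -> datetime:
    """Mimic what a SQL date_trunc does for a handful of grains (hypothetical helper)."""
    if grain == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    day = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if grain == "day":
        return day
    if grain == "week":
        # Assumption: weeks start on Monday.
        return day - timedelta(days=day.weekday())
    if grain == "month":
        return day.replace(day=1)
    if grain == "quarter":
        first_month = 3 * ((day.month - 1) // 3) + 1
        return day.replace(month=first_month, day=1)
    if grain == "year":
        return day.replace(month=1, day=1)
    raise ValueError(f"unsupported grain: {grain}")


def can_query_at(defined: str, requested: str) -> bool:
    """A dimension can be queried at its own grain or any coarser one, never a finer one."""
    return GRAINS.index(requested) >= GRAINS.index(defined)


def coarsest(grains: list) -> str:
    """Metrics with different grains resolve to the coarsest grain among them."""
    return max(grains, key=GRAINS.index)
```

For instance, `can_query_at("hour", "month")` is true while `can_query_at("month", "hour")` is false, and `coarsest(["day", "month"])` resolves to `"month"`, matching the daily-plus-monthly example in the text.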

```yaml
dimensions:
@@ -185,14 +181,14 @@ dimensions:
expr: date_trunc('day', ts_created) # ts_created is the underlying column name from the table
is_partition: True
type_params:
time_granularity: hour # or second, or millisecond etc
time_granularity: day
- name: deleted_at
type: time
label: "Date of deletion"
expr: date_trunc('day', ts_deleted) # ts_deleted is the underlying column name from the table
is_partition: True
type_params:
time_granularity: hour # or second, or millisecond etc
time_granularity: month
measures:
- name: users_deleted
64 changes: 48 additions & 16 deletions website/docs/docs/build/metricflow-time-spine.md
@@ -6,11 +6,38 @@ sidebar_label: "MetricFlow time spine"
tags: [Metrics, Semantic Layer]
---

MetricFlow uses a timespine table to construct cumulative metrics. By default, MetricFlow expects the timespine table to be named `metricflow_time_spine` and doesn't support using a different name.
It's common for analytics engineers to have a date dimension or "time spine" table as a base table for different types of time-based joins and aggregations. The structure of this table is usually a base column of daily or hourly dates, with additional columns for other time grains, like fiscal quarter, defined based on the date. You can join other tables to the time spine to calculate metrics like revenue at a point in time, or to aggregate to a fiscal quarter.

To create this table, you need to create a model in your dbt project called `metricflow_time_spine` and add the following code. This example uses a `day` granularity to generate a table with one row per day. This is useful for metrics that need a daily aggregation.
MetricFlow requires you to define a time spine table as a project-level configuration, which is then used for various time-based joins and aggregations, like cumulative metrics.

<File name='metricflow_time_spine_day.sql'>
If you already have a date dimension or time spine table in your dbt project, you need to update the `model` configuration to use this table in the Semantic Layer. For example, with the following directory structure, you can create two time spine configurations, `time_spine_hourly` and `time_spine_daily`.

![Time spine directory structure](/img/docs/building-metrics/time_spines.png)


```yaml
models:
- name: time_spine_hourly
time_spine:
standard_granularity_column: date_hour # column for the standard grain of your table
columns:
- name: date_hour
granularity: hour # set granularity at column-level for standard_granularity_column
- name: time_spine_daily
time_spine:
standard_granularity_column: date_day # column for the standard grain of your table
columns:
- name: date_day
granularity: day # set granularity at column-level for standard_granularity_column
```
Let's break down the configuration above, using `time_spine_hourly` as the example. The time spine configurations are set under the `time_spine` key. The `standard_granularity_column` is the lowest grain of the table, in this case hourly. It needs to reference a column defined under the `columns` key, in this case `date_hour`. MetricFlow uses the `standard_granularity_column` as the join key for the time spine table when joining tables. The granularity of the `standard_granularity_column` is set at the column level, in this case `hour`.
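
To make the relationship between the `time_spine` key and the `columns` key concrete, here is a minimal Python sketch of the consistency rule described above. It is not how dbt or MetricFlow validates a project: `MODELS` is a hand-written dict that mirrors the YAML, and `standard_granularity` is a hypothetical helper.

```python
from typing import Tuple

# Hand-written mirror of the YAML configuration (an assumption for illustration,
# not loaded from a real project).
MODELS = [
    {
        "name": "time_spine_hourly",
        "time_spine": {"standard_granularity_column": "date_hour"},
        "columns": [{"name": "date_hour", "granularity": "hour"}],
    },
    {
        "name": "time_spine_daily",
        "time_spine": {"standard_granularity_column": "date_day"},
        "columns": [{"name": "date_day", "granularity": "day"}],
    },
]


def standard_granularity(model: dict) -> Tuple[str, str]:
    """Return (column name, granularity) for a model's standard granularity column.

    Raises ValueError if the column is not declared under `columns`
    or has no `granularity` set.
    """
    column_name = model["time_spine"]["standard_granularity_column"]
    for column in model.get("columns", []):
        if column["name"] == column_name:
            granularity = column.get("granularity")
            if granularity is None:
                raise ValueError(f"{column_name} has no granularity set")
            return column_name, granularity
    raise ValueError(f"{column_name} is not defined under columns")
```

Calling `standard_granularity(MODELS[0])` yields the join column and grain for the hourly spine; a model whose `standard_granularity_column` names an undeclared column raises an error instead.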


If you need to create a time spine table from scratch, you can do so by adding the following code to your dbt project.
The examples create a time spine at a daily grain and an hourly grain. We recommend creating both hourly and daily time spines. MetricFlow uses the appropriate time spine based on the granularity of the metric selected to minimize data scans.
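
One way to picture how a time spine might be chosen is sketched below. The selection rule here is an assumption made for illustration (pick the coarsest spine that is still fine enough for the requested grain, so fewer rows are scanned); the source only states that the appropriate spine is used. `pick_time_spine` and `TIME_SPINES` are hypothetical names.

```python
# Illustrative subset of grains, ordered from finest to coarsest.
GRAINS = ["hour", "day", "week", "month", "quarter", "year"]

# Hypothetical registry mapping each spine's standard grain to its model name.
TIME_SPINES = {"hour": "time_spine_hourly", "day": "time_spine_daily"}


def pick_time_spine(requested: str, spines: dict = TIME_SPINES) -> str:
    """Pick the coarsest spine whose grain is not coarser than the requested grain."""
    requested_rank = GRAINS.index(requested)
    usable = [grain for grain in spines if GRAINS.index(grain) <= requested_rank]
    if not usable:
        raise ValueError(f"no time spine is fine enough for grain: {requested}")
    best = max(usable, key=GRAINS.index)
    return spines[best]
```

Under this assumed rule, an hourly query uses `time_spine_hourly`, while a monthly query uses `time_spine_daily` because the daily table is smaller and still fine enough to aggregate to months.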

<File name='time_spine_daily.sql'>

<VersionBlock lastVersion="1.6">

@@ -27,7 +54,7 @@ with days as (
dbt_utils.date_spine(
'day',
"to_date('01/01/2000','mm/dd/yyyy')",
"to_date('01/01/2027','mm/dd/yyyy')"
"to_date('01/01/2025','mm/dd/yyyy')"
)
}}
@@ -39,6 +66,9 @@ final as (
)
select * from final
-- filter the time spine to a specific range
where date_day > dateadd(year, -4, current_timestamp())
and date_day < dateadd(day, 30, current_timestamp())
```

</VersionBlock>
@@ -58,7 +88,7 @@ with days as (
dbt.date_spine(
'day',
"to_date('01/01/2000','mm/dd/yyyy')",
"to_date('01/01/2027','mm/dd/yyyy')"
"to_date('01/01/2025','mm/dd/yyyy')"
)
}}
@@ -86,7 +116,7 @@ with days as (
{{dbt_utils.date_spine(
'day',
"DATE(2000,01,01)",
"DATE(2030,01,01)"
"DATE(2025,01,01)"
)
}}
),
@@ -98,6 +128,9 @@ final as (
select *
from final
-- filter the time spine to a specific range
where date_day > dateadd(year, -4, current_timestamp())
and date_day < dateadd(day, 30, current_timestamp())
```

</VersionBlock>
@@ -112,7 +145,7 @@ with days as (
{{dbt.date_spine(
'day',
"DATE(2000,01,01)",
"DATE(2030,01,01)"
"DATE(2025,01,01)"
)
}}
),
@@ -124,19 +157,15 @@ final as (
select *
from final
-- filter the time spine to a specific range
where date_day > dateadd(year, -4, current_timestamp())
and date_day < dateadd(day, 30, current_timestamp())
```

</VersionBlock>

You only need to include the `date_day` column in the table. MetricFlow can handle broader levels of detail, but it doesn't currently support finer grains.

## Hourly time spine

This example uses `dbt.date_spine` with an `hour` granularity to generate a table with one row per hour. This is needed for hourly data aggregation and other sub-daily analyses.

WHAT ARE OTHER OPTIONS?? TO ADD BOTH, DO USERS NEED TWO FILES (HOUR AND DAY) OR CAN THEY BE COMBINED?

<File name='metricflow_time_spine_hour.sql'>
<File name='time_spine_hourly.sql'>

```sql
-- filename: metricflow_time_spine_hour.sql
@@ -152,7 +181,7 @@ with hours as (
dbt.date_spine(
'hour',
"to_date('01/01/2000','mm/dd/yyyy')",
"to_date('01/01/2030','mm/dd/yyyy')"
"to_date('01/01/2025','mm/dd/yyyy')"
)
}}
@@ -164,5 +193,8 @@ final as (
)
select * from final
-- filter the time spine to a specific range
where date_hour > dateadd(year, -4, current_timestamp())
and date_hour < dateadd(day, 30, current_timestamp())
```
</File>
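
As a rough picture of what the date spine plus the range filter produce, the following Python sketch generates rows at a fixed step and keeps only a window around "now". It is not the dbt macro: `date_spine`, `years_ago`, and `filter_window` are hypothetical helpers, and the end bound is assumed to be exclusive.

```python
from datetime import datetime, timedelta
from typing import List


def date_spine(start: datetime, end: datetime, step: timedelta) -> List[datetime]:
    """Generate one row per step from start up to, but not including, end (assumed exclusive)."""
    rows = []
    current = start
    while current < end:
        rows.append(current)
        current += step
    return rows


def years_ago(now: datetime, years: int) -> datetime:
    """Shift a timestamp back by whole years, falling back to Feb 28 for leap days."""
    try:
        return now.replace(year=now.year - years)
    except ValueError:
        return now.replace(year=now.year - years, day=28)


def filter_window(rows: List[datetime], now: datetime,
                  years_back: int = 4, days_forward: int = 30) -> List[datetime]:
    """Keep rows strictly after (now - years_back) and strictly before (now + days_forward)."""
    lower = years_ago(now, years_back)
    upper = now + timedelta(days=days_forward)
    return [row for row in rows if lower < row < upper]
```

A daily spine uses `timedelta(days=1)` as the step and an hourly spine uses `timedelta(hours=1)`; filtering either one keeps the table small while still covering recent history and a short look-ahead.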
Binary file added website/static/img/time_spines.png