add subdaily granularity #5882
|
Editorial changes
|
||
<!--dimensions are non-aggregatable expressions that define the level of aggregation for a metric used to define how data is sliced or grouped in a metric. Since groups can't be aggregated, they're considered to be a property of the primary or unique entity of the table. | ||
|
||
Groups are defined within semantic models, alongside entities and measures, and correspond to non-aggregatable columns in your dbt model that provides categorical or time-based context. In SQL, dimensions is typically included in the GROUP BY clause.--> | ||
|
||
All dimensions require a `name`, `type` and in some cases, an `expr` parameter. The `name` for your dimension must be unique to the semantic model and can not be the same as an existing `entity` or `measure` within that same model. | ||
All dimensions require a `name` and `type` and, in some cases, can optionally include an `expr` parameter. The `name` for your Dimension must be unique within the same semantic model. |
It seems redundant to say both "in some cases" and "optionally" - maybe pick one or the other?
Yup good call. I will update.
- name: is_bulk_transaction | ||
type: categorical | ||
expr: case when quantity > 10 then true else false end | ||
``` | ||
|
||
MetricFlow requires that all dimensions have a primary entity. This is to guarantee unique dimension names. If your data source doesn't have a primary entity, you need to assign the entity a name using the `primary_entity: entity_name` key. It doesn't necessarily have to map to a column in that table and assigning the name doesn't affect query generation. | ||
Dimensions are bound to the primary entity of the semantic model in which they are defined. For example, if a dimension called `is_bulk_transaction` is defined in a model with `transaction` as a primary entity, then `is_bulk_transaction` is scoped to the `transaction` entity. To reference this dimension you would use the fully qualified dimension name `transaction__is_bulk_transaction`. |
Might be nice to use an example dimension name that makes it somewhat clear why we bind it to the entity name. E.g., something like `transaction__country`, or just changing the name to something like `transaction__is_bulk`, would make this feel less redundant.
Fixed.
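The naming rule discussed in this thread (a dimension is referenced as the primary entity name, a double underscore, then the dimension name) can be sketched as below. This is only an illustration: both helper functions are hypothetical and are not part of MetricFlow, and the split assumes a single entity prefix.

```python
def qualified_dimension_name(primary_entity: str, dimension: str) -> str:
    """Build the fully qualified name: <primary_entity>__<dimension>."""
    return f"{primary_entity}__{dimension}"


def split_qualified_name(name: str) -> tuple[str, str]:
    """Split at the first double underscore (assumes one entity prefix)."""
    entity, _, dimension = name.partition("__")
    return entity, dimension


print(qualified_dimension_name("transaction", "is_bulk_transaction"))
# transaction__is_bulk_transaction
```

With the reviewer's suggested name, `qualified_dimension_name("transaction", "country")` gives `transaction__country`, which makes the entity scoping easier to see.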
|
||
MetricFlow requires that all semantic models have a primary entity. This is to guarantee unique dimension names. If your data source doesn't have a primary entity, you need to assign the entity a name using the `primary_entity` key. It doesn't necessarily have to map to a column in that table and assigning the name doesn't affect query generation. An example of defining a primary entity for a data source that doesn't have a primary entity column is below: |
Can we add that for a virtual primary entity like this, you should try to make the name unique? I don't think we enforce that (we should) but it's definitely helpful
👍
|
||
The current options for time granularity are day, week, month, quarter, and year. | ||
Any granularity supported by your engine's `date_trunc` function will work, with the most common granularities being hour, day, week, month, quarter, and year. |
This isn't quite accurate (e.g., look at the options available for snowflake). Might be better to just list the options we support.
For sub-daily options, we support these (for all engines unless otherwise noted):
- nanosecond (Snowflake only)
- microsecond (all engines except Trino)
- millisecond
- second
- minute
- hour
Updated
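For reference, the support matrix from the comment above can be written down as a small lookup. This is a sketch only: the engine restrictions are copied from the reviewer's list, while the lowercase engine labels and the function name are assumptions made for this example.

```python
# Engine restrictions as listed in the review comment; an empty rule means
# the grain is supported on all engines. Engine labels are illustrative.
SUB_DAILY_SUPPORT = {
    "nanosecond": {"only": {"snowflake"}},
    "microsecond": {"except": {"trino"}},
    "millisecond": {},
    "second": {},
    "minute": {},
    "hour": {},
}


def is_sub_daily_grain_supported(granularity: str, engine: str) -> bool:
    """Return True if the listed sub-daily grain is available on the engine."""
    rule = SUB_DAILY_SUPPORT.get(granularity)
    if rule is None:
        return False  # not one of the sub-daily grains in the list
    engine = engine.lower()
    if "only" in rule:
        return engine in rule["only"]
    return engine not in rule.get("except", set())
```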
|
||
Aggregation between metrics with different granularities is possible, with the Semantic Layer returning results at the highest granularity by default. For example, when querying two metrics with daily and monthly granularity, the resulting aggregation will be at the monthly level. | ||
Aggregation between metrics with different granularities is possible, with the Semantic Layer returning results at the coarser granularity by default. For example, when querying two metrics with daily and monthly granularity, the resulting aggregation will be at the monthly level. |
I think `coarsest` would be grammatically correct here
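To make the "coarsest granularity" wording concrete, here is a minimal sketch of how the result grain could be resolved when metrics with different grains are queried together. The ordering list and the function name are assumptions for illustration; this is not MetricFlow's implementation.

```python
# Grains ordered from finest to coarsest (ordering assumed for this sketch).
GRAINS = [
    "nanosecond", "microsecond", "millisecond", "second", "minute",
    "hour", "day", "week", "month", "quarter", "year",
]


def coarsest_granularity(grains):
    """Return the coarsest grain among the metrics in a query."""
    return max(grains, key=GRAINS.index)


print(coarsest_granularity(["day", "month"]))  # month
```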
|
||
<File name='metricflow_time_spine.sql'> | ||
If you already have a date dimension or time spine table in your dbt project you can simply point MetricFlow at this table. To do this, update the `model` configuration to use this table in the semantic layer. For example, given the following directory structure, you can create two time spine configurations, `time_spine_hourly` and `time_spine_daily`. |
Do you think people migrating from the old time spine will think they need to rename the model? Not sure if we want to add a note about that (that you can keep the old name) to avoid confusion!
Added a note about this.
|
||
|
||
If you need to create a time spine table from scratch, add the following code to your dbt project. | ||
The example creates a time spine at a daily grain and an hourly grain. We recommend creating both an hourly and a daily time spine; MetricFlow will use the appropriate time spine based on the granularity of the metric selected to minimize data scans.
Can we add some more detail here? Some things I think it would be helpful to know:
- MetricFlow will use the time spine with the largest compatible granularity for a given query to ensure the most efficient query possible
- You can add a time spine for each granularity you intend to use if minor query efficiency is more important to you than setup time / space constraints
- We recommend having a time spine at the finest grain used in any of your dimensions to avoid unexpected errors
Added more context
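The selection rule described in this thread (use the time spine with the largest compatible granularity for a given query) might look roughly like the following. It is a toy model under stated assumptions: "compatible" is taken to mean the spine's grain is no coarser than the queried grain, and `pick_time_spine` is a hypothetical helper, not a MetricFlow function. The model names come from the example in the diff.

```python
# Rank grains from finest (0) to coarsest; ordering assumed for this sketch.
GRAIN_RANK = {
    grain: rank
    for rank, grain in enumerate([
        "nanosecond", "microsecond", "millisecond", "second", "minute",
        "hour", "day", "week", "month", "quarter", "year",
    ])
}


def pick_time_spine(spines: dict[str, str], query_grain: str) -> str:
    """Pick the coarsest spine whose grain is not coarser than the query.

    `spines` maps a time spine model name to its grain.
    """
    compatible = [
        (GRAIN_RANK[grain], name)
        for name, grain in spines.items()
        if GRAIN_RANK[grain] <= GRAIN_RANK[query_grain]
    ]
    if not compatible:
        raise ValueError(f"no time spine fine enough for grain {query_grain!r}")
    return max(compatible)[1]


spines = {"time_spine_hourly": "hour", "time_spine_daily": "day"}
print(pick_time_spine(spines, "month"))  # time_spine_daily
print(pick_time_spine(spines, "hour"))   # time_spine_hourly
```

The `ValueError` branch mirrors the third bullet above: without a spine at the finest grain used by any dimension, a query at that grain has nothing to join against.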
### Conversion metrics | ||
## Default granularity for metircs | ||
|
||
It's possible to define a default time granularity for metrics that differs from the granularity of the default aggregation time dimensions (`metric_time`). This is useful if your time dimension has a very fine grain, like second or hour, but you typically query metrics rolled up at a coarser grain. The granularity can be set using the `time_granularity` parameter on the metric and defaults to `day`. |
Would note that while it defaults to day, if day is not available because the dimension is defined at a coarser granularity, it will default to the defined granularity for the dimension!
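The fallback described in the comment above can be sketched as follows. This only covers the case where `time_granularity` is not set on the metric; the function name and grain ordering are assumptions for illustration, not MetricFlow code.

```python
# Grains ordered from finest to coarsest (ordering assumed for this sketch).
GRAIN_ORDER = [
    "nanosecond", "microsecond", "millisecond", "second", "minute",
    "hour", "day", "week", "month", "quarter", "year",
]


def default_metric_granularity(dimension_grain: str) -> str:
    """Default to day, unless the dimension is defined at a coarser grain."""
    return max("day", dimension_grain, key=GRAIN_ORDER.index)


print(default_metric_granularity("second"))  # day
print(default_metric_granularity("month"))   # month
```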
@@ -84,7 +84,7 @@ semantic_models: | |||
- name: transaction_date | |||
type: time | |||
type_params: | |||
time_granularity: day | |||
time_granularity: day # Additional options include hour, week, month, quarter, year, and so on. |
seems weird to exclude other options like second and below if we're going to list so many. do we need this list at all?
- MetricFlow requires all dimensions to be tied to a primary entity. | ||
Dimensions have the following characteristics: | ||
|
||
- There are two types of dimensions: categorical and time. Categorical dimensions are for things you can't measure in numbers, while time dimensions represent dates. |
"...while time dimensions represent dates and timestamps"
@mirnawong1 leaving comments here for what should be version blocked!
@@ -173,28 +161,34 @@ measures: | |||
|
|||
<TabItem value="time_gran" label="time_granularity"> | |||
|
|||
`time_granularity` specifies the smallest level of detail that a measure or metric should be reported at, such as daily, weekly, monthly, quarterly, or yearly. Different granularity options are available, and each metric must have a specified granularity. For example, a metric specified with weekly granularity couldn't be aggregated to a daily grain. | |||
`time_granularity` specifies the grain of a time dimension. MetricFlow will transform the underlying column to the specified granularity. For example, if you add hourly granularity to a time dimension column, MetricFlow will run a `date_trunc` function to convert the timestamp to hourly. You can easily change the time grain at query time and aggregate it to a coarser grain, for example, from hourly to monthly. However, you can't go from a coarser grain to a finer grain (monthly to hourly). |
@mirnawong1 This section mentions hourly granularity, which isn't available for <=1.8. We should keep this section for 1.9+, but can we swap the word "hourly" with "daily" for <=1.8?
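As a rough illustration of why a grain can be rolled up but not refined, the truncation described in the `time_granularity` text can be mimicked with Python's standard `datetime`. The two helpers below are stand-ins for a warehouse `date_trunc` call and are assumptions for this sketch only.

```python
from datetime import datetime


def trunc_to_hour(ts: datetime) -> datetime:
    """Mimic date_trunc('hour', ts): drop minutes, seconds, microseconds."""
    return ts.replace(minute=0, second=0, microsecond=0)


def trunc_to_month(ts: datetime) -> datetime:
    """Mimic date_trunc('month', ts): reset to the first instant of the month."""
    return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


raw = datetime(2024, 3, 15, 14, 37, 52)
hourly = trunc_to_hour(raw)        # 2024-03-15 14:00:00
monthly = trunc_to_month(hourly)   # 2024-03-01 00:00:00
print(hourly, monthly)
```

Going from `hourly` to `monthly` loses nothing that a monthly query needs, but `monthly` no longer carries the day or hour, so it cannot be expanded back to an hourly grain.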
|
||
The current options for time granularity are day, week, month, quarter, and year. | ||
Our supported granularities are: | ||
* nanosecond (Snowflake only) |
@mirnawong1 These sub-daily granularity options are showing up for all versions. Can we keep them all for 1.9+, but remove anything smaller than day for <=1.8?
is_partition: True | ||
type_params: | ||
time_granularity: day | ||
time_granularity: hour |
@mirnawong1 Can we swap in day instead of hour for <=1.8?
@@ -6,11 +6,45 @@ sidebar_label: "MetricFlow time spine" | |||
tags: [Metrics, Semantic Layer] | |||
--- | |||
|
|||
MetricFlow uses a timespine table to construct cumulative metrics. By default, MetricFlow expects the timespine table to be named `metricflow_time_spine` and doesn't support using a different name. |
@mirnawong1 For this entire file - can we use this deleted text for versions <=1.8, instead of the new text? The new text should only be for 1.9+.
``` | ||
|
||
</VersionBlock> | ||
|
||
You only need to include the `date_day` column in the table. MetricFlow can handle broader levels of detail, but it doesn't currently support finer grains. |
@mirnawong1 for old versions, we can update this to say:
"...but finer grains are only supported in versions 1.9+."
import SLCourses from '/snippets/_sl-course.md'; | ||
|
||
<SLCourses/> | ||
|
||
### Conversion metrics | ||
## Default granularity for metircs |
@mirnawong1 This whole section called "Default granularity for metrics" should be version blocked to 1.9+.
Also noting that "metrics" is misspelled in the title (though maybe that's already fixed in production!)
@@ -232,10 +283,20 @@ filter: | | |||
{{ TimeDimension('time_dimension', 'granularity') }} | |||
|
|||
filter: | |
@mirnawong1 Can we version block this metric filter example to versions 1.8+?
resolves #5857
resolves #5908
this pr adds draft content to explain subdaily granularities in MF.
[ ] Needs PM review
[ ] Needs docs review
Outstanding questions
`default_grain` and `time_granularity`? And how does it connect to the time_spine, and when should a user use what? Or is it up to them?