-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding support for defining filters on measures #3624
Conversation
message Expression { | ||
oneof expression { | ||
google.protobuf.Value value = 1; | ||
string column = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name suggestion: identifier
and make it 1 instead of 2
} | ||
} | ||
|
||
message ConditionExpression { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion: Just name it Condition
proto/rill/runtime/v1/queries.proto
Outdated
@@ -8,6 +8,7 @@ import "rill/runtime/v1/export_format.proto"; | |||
import "rill/runtime/v1/schema.proto"; | |||
import "rill/runtime/v1/time_grain.proto"; | |||
import "validate/validate.proto"; | |||
import "expressions.proto"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Full path like above, i.e.
rill/runtime/v1/expressions.proto
- Singular, i.e.
expression.proto
OPERATION_EQUALS = 1; | ||
OPERATION_NOT_EQUALS = 2; | ||
OPERATION_LESSER = 3; | ||
OPERATION_LESSER_OR_EQUALS = 4; | ||
OPERATION_GREATER = 5; | ||
OPERATION_GREATER_OR_EQUALS = 6; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it more common to use "equals" or "equal"?
Also, should we consider the usual shorthands, like EQ
, NEQ
, LT
, LTE
, etc.?
OPERATION_BETWEEN = 9; | ||
OPERATION_NOT_BETWEEN = 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redundant – would suggest using an "and" of two expressions instead. It's also ambiguous – i.e. are they inclusive or exclusive of the start and end values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are from duckdb docs: https://duckdb.org/docs/sql/expressions/comparison_operators.html#between-and-is-not-null
I think we can convert them to compound operations in UI and keep the API simple.
@@ -0,0 +1,35 @@ | |||
syntax = "proto3"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: not sure what I think, but do you think we should consider shorter type names given the nesting of expressions and how often we'll end up looking at them printed?
I.e. using terms like expr
(expression), cond
(condition), val
(value), op
(operation), args
(operands), ident
(identifer), eq
(equal), etc.?
For example, MongoDB does that: https://www.mongodb.com/docs/manual/reference/operator/query/expr/.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also suggest looking out for prior art for expressions expressed in protobufs. Two good places to look are:
proto/rill/runtime/v1/queries.proto
Outdated
@@ -279,7 +280,8 @@ message MetricsViewAggregationRequest { | |||
TimeRange time_range = 12; | |||
google.protobuf.Timestamp time_start = 6; // Deprecated in favor of time_range | |||
google.protobuf.Timestamp time_end = 7; // Deprecated in favor of time_range | |||
MetricsViewFilter filter = 8; | |||
Expression filter = 8; | |||
repeated MetricsViewColumnAlias aliases = 13; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are aliases
needed in other cases than MetricsViewComparison
? For the other cases, I think it's okay to require clients to use the real names (since there's no ambiguity around base vs. comparison vs. delta values).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was to use the same measure in measures
and filter
. Aggregation allows for count and count distinct so I thought those could be defined in aliases and used in measures
and filter
fields.
proto/rill/runtime/v1/queries.proto
Outdated
message MetricsViewColumnAlias { | ||
string name = 1; | ||
oneof alias { // Is this overkill to future proof this? | ||
MetricsViewMeasureAlias measure_alias = 2; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think we should just flatten it. In light of the comment above, just one type MetricsViewComparisonMeasureAlias
might be enough.
proto/rill/runtime/v1/queries.proto
Outdated
MEASURE_TYPE_COUNT = 1; | ||
MEASURE_TYPE_COUNT_DISTINCT = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These can already be defined with an alias through MetricsViewAggregationMeasure
proto/rill/runtime/v1/queries.proto
Outdated
@@ -279,7 +280,8 @@ message MetricsViewAggregationRequest { | |||
TimeRange time_range = 12; | |||
google.protobuf.Timestamp time_start = 6; // Deprecated in favor of time_range | |||
google.protobuf.Timestamp time_end = 7; // Deprecated in favor of time_range | |||
MetricsViewFilter filter = 8; | |||
Expression filter = 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to add support for HAVING
clauses (different from WHERE
clauses).
Maybe we refactor to Expression having
and Expression where
? Or to avoid that, could keep Expression filter
and add Expression having
or Expression measure_filter
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya i think separating will keep it cleaner. Combined filter was not clean in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also having
makes more sense. We can think of using where
for the main filter in the near future.
proto/rill/runtime/v1/queries.proto
Outdated
message MetricsViewMeasureAlias { | ||
enum MeasureType { | ||
MEASURE_TYPE_UNSPECIFIED = 0; | ||
MEASURE_TYPE_BASE_VALUE = 1; | ||
MEASURE_TYPE_COMPARISON_VALUE = 2; | ||
MEASURE_TYPE_ABS_DELTA = 3; | ||
MEASURE_TYPE_REL_DELTA = 4; | ||
} | ||
|
||
string name = 1; | ||
MeasureType type = 2; | ||
repeated google.protobuf.Value args = 3; | ||
string alias = 4; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Would incorporate
Comparison
in the name since this is specific to that use case (no other API supports base/comparison/delta values) - Should reuse (and probably rename) the
MetricsViewComparisonSortType
type so we don't have two enums doing the same
proto/rill/runtime/v1/queries.proto
Outdated
Condition where = 8; | ||
Condition having = 14; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider using Expression
directly, simpler to have one entrypoint (and matches the name of the file). Also slightly more flexible in that you can put something like just WHERE true
ebe7390
to
e9ff93b
Compare
b0b4a36
to
5a7526a
Compare
runtime/queries/metricsview.go
Outdated
func dimensionAliases(mv *runtimev1.MetricsViewSpec) map[string]identifier { | ||
aliases := map[string]identifier{} | ||
for _, dim := range mv.Dimensions { | ||
aliases[dim.Name] = identifier{safeName(metricsViewDimensionColumn(dim)), dim.Unnest} | ||
} | ||
return aliases | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous buildFilterClauseForMetricsViewFilter
implementation avoided building a lookup map since the overhead of building it (allocating/assigning a ~20 element map, plus a similar number of allocations for safeName
) would be much higher than just iterating over mv.Dimensions
a few times (most filters apply to just one or two dimensions).
Always hard to balance optimization vs. convenience, and not sure how much this matters, but would consider staying with the format of buildFilterClauseForMetricsViewFilter
(i.e. passing mv
and aliases directly as args; and maybe use a boolean like isHaving
to indicate which to use for lookups).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ya makes sense. We should really look at having some global cache per metrics view. Things would be very clean with a map. But for now i have updated to use mv and aliases
2ee87d4
to
2d27256
Compare
2d27256
to
1e49865
Compare
if q.Having != nil { | ||
// having clause needs selected columns to either be in group by or be aggregations. | ||
// so adding additional sum() around measure and comparison columns | ||
columnsTuple = fmt.Sprintf( | ||
"sum(base.%[1]s) as %[1]s, sum(comparison.%[1]s) AS %[2]s, sum(base.%[1]s - comparison.%[1]s) AS %[3]s, sum((base.%[1]s - comparison.%[1]s)/comparison.%[1]s::DOUBLE) AS %[4]s", | ||
safeName(m.Name), | ||
safeName(m.Name+"__previous"), | ||
safeName(m.Name+"__delta_abs"), | ||
safeName(m.Name+"__delta_rel"), | ||
) | ||
} else { | ||
columnsTuple = fmt.Sprintf( | ||
"base.%[1]s, comparison.%[1]s AS %[2]s, base.%[1]s - comparison.%[1]s AS %[3]s, (base.%[1]s - comparison.%[1]s)/comparison.%[1]s::DOUBLE AS %[4]s", | ||
safeName(m.Name), | ||
safeName(m.Name+"__previous"), | ||
safeName(m.Name+"__delta_abs"), | ||
safeName(m.Name+"__delta_rel"), | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed because the actual aggregation happens in a sub-query? If yes, then instead of adding a new layer of aggregations, couldn't we just use a WHERE
clause in the outer select instead? (Since a HAVING
clause is basically equivalent to a WHERE
clause in a wrapped query, or am I missing something?)
If the aggregation is needed, consider using ANY_VALUE
or LAST
, which I believe we use for such cases in other places (less ambiguous than SUM
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I have updated to use an outer where clause.
af2c562
to
025a62d
Compare
runtime/queries/metricsview.go
Outdated
case runtimev1.MetricsViewComparisonMeasureType_METRICS_VIEW_COMPARISON_MEASURE_TYPE_UNSPECIFIED, | ||
runtimev1.MetricsViewComparisonMeasureType_METRICS_VIEW_COMPARISON_MEASURE_TYPE_BASE_VALUE: | ||
// using `measure_0` as is causing ambiguity error in duckdb | ||
if dialect == drivers.DialectDuckDB { | ||
return "base." + safeName(alias.Name), true | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The columnIdentifierExpression
function is not specific to the MetricsViewComparison
API, so this doesn't seem safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, is the clause being templated in the correct place? I don't believe this was an issue before, so wondering if the ambiguity is due to some other issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added an outer query for duckdb to get rid of the ambiguity.
This is a backend only PR to support filters on measures