Adding support for defining filters on measures #3624

Merged
merged 26 commits into from
Dec 18, 2023

Conversation

Collaborator

@AdityaHegde AdityaHegde commented Dec 5, 2023

This is a backend-only PR to support filters on measures.

  • Define a proto structure for measure filter
  • Implement a generic buildHavingClause
  • Add to MetricsViewAggregation
  • Add to MetricsViewComparison
  • Add to MetricsViewToplist
  • Add to MetricsViewTimeSeries
  • Simplify proto structure to reuse column type from measure sort type
  • Look into using alias instead of measure name + column type
  • Performance consideration and query tweaks
  • Verify for Druid

@AdityaHegde AdityaHegde self-assigned this Dec 7, 2023
message Expression {
oneof expression {
google.protobuf.Value value = 1;
string column = 2;
Contributor

Name suggestion: identifier and make it 1 instead of 2

}
}

message ConditionExpression {
Contributor

Suggestion: Just name it Condition

@@ -8,6 +8,7 @@ import "rill/runtime/v1/export_format.proto";
import "rill/runtime/v1/schema.proto";
import "rill/runtime/v1/time_grain.proto";
import "validate/validate.proto";
import "expressions.proto";
Contributor

  1. Full path like above, i.e. rill/runtime/v1/expressions.proto
  2. Singular, i.e. expression.proto

Comment on lines 21 to 26
OPERATION_EQUALS = 1;
OPERATION_NOT_EQUALS = 2;
OPERATION_LESSER = 3;
OPERATION_LESSER_OR_EQUALS = 4;
OPERATION_GREATER = 5;
OPERATION_GREATER_OR_EQUALS = 6;
Contributor

Is it more common to use "equals" or "equal"?

Also, should we consider the usual shorthands, like EQ, NEQ, LT, LTE, etc.?

Comment on lines 29 to 30
OPERATION_BETWEEN = 9;
OPERATION_NOT_BETWEEN = 10;
Contributor

Redundant – would suggest using an "and" of two expressions instead. It's also ambiguous – i.e. are they inclusive or exclusive of the start and end values?

Collaborator Author

These are from the DuckDB docs: https://duckdb.org/docs/sql/expressions/comparison_operators.html#between-and-is-not-null
I think we can convert them to compound operations in the UI and keep the API simple.
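
To illustrate the "compound operations" idea discussed in this thread, here is a minimal Go sketch of how a client might expand BETWEEN into an AND of two comparisons. The `Expr` type and the `between` and `render` helpers are hypothetical and are not the proto messages from this PR. The sketch assumes the inclusive bounds that standard SQL (and DuckDB) use for BETWEEN, which makes the inclusivity explicit in the request.

```go
package main

import "fmt"

// Expr is a hypothetical, simplified expression node used only for illustration.
// It is not the proto message defined in this PR.
type Expr struct {
	Op    string   // "AND", ">=", "<=", or "" for a leaf node
	Ident string   // column or measure name (leaf)
	Val   *float64 // literal value (leaf)
	Args  []Expr   // operands for non-leaf nodes
}

func ident(name string) Expr { return Expr{Ident: name} }
func lit(v float64) Expr     { return Expr{Val: &v} }

// between expands `col BETWEEN lo AND hi` into `col >= lo AND col <= hi`,
// assuming both bounds are inclusive.
func between(col string, lo, hi float64) Expr {
	return Expr{Op: "AND", Args: []Expr{
		{Op: ">=", Args: []Expr{ident(col), lit(lo)}},
		{Op: "<=", Args: []Expr{ident(col), lit(hi)}},
	}}
}

// render prints an Expr as a SQL-like string (no escaping; illustration only).
func render(e Expr) string {
	switch {
	case e.Val != nil:
		return fmt.Sprintf("%g", *e.Val)
	case e.Op == "":
		return e.Ident
	default:
		return fmt.Sprintf("(%s %s %s)", render(e.Args[0]), e.Op, render(e.Args[1]))
	}
}

func main() {
	// total_sales is a made-up measure name.
	fmt.Println(render(between("total_sales", 10, 20)))
}
```

With this shape the API only needs the six comparison operators plus AND/OR, and an exclusive range is simply a different pair of comparisons.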

@@ -0,0 +1,35 @@
syntax = "proto3";
Contributor

Question: not sure what I think, but do you think we should consider shorter type names given the nesting of expressions and how often we'll end up looking at them printed?

I.e. using terms like expr (expression), cond (condition), val (value), op (operation), args (operands), ident (identifier), eq (equal), etc.?

For example, MongoDB does that: https://www.mongodb.com/docs/manual/reference/operator/query/expr/.

Contributor

I would also suggest looking out for prior art for expressions expressed in protobufs. Two good places to look are:

@@ -279,7 +280,8 @@ message MetricsViewAggregationRequest {
TimeRange time_range = 12;
google.protobuf.Timestamp time_start = 6; // Deprecated in favor of time_range
google.protobuf.Timestamp time_end = 7; // Deprecated in favor of time_range
- MetricsViewFilter filter = 8;
+ Expression filter = 8;
+ repeated MetricsViewColumnAlias aliases = 13;
Contributor

Why are aliases needed in other cases than MetricsViewComparison? For the other cases, I think it's okay to require clients to use the real names (since there's no ambiguity around base vs. comparison vs. delta values).

Collaborator Author

The idea was to use the same measure in measures and filter. Aggregation allows for count and count distinct so I thought those could be defined in aliases and used in measures and filter fields.

Comment on lines 312 to 317
message MetricsViewColumnAlias {
string name = 1;
oneof alias { // Is this overkill to future proof this?
MetricsViewMeasureAlias measure_alias = 2;
}
}
Contributor

Yeah I think we should just flatten it. In light of the comment above, just one type MetricsViewComparisonMeasureAlias might be enough.

Comment on lines 322 to 323
MEASURE_TYPE_COUNT = 1;
MEASURE_TYPE_COUNT_DISTINCT = 2;
Contributor

These can already be defined with an alias through MetricsViewAggregationMeasure

@@ -279,7 +280,8 @@ message MetricsViewAggregationRequest {
TimeRange time_range = 12;
google.protobuf.Timestamp time_start = 6; // Deprecated in favor of time_range
google.protobuf.Timestamp time_end = 7; // Deprecated in favor of time_range
- MetricsViewFilter filter = 8;
+ Expression filter = 8;
Contributor

Need to add support for HAVING clauses (different from WHERE clauses).

Maybe we refactor to Expression having and Expression where? Or to avoid that, could keep Expression filter and add Expression having or Expression measure_filter?

Collaborator Author

Ya, I think separating them will keep it cleaner. The combined filter was not clean in the code.

Collaborator Author

Also having makes more sense. We can think of using where for the main filter in the near future.
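
As a rough illustration of why this thread separates the two fields, the Go sketch below places a `where` condition before the GROUP BY (it filters rows) and a `having` condition after it (it filters aggregated groups). The function `buildAggregationSQL` and the table and column names are hypothetical; this is not the query builder in this PR, and it does no escaping.

```go
package main

import (
	"fmt"
	"strings"
)

// buildAggregationSQL is a hypothetical helper that shows where each filter lands:
// WHERE is applied to raw rows, HAVING to the aggregated result.
func buildAggregationSQL(table, dim, measure, where, having string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "SELECT %s, %s AS measure FROM %s", dim, measure, table)
	if where != "" {
		fmt.Fprintf(&b, " WHERE %s", where)
	}
	fmt.Fprintf(&b, " GROUP BY %s", dim)
	if having != "" {
		fmt.Fprintf(&b, " HAVING %s", having)
	}
	return b.String()
}

func main() {
	// ad_bids, publisher and domain are made-up names for the example.
	fmt.Println(buildAggregationSQL(
		"ad_bids", "publisher", "count(*)",
		"domain = 'msn.com'", "count(*) > 100",
	))
}
```

Keeping the two conditions as separate inputs means neither the caller nor the builder has to work out which parts of a combined filter refer to dimensions and which refer to measures.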

Comment on lines 415 to 428
message MetricsViewMeasureAlias {
enum MeasureType {
MEASURE_TYPE_UNSPECIFIED = 0;
MEASURE_TYPE_BASE_VALUE = 1;
MEASURE_TYPE_COMPARISON_VALUE = 2;
MEASURE_TYPE_ABS_DELTA = 3;
MEASURE_TYPE_REL_DELTA = 4;
}

string name = 1;
MeasureType type = 2;
repeated google.protobuf.Value args = 3;
string alias = 4;
}
Contributor

  1. Would incorporate Comparison in the name since this is specific to that use case (no other API supports base/comparison/delta values)
  2. Should reuse (and probably rename) the MetricsViewComparisonSortType type so we don't have two enums doing the same

Comment on lines 347 to 348
Condition where = 8;
Condition having = 14;
Contributor

Consider using Expression directly, simpler to have one entrypoint (and matches the name of the file). Also slightly more flexible in that you can put something like just WHERE true

@AdityaHegde AdityaHegde force-pushed the adityahegde/filters-on-measures branch from ebe7390 to e9ff93b Compare December 14, 2023 07:26
@AdityaHegde AdityaHegde marked this pull request as ready for review December 14, 2023 08:16
@AdityaHegde AdityaHegde force-pushed the adityahegde/filters-on-measures branch from b0b4a36 to 5a7526a Compare December 14, 2023 08:39
proto/rill/runtime/v1/queries.proto (outdated, resolved)
proto/rill/runtime/v1/queries.proto (outdated, resolved)
runtime/queries/metricsview_comparison_toplist.go (outdated, resolved)
runtime/queries/metricsview.go (outdated, resolved)
runtime/queries/metricsview.go (outdated, resolved)
runtime/queries/metricsview.go (outdated, resolved)
runtime/queries/metricsview.go (outdated, resolved)
Comment on lines 304 to 310
func dimensionAliases(mv *runtimev1.MetricsViewSpec) map[string]identifier {
aliases := map[string]identifier{}
for _, dim := range mv.Dimensions {
aliases[dim.Name] = identifier{safeName(metricsViewDimensionColumn(dim)), dim.Unnest}
}
return aliases
}
Contributor

The previous buildFilterClauseForMetricsViewFilter implementation avoided building a lookup map since the overhead of building it (allocating/assigning a ~20 element map, plus a similar number of allocations for safeName) would be much higher than just iterating over mv.Dimensions a few times (most filters apply to just one or two dimensions).

Always hard to balance optimization vs. convenience, and not sure how much this matters, but would consider staying with the format of buildFilterClauseForMetricsViewFilter (i.e. passing mv and aliases directly as args; and maybe use a boolean like isHaving to indicate which to use for lookups).

Collaborator Author

Ya, makes sense. We should really look at having some global cache per metrics view; things would be very clean with a map. But for now I have updated it to use mv and aliases.
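
For reference, the trade-off raised in this thread can be sketched as a plain linear lookup that allocates nothing per query. The `Dimension` type and `dimensionColumn` function below are hypothetical stand-ins, not the `runtimev1.MetricsViewSpec` types or the helpers in this PR.

```go
package main

import "fmt"

// Dimension is a hypothetical, pared-down stand-in for a metrics view dimension.
type Dimension struct {
	Name   string
	Column string
}

// dimensionColumn scans the slice instead of building a map first.
// For a few lookups over ~20 dimensions this avoids the map allocation entirely.
func dimensionColumn(dims []Dimension, name string) (string, bool) {
	for _, d := range dims {
		if d.Name == name {
			return d.Column, true
		}
	}
	return "", false
}

func main() {
	dims := []Dimension{
		{Name: "publisher", Column: "pub"},
		{Name: "domain", Column: "dom"},
	}
	col, ok := dimensionColumn(dims, "domain")
	fmt.Println(col, ok)
}
```

A map only pays off once it is built once and reused across many lookups, which is the per-metrics-view cache idea mentioned in the reply.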

runtime/queries/metricsview.go (outdated, resolved)
runtime/queries/filterutil.go (outdated, resolved)
@AdityaHegde AdityaHegde force-pushed the adityahegde/filters-on-measures branch from 2ee87d4 to 2d27256 Compare December 15, 2023 12:20
@AdityaHegde AdityaHegde force-pushed the adityahegde/filters-on-measures branch from 2d27256 to 1e49865 Compare December 15, 2023 12:48
runtime/queries/metricsview.go (outdated, resolved)
runtime/queries/metricsview_comparison_toplist.go (outdated, resolved)
Comment on lines 462 to 480
if q.Having != nil {
// having clause needs selected columns to either be in group by or be aggregations.
// so adding additional sum() around measure and comparison columns
columnsTuple = fmt.Sprintf(
"sum(base.%[1]s) as %[1]s, sum(comparison.%[1]s) AS %[2]s, sum(base.%[1]s - comparison.%[1]s) AS %[3]s, sum((base.%[1]s - comparison.%[1]s)/comparison.%[1]s::DOUBLE) AS %[4]s",
safeName(m.Name),
safeName(m.Name+"__previous"),
safeName(m.Name+"__delta_abs"),
safeName(m.Name+"__delta_rel"),
)
} else {
columnsTuple = fmt.Sprintf(
"base.%[1]s, comparison.%[1]s AS %[2]s, base.%[1]s - comparison.%[1]s AS %[3]s, (base.%[1]s - comparison.%[1]s)/comparison.%[1]s::DOUBLE AS %[4]s",
safeName(m.Name),
safeName(m.Name+"__previous"),
safeName(m.Name+"__delta_abs"),
safeName(m.Name+"__delta_rel"),
)
}
Contributor

Is this needed because the actual aggregation happens in a sub-query? If yes, then instead of adding a new layer of aggregations, couldn't we just use a WHERE clause in the outer select instead? (Since a HAVING clause is basically equivalent to a WHERE clause in a wrapped query, or am I missing something?)

If the aggregation is needed, consider using ANY_VALUE or LAST, which I believe we use for such cases in other places (less ambiguous than SUM).

Collaborator Author

Good point. I have updated to use an outer where clause.
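
The equivalence discussed in this thread (a HAVING clause behaves like a WHERE clause on a wrapped query) can be sketched as follows. `wrapWithFilter` is a hypothetical helper, not the code that was merged, and the inner query and column names are made up for the example.

```go
package main

import "fmt"

// wrapWithFilter wraps an already-aggregated query in an outer SELECT and applies
// the measure condition as a plain WHERE, so no extra aggregation layer is needed.
func wrapWithFilter(inner, cond string) string {
	if cond == "" {
		return inner
	}
	// The subquery alias "t" is added because some SQL dialects require one.
	return fmt.Sprintf("SELECT * FROM (%s) AS t WHERE %s", inner, cond)
}

func main() {
	inner := "SELECT publisher, count(*) AS measure_0 FROM ad_bids GROUP BY publisher"
	fmt.Println(wrapWithFilter(inner, "measure_0 > 100"))
}
```

Because the outer query only sees the output column names of the inner query, the condition can refer to measure aliases directly.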

@AdityaHegde AdityaHegde force-pushed the adityahegde/filters-on-measures branch from af2c562 to 025a62d Compare December 18, 2023 06:43
Comment on lines 308 to 313
case runtimev1.MetricsViewComparisonMeasureType_METRICS_VIEW_COMPARISON_MEASURE_TYPE_UNSPECIFIED,
runtimev1.MetricsViewComparisonMeasureType_METRICS_VIEW_COMPARISON_MEASURE_TYPE_BASE_VALUE:
// using `measure_0` as is causing ambiguity error in duckdb
if dialect == drivers.DialectDuckDB {
return "base." + safeName(alias.Name), true
}
Contributor

The columnIdentifierExpression function is not specific to the MetricsViewComparison API, so this doesn't seem safe?

Contributor

Also, is the clause being templated in the correct place? I don't believe this was an issue before, so wondering if the ambiguity is due to some other issue

Collaborator Author

Added an outer query for DuckDB to get rid of the ambiguity.

runtime/queries/metricsview_comparison_toplist_test.go (outdated, resolved)
runtime/queries/metricsview_timeseries_test.go (outdated, resolved)
runtime/testruntime/testruntime.go (outdated, resolved)
2 participants