null filling and coalescing for mv agg query #6088
base: main
@@ -205,6 +205,7 @@ message MetricsViewSpec {
string format_d3 = 7;
google.protobuf.Struct format_d3_locale = 13;
bool valid_percent_of_total = 6;
string treat_nulls_as = 14; // TODO what should the type, using string values will not work when coalescing numeric cols
I think we should just consider this to be a SQL expression to be templated into the query literally. For example:
treat_nulls_as: 0
treat_nulls_as: CAST(0 AS HUGEINT)
treat_nulls_as: "'Not available'"
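A minimal sketch of what "templated into the query literally" could look like. The helper name `coalesceMeasure` is hypothetical and not part of the PR; it assumes the `treat_nulls_as` value comes from trusted project config, since it is inserted into the SQL without escaping.

```go
package main

import "fmt"

// coalesceMeasure is a hypothetical helper: it wraps a measure expression in
// COALESCE, inserting the treat_nulls_as value into the SQL as-is.
// An empty treatNullsAs leaves the expression unchanged.
func coalesceMeasure(expr, treatNullsAs string) string {
	if treatNullsAs == "" {
		return expr
	}
	return fmt.Sprintf("COALESCE(%s, %s)", expr, treatNullsAs)
}

func main() {
	// The three values mirror the examples in the comment above.
	fmt.Println(coalesceMeasure("SUM(revenue)", "0"))
	fmt.Println(coalesceMeasure("SUM(revenue)", "CAST(0 AS HUGEINT)"))
	fmt.Println(coalesceMeasure("MAX(status)", "'Not available'"))
}
```

Treating the value as a SQL expression sidesteps the typing question in the TODO: the author of the metrics view picks a literal or cast that matches the measure's column type.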
@@ -83,7 +91,7 @@ func (c *sqlConnection) QueryContext(ctx context.Context, query string, args []d
context.AfterFunc(ctx, func() {
	tctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
-	r, err := http.NewRequestWithContext(tctx, http.MethodDelete, c.dsn+"/"+dr.Context.SQLQueryID, http.NoBody)
+	r, err := http.NewRequestWithContext(tctx, http.MethodDelete, urlutil.MustJoinURL(c.dsn, dr.Context[sqlQueryIDContextKey].(string)), http.NoBody)
Would be nice to avoid the type assertion here. Could the query ID be stored in `*DruidRequest` directly so it can be accessed like `dr.queryID`?
for _, arg := range args {
	if strings.HasPrefix(arg.Name, drivers.DialectDruid.ContextKeyArgPrefix()) {
		queryCtx[strings.TrimPrefix(arg.Name, drivers.DialectDruid.ContextKeyArgPrefix())] = arg.Value
		continue
	}
Based on https://pkg.go.dev/database/sql/driver (try searching for `ErrRemoveArgument` on this page for details):
> NamedValueChecker also allows queries to accept per-query options as a parameter by returning ErrRemoveArgument from CheckNamedValue.

So two observations here:
- I think this logic should be adapted to use `ErrRemoveArgument`.
- Instead of relying on name prefixes, I think the recommended approach is relying on type checks. E.g. check `if v, ok := arg.Value.(queryContextParam); ok { ... }` and expose a function like `func QueryContextParam(key string, val any) any { return queryContextParam{...} }` for use outside the package.
if sn == nil {
	return n, nil
}
This feels weird because:
- The called function `buildSpineSelect` does not sound like a function name that would have side effects (and it didn't before, but now it does by creating `olapContext`).
- The early return seems unintuitive: if someone adds more functionality to this function later, they are likely to add it after the spine case handling, and might miss this early return.
@@ -689,7 +699,87 @@ func (a *AST) buildSpineSelect(alias string, spine *Spine, tr *TimeRange) (*Sele
}

if spine.TimeRange != nil {
	return nil, errors.New("time_range not yet supported in spine")
	if a.dialect == drivers.DialectDruid {
Nowhere else in `ast.go` or `astsql.go` is there a hard-coded reference to a dialect. It would be very nice to avoid adding that now. Look at the various places where `a.dialect` is used: it always pushes the dialect-specific handling into the dialect implementation. I think something similar should be possible here.
for _, cjs := range cpy.CrossJoinSelects {
	for _, f := range cjs.DimFields {
		s.DimFields = append(s.DimFields, FieldNode{
			Name:        f.Name,
			DisplayName: f.DisplayName,
			Expr:        a.sqlForMember(cpy.Alias, f.Name),
		})
	}

	if len(cjs.UnionAllSelects) > 0 {
		// All dimensions will be same across UNION ALL SELECTS so we can just pick the first one
		for _, f := range cjs.UnionAllSelects[0].DimFields {
			s.DimFields = append(s.DimFields, FieldNode{
				Name:        f.Name,
				DisplayName: f.DisplayName,
				Expr:        a.sqlForMember(cpy.Alias, f.Name),
			})
		}
	}
}
Why is this behavior different from the handling of `LeftJoinSelects` and `JoinComparisonSelect`? I think this might relate to the previously mentioned issue of not setting `FromSelect`?
Ideally after the wrap, these can just be set to `nil` since they are handled in the nested select.
@@ -214,6 +215,10 @@ func (e *Executor) Query(ctx context.Context, qry *Query, executionTime *time.Ti
	return nil, err
}

for k, v := range ast.olapContext {
	args = append(args, sql2.NamedArg{Name: drivers.DialectDruid.ContextKeyArgPrefix() + k, Value: v})
Could we avoid hard-coding the Druid dialect here by having something like `additionalArgs` instead of `olapContext`?
Then when processing the spine in `ast.go`, the dialect might either return extra args or a range SELECT query.
@@ -124,6 +124,7 @@ type TimeSpine struct {
Start time.Time `mapstructure:"start"`
End time.Time `mapstructure:"end"`
Grain TimeGrain `mapstructure:"grain"`
Alias string `mapstructure:"alias"`
What does this do? Ideally it should apply only to the time dimension if it's requested, and in that case, the time dimension already has an alias provided in `Dimension`.
if q.TimeRange == nil || q.TimeRange.Start == nil || q.TimeRange.End == nil {
	return nil, fmt.Errorf("time range is required for null fill")
}
I think it might still work if there's a relative time range (like `q.TimeRange.IsoDuration`), and this might be needed e.g. for alerts/reports that use a time spine.
It would be resolved to a fixed time range before the query is turned into an AST. It happens here:
func (e *Executor) rewriteQueryTimeRanges(ctx context.Context, qry *Query, executionTime *time.Time) error {
Start: s,
End: e,
Grain: timeDim.Compute.TimeFloor.Grain,
Alias: timeDim.Name,
I think it should be handled inside the `metricsview` package by applying it to any dimension that has a `TimeFloor` applied (and erroring if zero or multiple dimensions have a `TimeFloor`).
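The "exactly one `TimeFloor` dimension" rule could be sketched as below. The types are simplified stand-ins that only mirror the field path seen in the diff (`timeDim.Compute.TimeFloor.Grain`, `timeDim.Name`); the function name `timeFloorDimension` is hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for the query types; only the fields needed here.
type TimeFloor struct{ Grain string }

type Compute struct{ TimeFloor *TimeFloor }

type Dimension struct {
	Name    string
	Compute *Compute
}

// timeFloorDimension returns the single dimension that has a TimeFloor
// applied, and errors if there are zero or more than one.
func timeFloorDimension(dims []Dimension) (*Dimension, error) {
	var found *Dimension
	for i := range dims {
		d := &dims[i]
		if d.Compute == nil || d.Compute.TimeFloor == nil {
			continue
		}
		if found != nil {
			return nil, fmt.Errorf("multiple time-floored dimensions: %q and %q", found.Name, d.Name)
		}
		found = d
	}
	if found == nil {
		return nil, errors.New("no dimension with a time floor found")
	}
	return found, nil
}

func main() {
	dims := []Dimension{
		{Name: "country"},
		{Name: "day", Compute: &Compute{TimeFloor: &TimeFloor{Grain: "day"}}},
	}
	d, err := timeFloorDimension(dims)
	fmt.Println(d.Name, d.Compute.TimeFloor.Grain, err)
}
```

With a lookup like this inside the package, the caller no longer has to pass the alias and grain explicitly.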
As a separate comment, can we add some tests for this feature for all of DuckDB, Druid and ClickHouse in `runtime/resolvers/testdata`?
This PR adds support for two features:
- `fill_missing`: a metrics view aggregation query-level config. When a computed time dimension is used, it fills in the missing time buckets in the response.
- `treat_nulls_as`: a measure-level config that sets the value to fill in for missing time buckets. It also works generally as a COALESCE over non-empty time buckets.

Closes https://github.com/rilldata/rill-private-issues/issues/786