null filling and coalescing for mv agg query #6088

pjain1 · 2024-11-13T11:11:14Z

This PR adds support for two features:

fill_missing - A config at the metrics view aggregation query level. When the query uses a computed time dimension, it fills in the missing time buckets in the response.
treat_nulls_as - A measure-level config that sets the value to fill in for missing time buckets. It also works more generally as a COALESCE over null values in non-empty time buckets.
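
To make the intended semantics concrete, here is a minimal in-memory sketch of the two features. It is not how the PR implements them (the PR does this in SQL through a time spine); the `Row` type, the `fillMissing` and `day` helpers, and the fixed one-day grain are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// Row is a hypothetical result row: a time bucket and a nullable measure value.
type Row struct {
	Bucket time.Time
	Value  *float64 // nil models a SQL NULL
}

// day is a small helper that builds a UTC midnight timestamp in November 2024.
func day(d int) time.Time {
	return time.Date(2024, 11, d, 0, 0, 0, 0, time.UTC)
}

// fillMissing returns one row per day in [start, end).
// Buckets absent from rows (the fill_missing case) and buckets whose value
// is nil (the COALESCE case) both receive treatNullsAs.
func fillMissing(rows []Row, start, end time.Time, treatNullsAs float64) []Row {
	byBucket := make(map[int64]*float64, len(rows))
	for _, r := range rows {
		byBucket[r.Bucket.Unix()] = r.Value
	}
	var out []Row
	for t := start; t.Before(end); t = t.AddDate(0, 0, 1) {
		v := treatNullsAs
		if p, ok := byBucket[t.Unix()]; ok && p != nil {
			v = *p
		}
		out = append(out, Row{Bucket: t, Value: &v})
	}
	return out
}

func main() {
	five := 5.0
	// Nov 2 is missing entirely; Nov 3 exists but holds a NULL.
	rows := []Row{{Bucket: day(1), Value: &five}, {Bucket: day(3), Value: nil}}
	for _, r := range fillMissing(rows, day(1), day(4), 0) {
		fmt.Println(r.Bucket.Format("2006-01-02"), *r.Value)
	}
}
```

In this sketch the missing bucket and the NULL bucket end up with the same fill value, which matches the description that treat_nulls_as covers both cases.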

Closes https://github.com/rilldata/rill-private-issues/issues/786

begelundmuller · 2024-11-28T11:19:06Z

proto/rill/runtime/v1/resources.proto

@@ -205,6 +205,7 @@ message MetricsViewSpec {
    string format_d3 = 7;
    google.protobuf.Struct format_d3_locale = 13;
    bool valid_percent_of_total = 6;
+    string treat_nulls_as = 14; // TODO what should the type, using string values will not work when coalescing numeric cols


I think we should just consider this to be a SQL expression to be templated into the query literally. For example:

treat_nulls_as: 0
treat_nulls_as: CAST(0 AS HUGEINT)
treat_nulls_as: "'Not available'"
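
A rough sketch of what "templated into the query literally" could look like when building the measure expression. The function name `measureSelectExpr` and the measure expressions are hypothetical; only the idea of wrapping the measure in COALESCE with the configured SQL fragment comes from the discussion.

```go
package main

import "fmt"

// measureSelectExpr wraps a measure expression in COALESCE when treatNullsAs is set.
// treatNullsAs is inserted verbatim, so it must be a trusted SQL expression
// coming from project config, never from end-user input.
func measureSelectExpr(measureExpr, treatNullsAs string) string {
	if treatNullsAs == "" {
		return measureExpr
	}
	return fmt.Sprintf("COALESCE(%s, %s)", measureExpr, treatNullsAs)
}

func main() {
	fmt.Println(measureSelectExpr("SUM(revenue)", "0"))
	fmt.Println(measureSelectExpr("SUM(revenue)", "CAST(0 AS HUGEINT)"))
	fmt.Println(measureSelectExpr("MAX(status)", "'Not available'"))
}
```

Treating the value as a SQL expression sidesteps the typing question in the TODO: the author of the metrics view chooses a literal or cast that matches the measure's type.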

begelundmuller · 2024-11-28T13:37:23Z

runtime/drivers/druid/druidsqldriver/druid_api_sql_driver.go

@@ -83,7 +91,7 @@ func (c *sqlConnection) QueryContext(ctx context.Context, query string, args []d
 		context.AfterFunc(ctx, func() {
 			tctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
 			defer cancel()
-			r, err := http.NewRequestWithContext(tctx, http.MethodDelete, c.dsn+"/"+dr.Context.SQLQueryID, http.NoBody)
+			r, err := http.NewRequestWithContext(tctx, http.MethodDelete, urlutil.MustJoinURL(c.dsn, dr.Context[sqlQueryIDContextKey].(string)), http.NoBody)


Would be nice to avoid the type assertion here. Could the query ID be stored in *DruidRequest directly so it can be accessed like dr.queryID?
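
One possible shape for this, as a sketch only. The field layout of `DruidRequest`, the `newDruidRequest` constructor, and the value of `sqlQueryIDContextKey` are assumptions (the value follows Druid's `sqlQueryId` context parameter); the real struct in the driver may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sqlQueryIDContextKey is assumed to be Druid's "sqlQueryId" context parameter.
const sqlQueryIDContextKey = "sqlQueryId"

// DruidRequest is a hypothetical shape for the request body.
type DruidRequest struct {
	Query   string         `json:"query"`
	Context map[string]any `json:"context"`

	// queryID is unexported, so encoding/json skips it.
	// It lets callers read the ID without a type assertion on Context.
	queryID string
}

// newDruidRequest copies the caller's context and then sets the query ID,
// so the ID in Context always matches the queryID field.
func newDruidRequest(query, queryID string, queryCtx map[string]any) *DruidRequest {
	ctx := make(map[string]any, len(queryCtx)+1)
	for k, v := range queryCtx {
		ctx[k] = v
	}
	ctx[sqlQueryIDContextKey] = queryID
	return &DruidRequest{Query: query, Context: ctx, queryID: queryID}
}

func main() {
	dr := newDruidRequest("SELECT 1", "abc-123", map[string]any{"skipEmptyBuckets": false})
	body, err := json.Marshal(dr)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
	fmt.Println("cancel path would use:", dr.queryID)
}
```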

begelundmuller · 2024-11-28T13:43:43Z

runtime/drivers/druid/druidsqldriver/druid_api_sql_driver.go

+	for _, arg := range args {
+		if strings.HasPrefix(arg.Name, drivers.DialectDruid.ContextKeyArgPrefix()) {
+			queryCtx[strings.TrimPrefix(arg.Name, drivers.DialectDruid.ContextKeyArgPrefix())] = arg.Value
+			continue
+		}


Based on https://pkg.go.dev/database/sql/driver (try searching for ErrRemoveArgument on this page for details):

NamedValueChecker also allows queries to accept per-query options as a parameter by returning ErrRemoveArgument from CheckNamedValue.

So two observations here:

I think this logic should be adapted to use ErrRemoveArgument

Instead of relying on name prefixes, I think the recommended approach is relying on type checks. E.g. check if v, ok := arg.Value.(queryContextParam); ok { ... } and expose a function like func QueryContextParam(key string, val any) any { return queryContextParam{...} } for use outside the package.
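
The two observations combined might look roughly like the sketch below. `driver.NamedValueChecker`, `driver.ErrRemoveArgument`, and `driver.ErrSkip` are real parts of database/sql/driver; the `conn` type, the `queryCtx` field, and the `applyChecker` helper (a simplified imitation of what database/sql does with the checker) are assumptions for illustration.

```go
package main

import (
	"database/sql/driver"
	"fmt"
)

// queryContextParam is a hypothetical marker type carrying one per-query option.
type queryContextParam struct {
	key string
	val any
}

// QueryContextParam wraps a key/value pair so callers outside the package
// can pass it as an ordinary query argument.
func QueryContextParam(key string, val any) any {
	return queryContextParam{key: key, val: val}
}

// conn stands in for the driver connection; queryCtx collects per-query options.
type conn struct {
	queryCtx map[string]any
}

// Compile-time check that conn satisfies the interface.
var _ driver.NamedValueChecker = (*conn)(nil)

// CheckNamedValue captures marker arguments and asks database/sql to drop them.
// Other arguments fall back to the default conversion via ErrSkip.
func (c *conn) CheckNamedValue(nv *driver.NamedValue) error {
	if p, ok := nv.Value.(queryContextParam); ok {
		if c.queryCtx == nil {
			c.queryCtx = map[string]any{}
		}
		c.queryCtx[p.key] = p.val
		return driver.ErrRemoveArgument
	}
	return driver.ErrSkip
}

// applyChecker imitates, in simplified form, how database/sql filters arguments.
func applyChecker(c *conn, vals ...any) []any {
	var kept []any
	for i, v := range vals {
		nv := &driver.NamedValue{Ordinal: i + 1, Value: v}
		if err := c.CheckNamedValue(nv); err == driver.ErrRemoveArgument {
			continue
		}
		kept = append(kept, nv.Value)
	}
	return kept
}

func main() {
	c := &conn{}
	kept := applyChecker(c, "2024-11-01", QueryContextParam("skipEmptyBuckets", false))
	fmt.Println("args passed to the query:", kept)
	fmt.Println("query context:", c.queryCtx)
}
```

A real driver would also need to reset the collected options after each query; that handoff is left out of the sketch.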

begelundmuller · 2024-11-28T13:58:49Z

runtime/metricsview/ast.go

+		if sn == nil {
+			return n, nil
+		}


This feels weird because:

The called function buildSpineSelect does not sound like a function name that would have side effects (and it didn't before, but now it does by creating olapContext).

The early return seems unintuitive – if someone adds more functionality to this function later, they are likely to add it after the spine case handling, and might miss this early return.

begelundmuller · 2024-11-28T14:01:32Z

runtime/metricsview/ast.go

@@ -689,7 +699,87 @@ func (a *AST) buildSpineSelect(alias string, spine *Spine, tr *TimeRange) (*Sele
 	}

 	if spine.TimeRange != nil {
-		return nil, errors.New("time_range not yet supported in spine")
+		if a.dialect == drivers.DialectDruid {


Nowhere else in ast.go or astsql.go is there a hard-coded reference to a dialect. It would be very nice to avoid adding one now. Look at the various places where a.dialect is used: they always push the dialect-specific handling into the dialect implementation. I think something similar should be possible here.

begelundmuller · 2024-11-28T14:58:08Z

runtime/metricsview/ast.go

+	for _, cjs := range cpy.CrossJoinSelects {
+		for _, f := range cjs.DimFields {
+			s.DimFields = append(s.DimFields, FieldNode{
+				Name:        f.Name,
+				DisplayName: f.DisplayName,
+				Expr:        a.sqlForMember(cpy.Alias, f.Name),
+			})
+		}
+
+		if len(cjs.UnionAllSelects) > 0 {
+			// All dimensions will be same across UNION ALL SELECTS so we can just pick the first one
+			for _, f := range cjs.UnionAllSelects[0].DimFields {
+				s.DimFields = append(s.DimFields, FieldNode{
+					Name:        f.Name,
+					DisplayName: f.DisplayName,
+					Expr:        a.sqlForMember(cpy.Alias, f.Name),
+				})
+			}
+		}
+	}


Why is this behavior different from the handling of LeftJoinSelects and JoinComparisonSelect? I think this might relate to the previously mentioned issue of not setting FromSelect?

Ideally after the wrap, these can just be set to nil since they are handled in the nested select.

begelundmuller · 2024-11-28T15:01:39Z

runtime/metricsview/executor.go

@@ -214,6 +215,10 @@ func (e *Executor) Query(ctx context.Context, qry *Query, executionTime *time.Ti
 			return nil, err
 		}

+		for k, v := range ast.olapContext {
+			args = append(args, sql2.NamedArg{Name: drivers.DialectDruid.ContextKeyArgPrefix() + k, Value: v})


Could we avoid a Druid dialect hard-coding here by having something like additionalArgs instead of olapContext?

Then when processing the spine in ast.go, the dialect might either return extra args or a range SELECT query.
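
A sketch of how that split could look. Everything here is hypothetical: the `Dialect` stand-in, the `TimeSpine` method, the `SpineResult` type, the illustrative DuckDB `range(...)` SQL, and the `skipEmptyBuckets: false` value for Druid are assumptions, not the actual drivers package API.

```go
package main

import (
	"fmt"
	"time"
)

// Dialect is a hypothetical stand-in for drivers.Dialect.
type Dialect int

const (
	DialectDuckDB Dialect = iota
	DialectDruid
)

// SpineResult holds what a dialect needs to produce a time spine.
// Exactly one of the two fields is expected to be set.
type SpineResult struct {
	RangeSQL       string         // a SELECT that generates the time buckets
	AdditionalArgs map[string]any // per-query options for engines that fill buckets natively
}

// day builds a UTC midnight timestamp in November 2024 for the examples.
func day(d int) time.Time {
	return time.Date(2024, 11, d, 0, 0, 0, 0, time.UTC)
}

// TimeSpine keeps all dialect-specific decisions inside the dialect.
func (d Dialect) TimeSpine(start, end time.Time, grain string) (SpineResult, error) {
	const layout = "2006-01-02 15:04:05"
	switch d {
	case DialectDuckDB:
		sql := fmt.Sprintf(
			"SELECT range AS ts FROM range(TIMESTAMP '%s', TIMESTAMP '%s', INTERVAL 1 %s)",
			start.Format(layout), end.Format(layout), grain,
		)
		return SpineResult{RangeSQL: sql}, nil
	case DialectDruid:
		// Assumed option: ask Druid not to skip empty buckets.
		return SpineResult{AdditionalArgs: map[string]any{"skipEmptyBuckets": false}}, nil
	default:
		return SpineResult{}, fmt.Errorf("time spine not supported for dialect %d", d)
	}
}

func main() {
	for _, d := range []Dialect{DialectDuckDB, DialectDruid} {
		res, err := d.TimeSpine(day(1), day(4), "DAY")
		if err != nil {
			panic(err)
		}
		// The caller handles both outcomes without naming a dialect.
		fmt.Printf("sql=%q additionalArgs=%v\n", res.RangeSQL, res.AdditionalArgs)
	}
}
```

With this shape, the executor would append whatever AdditionalArgs the AST collected, and neither ast.go nor executor.go would need to mention Druid by name.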

begelundmuller · 2024-11-28T15:03:19Z

runtime/metricsview/query.go

@@ -124,6 +124,7 @@ type TimeSpine struct {
 	Start time.Time `mapstructure:"start"`
 	End   time.Time `mapstructure:"end"`
 	Grain TimeGrain `mapstructure:"grain"`
+	Alias string    `mapstructure:"alias"`


What does this do? Ideally it should apply only to the time dimension if it's requested, and in that case, the time dimension already has an alias provided in Dimension

begelundmuller · 2024-11-28T15:07:03Z

runtime/queries/metricsview_aggregation.go

+		if q.TimeRange == nil || q.TimeRange.Start == nil || q.TimeRange.End == nil {
+			return nil, fmt.Errorf("time range is required for null fill")
+		}


I think it might still work if there's a relative time range (like q.TimeRange.IsoDuration), and this might be needed e.g. for alerts/reports that use a time spine.

It would be resolved to a fixed time range before the query is turned into an AST. It happens here:

rill/runtime/metricsview/executor_rewrite_time.go

Line 15 in 1d51e8c

func (e *Executor) rewriteQueryTimeRanges(ctx context.Context, qry *Query, executionTime *time.Time) error {
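
As an illustration, the check could accept either form of time range and leave resolution to the later rewrite step. The `TimeRange` struct and `validateForFillMissing` below are simplified, hypothetical stand-ins rather than the actual request types.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// TimeRange is a simplified, hypothetical version of the query's time range.
type TimeRange struct {
	Start       *time.Time
	End         *time.Time
	IsoDuration string // relative range such as "P7D", resolved to fixed bounds later
}

// validateForFillMissing accepts a fixed range or a relative one.
// It only rejects queries that give no way to derive the spine bounds.
func validateForFillMissing(tr *TimeRange) error {
	if tr == nil {
		return errors.New("time range is required for null fill")
	}
	if tr.Start != nil && tr.End != nil {
		return nil
	}
	if tr.IsoDuration != "" {
		return nil
	}
	return errors.New("time range needs start and end, or an ISO duration, for null fill")
}

func main() {
	start := time.Date(2024, 11, 1, 0, 0, 0, 0, time.UTC)
	end := start.AddDate(0, 0, 7)
	fmt.Println(validateForFillMissing(&TimeRange{Start: &start, End: &end}))
	fmt.Println(validateForFillMissing(&TimeRange{IsoDuration: "P7D"}))
	fmt.Println(validateForFillMissing(&TimeRange{}))
}
```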

begelundmuller · 2024-11-28T15:10:42Z

runtime/queries/metricsview_aggregation.go

+			Start: s,
+			End:   e,
+			Grain: timeDim.Compute.TimeFloor.Grain,
+			Alias: timeDim.Name,


I think it should be handled inside the metricsview package by applying it to any dimension that has a TimeFloor applied (and erroring if zero or multiple dimensions have a TimeFloor)
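
A sketch of the lookup this suggests: find the single dimension with a TimeFloor and fail otherwise. The `Dimension`, `DimensionCompute`, and `TimeFloor` types are trimmed-down assumptions modeled on the names in the diff, and `timeFloorDimension` is a hypothetical helper.

```go
package main

import (
	"errors"
	"fmt"
)

// Trimmed-down, hypothetical versions of the metricsview query types.
type TimeFloor struct {
	Grain string
}

type DimensionCompute struct {
	TimeFloor *TimeFloor
}

type Dimension struct {
	Name    string
	Compute *DimensionCompute
}

// timeFloorDimension returns the only dimension that has a TimeFloor applied.
// It errors when there are zero or multiple such dimensions.
func timeFloorDimension(dims []Dimension) (*Dimension, error) {
	var found *Dimension
	for i := range dims {
		d := &dims[i]
		if d.Compute == nil || d.Compute.TimeFloor == nil {
			continue
		}
		if found != nil {
			return nil, fmt.Errorf("multiple time floor dimensions: %q and %q", found.Name, d.Name)
		}
		found = d
	}
	if found == nil {
		return nil, errors.New("a time spine requires a dimension with a time floor")
	}
	return found, nil
}

func main() {
	dims := []Dimension{
		{Name: "country"},
		{Name: "ts_day", Compute: &DimensionCompute{TimeFloor: &TimeFloor{Grain: "day"}}},
	}
	d, err := timeFloorDimension(dims)
	if err != nil {
		panic(err)
	}
	// The spine can reuse the dimension's own name and grain,
	// so no separate Alias field is needed on the spine.
	fmt.Println(d.Name, d.Compute.TimeFloor.Grain)
}
```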

begelundmuller

As a separate comment, can we add some tests for this feature for all of DuckDB, Druid and ClickHouse in runtime/resolvers/testdata?

pjain1 added 2 commits November 13, 2024 16:37

null filling for mv agg query on duckdb

2c263c7

lint

96b550b

pjain1 marked this pull request as draft November 13, 2024 11:11

nishantmonu51 added and removed the blocker label Nov 15, 2024

pjain1 added 7 commits November 20, 2024 16:19

ch support draft

9e82a8d

clean up

e699cc7

more clean up

ab9df22

measure treat nulls as

7e59f84

fill missing

c348e25

support treat nulls as on measure

bcc3df4

druid context - skipEmptyBuckets

a6bd032

pjain1 changed the title ~~null filling for mv agg query on duckdb~~ null filling and coalescing for mv agg query Nov 25, 2024

Merge branch 'main' into time_spine

d6435c9

pjain1 marked this pull request as ready for review November 25, 2024 07:06

lint

54014ff

pjain1 requested a review from begelundmuller November 25, 2024 11:11

Merge branch 'main' into time_spine

9fc5554

begelundmuller requested changes Nov 28, 2024

begelundmuller reviewed Nov 28, 2024

pjain1 commented Nov 13, 2024 • edited by begelundmuller