Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(frontend): rework UPDATE to support subqueries #19402

Merged
merged 12 commits into from
Nov 20, 2024
Merged

Conversation

BugenZhao
Copy link
Member

@BugenZhao BugenZhao commented Nov 15, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Rework UPDATE to support subqueries. Close #2672.


Previously, Update has a list of index -> expr to calculate the updated value for each column. This means that its capability cannot exceed a single Project node, so that there's no way to support subquery assignments like UPDATE (v1, v2) = (select 114, 514);.

After this PR, the input of an Update will calculate the old and new values for all columns, so that Update only needs to pick some columns with InputRef to generate the U- and U+ chunks. Complex assignments like subqueries will be handled by the optimizer, just like within a normal SELECT statement.

To support this, a new subquery kind named UpdateSet is added to allow returning multiple columns (still single row) by compositing them into a struct type.

An example demonstrating the plan:

dev=> explain update t set (v1, v2) = (select v, 666 from generate_series(1, 1) as g(v));
                                          QUERY PLAN                                          
----------------------------------------------------------------------------------------------
 BatchExchange { order: [], dist: Single }
 └─BatchUpdate { table: t, exprs: [Field($4, 0:Int32), Field($4, 1:Int32), $2] }
   └─BatchNestedLoopJoin { type: LeftOuter, predicate: true }
     ├─BatchExchange { order: [], dist: Single }
     │ └─BatchScan { table: t, columns: [v1, v2, _row_id, _rw_timestamp] }
     └─BatchMaxOneRow
       └─BatchProject { exprs: [Row(GenerateSeries(1:Int32, 1:Int32), 666:Int32) as $expr1] }
         └─BatchProjectSet { select_list: [GenerateSeries(1:Int32, 1:Int32)] }
           └─BatchValues { rows: [[]] }
(9 rows)

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
@BugenZhao BugenZhao changed the title refactor(frontend): rework UPDATE refactor(frontend): rework UPDATE to support subqueries Nov 18, 2024
Signed-off-by: Bugen Zhao <[email protected]>
@BugenZhao BugenZhao marked this pull request as ready for review November 18, 2024 08:20
@graphite-app graphite-app bot requested a review from a team November 18, 2024 08:20
@BugenZhao BugenZhao requested review from chenzl25, xiangjinwu and stdrc and removed request for a team November 18, 2024 08:21
Signed-off-by: Bugen Zhao <[email protected]>
Signed-off-by: Bugen Zhao <[email protected]>
Copy link
Contributor

@chenzl25 chenzl25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A corner case

create table t (a int, b int);
create table y (a int, b int);
insert into y values(11,11),(22,22);
insert into t values(1,1),(2,2);

update t set (a,b) = (select y.a, y.b from y);
-- PG will throw an error ERROR:  more than one row returned by a subquery used as an expression
-- but RW will succeed to execute and return UPDATE 4 rows

@BugenZhao
Copy link
Member Author

BugenZhao commented Nov 19, 2024

A corner case

In this case, the query select 1, (select y.a from y) should also be rejected, but we accepted it. So I presume there's something wrong with BatchMaxOneRow. 🤔

I think it's because we don't enforce input of BatchMaxOneRow to be singleton. 🥵

@chenzl25
Copy link
Contributor

I think it's because we don't enforce input of BatchMaxOneRow to be singleton

Oh, that cloud be the problem

@BugenZhao
Copy link
Member Author

I think it's because we don't enforce input of BatchMaxOneRow to be singleton

Oh, that cloud be the problem

Will fix in #19452

Copy link
Contributor

@chenzl25 chenzl25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! The idea to use a new kind (UpdateSet) of subquery is brilliant. Please make sure the previous corner case is fixed and add it to a testcase.

@BugenZhao BugenZhao added this pull request to the merge queue Nov 20, 2024
Merged via the queue into main with commit 5f1a59b Nov 20, 2024
30 of 32 checks passed
@BugenZhao BugenZhao deleted the bz/update-rework branch November 20, 2024 09:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support subquery and FROM in UPDATE
2 participants