[Feature] (Part 1) Support to create materialized views with multi partition columns #52577

LiShuMing · 2024-11-04T04:46:36Z

Why I'm doing:

A materialized view (MV) cannot be used if its base tables have multiple partition columns and it cannot be converted into an MV with a single partition column.

What I'm doing:

Support creating materialized views with multiple partition columns on native/external tables.

Parser

Support multiple primaryExpression entries when creating an mv:

mvPartitionExprs:
    primaryExpression
    | '(' primaryExpression (',' primaryExpression)* ')'
    ;

materializedViewDesc
    : (PARTITION BY mvPartitionExprs)

Change CreateMaterializedViewStatement class member variables

// MV's partition expressions
private final List<Expr> partitionByExprs;
// MV's output columns that are referred by mv's partition expressions
private List<Column> partitionColumns;
// Ref base table partition expression referred by mv's partition by expressions
private List<Expr> partitionRefTableExprs;

Analyzer

[MaterializedViewAnalyzer.java]
- Validate each partition expression the same way the single partition expression was validated before.
- If an mv has multiple partition expressions:
  - the number of mv partition columns must equal the number of partition columns of the ref base table (checkPartitionColumnExprs)
  - every mv partition expression must be a column ref rather than a function call (buildPartitionInfo)
[MVPartitionExprResolver.java]
- NOTE: the original policy for multiple ref base tables still applies to multi partition columns.
- getMVPartitionExprsChecked iterates over each partition column across the ref base tables and ensures that every partition expression maps to the same number of ref base tables.

create materialized view mv1
partition by (province, dt, age) 
REFRESH DEFERRED MANUAL 
properties ('partition_refresh_number' = '-1')
as 
select dt, province, age, sum(id) from t3 group by dt, province, age 
UNION ALL 
select dt, province, age, sum(id) from t4 group by dt, province, age 
;
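The two multi-column checks listed under MaterializedViewAnalyzer.java can be pictured with a minimal sketch. This is not the code from the PR: `MultiPartitionCheckSketch`, `PartitionExpr`, and `check` are hypothetical names, and the real analyzer works on `Expr`/`Column` objects rather than strings.

```java
import java.util.List;

public class MultiPartitionCheckSketch {
    /** Hypothetical stand-in for an analyzed mv partition expression. */
    static final class PartitionExpr {
        final String text;
        final boolean isColumnRef;

        PartitionExpr(String text, boolean isColumnRef) {
            this.text = text;
            this.isColumnRef = isColumnRef;
        }
    }

    /** Returns null if both checks pass, otherwise a description of the first failed check. */
    static String check(List<PartitionExpr> mvPartitionExprs, int refBaseTablePartitionColumnCount) {
        if (mvPartitionExprs.size() <= 1) {
            // Single-expression MVs are assumed to follow the pre-existing validation path.
            return null;
        }
        // Check 1: same number of partition columns as the ref base table.
        if (mvPartitionExprs.size() != refBaseTablePartitionColumnCount) {
            return "mv has " + mvPartitionExprs.size() + " partition expressions but the ref base table has "
                    + refBaseTablePartitionColumnCount + " partition columns";
        }
        // Check 2: only plain column refs, no function calls.
        for (PartitionExpr expr : mvPartitionExprs) {
            if (!expr.isColumnRef) {
                return "partition expression is not a column ref: " + expr.text;
            }
        }
        return null;
    }
}
```

With the `mv1` example above, three column refs against a base table with three partition columns would pass, while a function-call expression in the list would be rejected.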

MaterializedView

Since the parser now supports multi partition columns, MaterializedView must change as well.

/*
 * A Materialized View is a database object that contains the results of a query.
 * - Base tables are the tables that are referenced in the view definition.
 * - Ref base tables are some special base tables of a materialized view that are referenced by mv's partition expression.
 * </p>
 * In our partition-change-tracking mechanism, we need to track the partition change of ref base tables to refresh the associated
 * partitions of the materialized view.
 * </p>
 * NOTE:
 * - A Materialized View can have multi base-tables which are tables that are referenced in the view definition.
 * - A Materialized View can have multi ref-base-tables which are tables that are referenced by mv's partition expression.
 * - A Materialized View's partition expressions can be range partitioned or list partitioned:
 *      - If mv is range partitioned, it must only have one partition column.
 *      - If mv is list partitioned, it can have multi partition columns.
 */

Main Changes:

Change the member variables below to support multi partition columns.
partitionExprMaps uses LinkedHashMap instead of Map to keep the mv's partition expressions in the user-defined order; otherwise the correct order cannot be determined in the multi partition columns case.

    @SerializedName(value = "partitionExprMaps")
    private Map<ExpressionSerializedObject, ExpressionSerializedObject> serializedPartitionExprMaps;
    // Use LinkedHashMap to keep the order of partition exprs by the user's defined order which can be used when mv contains multi
    // partition columns.
    private LinkedHashMap<Expr, SlotRef> partitionExprMaps;
    
    // ref base table to partition expression
    private Optional<Map<Table, List<Expr>>> refBaseTablePartitionExprsOpt = Optional.empty();
    // ref base table to partition column slot ref
    private Optional<Map<Table, List<SlotRef>>> refBaseTablePartitionSlotsOpt = Optional.empty();
    // ref base table to partition column
    private Optional<Map<Table, List<Column>>> refBaseTablePartitionColumnsOpt = Optional.empty();
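The ordering argument for LinkedHashMap can be seen in a small standalone sketch. The class and method names are hypothetical, and strings stand in for `Expr` and `SlotRef`; the only fact relied on is that `java.util.LinkedHashMap` iterates in insertion order while `HashMap` gives no ordering guarantee.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionExprOrderSketch {
    /** Returns the partition expressions in the order they were registered. */
    static List<String> orderedPartitionExprs(String... partitionByExprs) {
        // LinkedHashMap preserves insertion order, so iteration follows the PARTITION BY clause.
        Map<String, String> exprToSlot = new LinkedHashMap<>();
        for (String expr : partitionByExprs) {
            exprToSlot.put(expr, "slot:" + expr); // placeholder value standing in for a SlotRef
        }
        return new ArrayList<>(exprToSlot.keySet());
    }
}
```

For `partition by (province, dt, age)`, the keys come back as province, dt, age, which is what lets later code pair the i-th mv partition column with the i-th partition value.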

After changing these variables, every method that uses them has to be refactored, which touches many files.

NOTE: I also did some refactoring here:

- Move the initialization of refBaseTablePartitionExprsOpt/refBaseTablePartitionSlotsOpt/refBaseTablePartitionColumnsOpt into the `onReload` method to avoid initializing them in each `get` method, because those variables only need to be analyzed once, at create/restart.
- Rename getPartitionExpr to getRangePartitionExpr because this method can only be called for a RangeBasedMV.
- Rename getPartitionColumn to getRangePartitionColumn because this method can only be called for a RangeBasedMV.
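A rough sketch of the "analyze once in onReload, read many times in getters" refactor follows. The class, the string-based state, and the choice to throw when `onReload` has not run are assumptions made for illustration; the real `MaterializedView` holds maps keyed by `Table`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class ReloadOnceSketch {
    // Filled once by onReload(); getters never trigger analysis themselves.
    private Optional<List<String>> refBaseTablePartitionExprsOpt = Optional.empty();
    // Counter used only to show how often the (hypothetical) analysis runs.
    int analyzeCalls = 0;

    /** Assumed to be invoked at mv creation and on FE restart. */
    void onReload() {
        refBaseTablePartitionExprsOpt = Optional.of(analyze());
    }

    private List<String> analyze() {
        analyzeCalls++;
        return Arrays.asList("province", "dt", "age");
    }

    /** Cheap read of the precomputed state. */
    List<String> getRefBaseTablePartitionExprs() {
        return refBaseTablePartitionExprsOpt
                .orElseThrow(() -> new IllegalStateException("onReload has not run"));
    }
}
```

Calling the getter repeatedly after one `onReload()` leaves `analyzeCalls` at 1, which is the property the refactor is after.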

MV Refresh

Refresh Partition Predicate

generatePartitionPredicate needs to consider multi partition columns:
- if the mv contains only one partition column (e.g. dt), the partition predicate is: dt in (a, b, c)
- if the mv contains multiple partition columns (e.g. dt, area), the partition predicate is: (dt=a and area='x') or (dt=b and area='x') or (dt=c and area='x')
If a partition column is an iceberg transform, the slot ref cannot be used directly and the transform function has to be taken into account:
- if the mv contains multiple partition columns (e.g. dt, area) and a column's transform is not identity, the partition predicate is: (transform(dt)=a and area='x')
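The two predicate shapes described above can be sketched with plain strings. This is an illustration under stated assumptions, not the PR's `generatePartitionPredicate`: the class and method are hypothetical, values are passed as already-quoted SQL literals, and a transformed column would have to be passed in as its expression text.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionPredicateSketch {
    /**
     * Builds a predicate over the given partition columns.
     * Each inner list of partitionValues holds one value per column, in column order.
     */
    static String generate(List<String> columns, List<List<String>> partitionValues) {
        if (columns.isEmpty() || partitionValues.isEmpty()) {
            throw new IllegalArgumentException("columns and partition values must be non-empty");
        }
        if (columns.size() == 1) {
            // Single column: col IN (v1, v2, ...)
            List<String> values = new ArrayList<>();
            for (List<String> row : partitionValues) {
                values.add(row.get(0));
            }
            return columns.get(0) + " IN (" + String.join(", ", values) + ")";
        }
        // Multiple columns: one AND-group per partition, OR-ed together.
        List<String> disjuncts = new ArrayList<>();
        for (List<String> row : partitionValues) {
            List<String> conjuncts = new ArrayList<>();
            for (int i = 0; i < columns.size(); i++) {
                conjuncts.add(columns.get(i) + " = " + row.get(i));
            }
            disjuncts.add("(" + String.join(" AND ", conjuncts) + ")");
        }
        return String.join(" OR ", disjuncts);
    }
}
```

An IN list is enough for one column, but with several columns each partition is a tuple of values, so the tuples are expanded into OR-ed conjunctions.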

[ConnectorPartitionTraits.java]

The getPartitionList API should return its result in the order of the input partition columns, so that the mv's partition columns may be ordered differently from the base table's.

public abstract Map<String, PListCell> getPartitionList(List<Column> partitionColumns) throws AnalysisException;
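A minimal sketch of what "return the result in the input partition columns' order" could mean for a single partition follows. The helper and its string-based signature are hypothetical; the real method returns `PListCell` values keyed by partition name.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionColumnOrderSketch {
    /**
     * Reorders one partition's values from the table's column order
     * into the order of the requested partition columns.
     */
    static List<String> reorder(List<String> tableColumns, List<String> values, List<String> requestedColumns) {
        List<String> result = new ArrayList<>();
        for (String column : requestedColumns) {
            int index = tableColumns.indexOf(column);
            if (index < 0) {
                throw new IllegalArgumentException("unknown partition column: " + column);
            }
            result.add(values.get(index));
        }
        return result;
    }
}
```

For a base table partitioned by (dt, province, age) and an mv partitioned by (province, dt, age), the values of each partition are returned province first.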

generateMVPartitionName should respect ref base tables that contain multi partition columns.

MV Rewrite

[OptCompensator.java]

getExternalTableCompensatePlan needs to take the compensating partition predicates into account, as MV Refresh does.

Fixes #52576

What type of PR is this:

Does this PR entail a change in behavior?

Yes, this PR will result in a change in behavior.
No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

Interface/UI changes: syntax, type conversion, expression evaluation, display information
Parameter changes: default values, similar parameters but with different default values
Policy changes: use new policy to replace old one, functionality automatically enabled
Feature removed
Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

I have added test cases for my bug fix or my new feature
This pr needs user documentation (for new or modified features or behaviors)
- I have added documentation for my new feature or new function
This is a backport pr

Bugfix cherry-pick branch check:

satanson · 2024-11-11T12:03:18Z

fe/fe-core/src/main/java/com/starrocks/catalog/MaterializedView.java

    // ref bae table to partition column slot ref
-    private transient volatile Optional<Map<Table, SlotRef>> refBaseTablePartitionSlotsOpt = Optional.empty();
+    private Optional<Map<Table, List<SlotRef>>> refBaseTablePartitionSlotsOpt = Optional.empty();


It seems that the Expr used by the partition-by clause has exactly one SlotRef and the Expr is simple, so getting the SlotRef is not time-consuming. Why not just use Expr in these data structures and, when the corresponding SlotRef is required, obtain it from the Expr on the fly?

Is there a chance that a single partition-by expr is built from multiple SlotRefs? For example:
the base table is partitioned by (Country, Province, City, District), but the MV's partitions could be mapped via partition by ws_concat('#', Country, Province, City, District).

> It seems that the Expr used by the partition-by clause has exactly one SlotRef and the Expr is simple, so getting the SlotRef is not time-consuming. Why not just use Expr in these data structures and, when the corresponding SlotRef is required, obtain it from the Expr on the fly?

There are two reasons:

Even though computing the SlotRef is not time-consuming, we can compute it just once to avoid repeating ...

refBaseTablePartitionColumnsOpt refers to refBaseTablePartitionExprsOpt and is accessed multiple times, so it needs to be ready first.

> Is there a chance that a single partition-by expr is built from multiple SlotRefs? For example:
> the base table is partitioned by (Country, Province, City, District), but the MV's partitions could be mapped via partition by ws_concat('#', Country, Province, City, District).

Yes, we can support it, but it is still a huge amount of work. We could treat ws_concat('#', Country, Province, City, District) as a generated column to simplify the partition remapping.

We can discuss this later if needed.

satanson · 2024-11-11T12:08:41Z

fe/fe-core/src/main/java/com/starrocks/catalog/MaterializedView.java

@@ -705,11 +717,12 @@ public void setQueryOutputIndices(List<Integer> queryOutputIndices) {
     * NOTE: Only one column is supported for now, support more columns in the future.
     * @return the partition column of the materialized view
     */
-    public Optional<Column> getPartitionColumn() {
+    public Optional<Column> getRangePartitionColumn() {


getRangePartitionFirstColumn

satanson · 2024-11-11T12:09:40Z

fe/fe-core/src/main/java/com/starrocks/catalog/MaterializedView.java

@@ -718,17 +731,17 @@ public Optional<Column> getPartitionColumn() {
     * NOTE: only one partition expr is supported for now.
     * @return the partition expr of the range partitioned materialized view
     */
-    public Expr getPartitionExpr() {
+    public Expr getRangePartitionExpr() {


use Optional instead?

satanson · 2024-11-11T12:14:57Z

fe/fe-core/src/main/java/com/starrocks/catalog/MaterializedView.java

+            Table table = MvUtils.getTableChecked(tableInfo);
+            List<MVPartitionExpr> mvPartitionExprs = MvUtils.getMvPartitionExpr(partitionExprMaps, table);
+            if (CollectionUtils.isEmpty(mvPartitionExprs)) {
+                LOG.info("Base table {} contains no partition expr, skip", table.getName());


Would this log be printed each time the mv is refreshed or rewritten?

No. analyzeRefBaseTablePartitionExprs is only called in the onReload phase, which should be called on FE restart.

satanson · 2024-11-11T12:17:20Z

fe/fe-core/src/main/java/com/starrocks/catalog/MvRefreshArbiter.java

@@ -186,23 +187,26 @@ public static MvBaseTableUpdateInfo getMvBaseTableUpdateInfo(MaterializedView mv
            if (baseUpdatedPartitionNames == null) {
                return null;
            }
-            Map<Table, Column> partitionTableAndColumns = mv.getRefBaseTablePartitionColumns();
-            if (!partitionTableAndColumns.containsKey(baseTable)) {
+            Map<Table, List<Column>> refBaseTablePartitionColumns = mv.getRefBaseTablePartitionColumns();


Do only MVs on external tables support multi-column partition-by?

No, native list-partitioned tables also support multi partition columns and can be base tables of an MV too.

satanson · 2024-11-11T12:38:35Z

fe/fe-core/src/main/java/com/starrocks/scheduler/mv/MVPCTRefreshListPartitioner.java

+            // NOTE: If target partition values contain `null partition`, the generated predicate should
+            // contain `is null` predicate rather than `in (null) or = null` because the later one is not correct.
+            if (isContainsNullPartition) {
+                IsNullPredicate isNullPredicate = new IsNullPredicate(refBaseTablePartitionExpr, false);


If this IsNullPredicate can be eliminated, is it actually eliminated?

I think it should be eliminated correctly, since the generated partition predicates will be optimized again by the optimizer.
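The NOTE in the diff above rests on SQL three-valued logic: `col IN (NULL)` and `col = NULL` evaluate to NULL rather than true, so rows in the null partition would never match. A string-based sketch of splitting null values into an `IS NULL` branch is shown below; the class and method are hypothetical, and the PR itself builds an `IsNullPredicate` expression instead of text.

```java
import java.util.ArrayList;
import java.util.List;

public class NullPartitionPredicateSketch {
    /** Builds "col IN (...)" for non-null values and adds "col IS NULL" if a null partition is present. */
    static String generate(String column, List<String> values) {
        List<String> nonNullValues = new ArrayList<>();
        boolean containsNullPartition = false;
        for (String value : values) {
            if (value == null) {
                containsNullPartition = true;
            } else {
                nonNullValues.add(value);
            }
        }
        List<String> parts = new ArrayList<>();
        if (!nonNullValues.isEmpty()) {
            parts.add(column + " IN (" + String.join(", ", nonNullValues) + ")");
        }
        if (containsNullPartition) {
            // IS NULL is the only comparison that matches NULL values.
            parts.add(column + " IS NULL");
        }
        if (parts.isEmpty()) {
            throw new IllegalArgumentException("no partition values given");
        }
        return String.join(" OR ", parts);
    }
}
```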

Signed-off-by: shuming.li <[email protected]>

sonarcloud · 2024-11-12T03:42:54Z

Quality Gate failed

Failed conditions
C Reliability Rating on New Code (required ≥ A)


github-actions · 2024-11-12T05:08:46Z

[Java-Extensions Incremental Coverage Report]

✅ pass : 0 / 0 (0%)

github-actions · 2024-11-12T05:09:28Z

[FE Incremental Coverage Report]

✅ pass : 616 / 702 (87.75%)

file detail

	path	covered_line	new_line	coverage	not_covered_line_detail
🔵	com/starrocks/sql/optimizer/rule/transformation/materialization/MvUtils.java	11	24	45.83%	[1543, 1544, 1545, 1547, 1548, 1549, 1550, 1551, 1552, 1553, 1555, 1556, 1557]
🔵	com/starrocks/sql/optimizer/rule/transformation/materialization/OptExpressionDuplicator.java	3	6	50.00%	[265, 266, 267]
🔵	com/starrocks/sql/optimizer/MaterializationContext.java	3	4	75.00%	[379]
🔵	com/starrocks/scheduler/PartitionBasedMvRefreshProcessor.java	18	23	78.26%	[254, 255, 256, 423, 424]
🔵	com/starrocks/scheduler/mv/MVPCTRefreshListPartitioner.java	46	58	79.31%	[171, 172, 173, 203, 207, 208, 219, 220, 230, 241, 242, 243]
🔵	com/starrocks/sql/common/SyncPartitionUtils.java	4	5	80.00%	[757]
🔵	com/starrocks/scheduler/mv/MVPCTRefreshRangePartitioner.java	12	15	80.00%	[156, 179, 181]
🔵	com/starrocks/scheduler/mv/MVPCTRefreshPlanBuilder.java	33	41	80.49%	[159, 198, 204, 309, 311, 336, 338, 393]
🔵	com/starrocks/server/LocalMetastore.java	16	19	84.21%	[3073, 3074, 3087]
🔵	com/starrocks/sql/optimizer/rule/transformation/materialization/MvPartitionCompensator.java	16	19	84.21%	[475, 694, 763]
🔵	com/starrocks/mv/analyzer/MVPartitionExprResolver.java	27	31	87.10%	[558, 560, 625, 628]
🔵	com/starrocks/sql/analyzer/MaterializedViewAnalyzer.java	122	140	87.14%	[645, 682, 683, 697, 698, 699, 700, 709, 710, 711, 733, 734, 812, 813, 850, 851, 920, 997]
🔵	com/starrocks/connector/iceberg/IcebergPartitionUtils.java	24	27	88.89%	[223, 224, 249]
🔵	com/starrocks/sql/common/RangePartitionDiffer.java	22	24	91.67%	[233, 349]
🔵	com/starrocks/sql/optimizer/rule/transformation/materialization/AggregatedTimeSeriesRewriter.java	11	12	91.67%	[139]
🔵	com/starrocks/catalog/MaterializedView.java	79	82	96.34%	[739, 742, 744]
🔵	com/starrocks/sql/optimizer/rule/transformation/materialization/MaterializedViewRewriter.java	32	33	96.97%	[2133]
🔵	com/starrocks/catalog/OlapTable.java	25	26	96.15%	[1245]
🔵	com/starrocks/connector/PartitionUtil.java	25	26	96.15%	[361]
🔵	com/starrocks/scheduler/mv/MVPCTRefreshPartitioner.java	3	3	100.00%	[]
🔵	com/starrocks/mv/analyzer/MVPartitionSlotRefResolver.java	5	5	100.00%	[]
🔵	com/starrocks/scheduler/mv/MVVersionManager.java	5	5	100.00%	[]
🔵	com/starrocks/sql/optimizer/rule/transformation/materialization/compensation/OptCompensator.java	9	9	100.00%	[]
🔵	com/starrocks/sql/ast/CreateMaterializedViewStatement.java	6	6	100.00%	[]
🔵	com/starrocks/catalog/mv/MVTimelinessArbiter.java	2	2	100.00%	[]
🔵	com/starrocks/scheduler/TableSnapshotInfo.java	1	1	100.00%	[]
🔵	com/starrocks/scheduler/mv/MVPCTMetaRepairer.java	1	1	100.00%	[]
🔵	com/starrocks/connector/partitiontraits/DefaultTraits.java	1	1	100.00%	[]
🔵	com/starrocks/catalog/MvRefreshArbiter.java	8	8	100.00%	[]
🔵	com/starrocks/catalog/mv/MVTimelinessListPartitionArbiter.java	7	7	100.00%	[]
🔵	com/starrocks/catalog/mv/MVTimelinessRangePartitionArbiter.java	12	12	100.00%	[]
🔵	com/starrocks/connector/partitiontraits/CachedPartitionTraits.java	1	1	100.00%	[]
🔵	com/starrocks/connector/partitiontraits/OlapPartitionTraits.java	1	1	100.00%	[]
🔵	com/starrocks/sql/common/ListPartitionDiffer.java	14	14	100.00%	[]
🔵	com/starrocks/sql/common/PartitionDiffer.java	5	5	100.00%	[]
🔵	com/starrocks/sql/optimizer/rule/transformation/materialization/compensation/MVCompensationBuilder.java	6	6	100.00%	[]

github-actions · 2024-11-12T05:09:35Z

[BE Incremental Coverage Report]

✅ pass : 0 / 0 (0%)

LiShuMing requested review from a team as code owners November 4, 2024 04:46

github-actions bot added the behavior_changed label Nov 4, 2024

mergify bot assigned LiShuMing Nov 4, 2024

LiShuMing changed the title ~~[Feature] (Part 1) [WIP] Support multi partition exprs for mv~~ [Feature] (Part 1) [WIP] Support to create materialized views with multi partition columns Nov 5, 2024

LiShuMing marked this pull request as draft November 6, 2024 03:27

LiShuMing force-pushed the fix/main/support_multi_partition_cols branch from 0c4d8e1 to 9667c77 Compare November 6, 2024 10:59

LiShuMing changed the title ~~[Feature] (Part 1) [WIP] Support to create materialized views with multi partition columns~~ [Feature] (Part 1) Support to create materialized views with multi partition columns Nov 6, 2024

LiShuMing marked this pull request as ready for review November 6, 2024 11:00

LiShuMing force-pushed the fix/main/support_multi_partition_cols branch 6 times, most recently from f5f20ab to c760b55 Compare November 8, 2024 07:30

satanson reviewed Nov 11, 2024

LiShuMing added 2 commits November 12, 2024 10:27

Support to create materialized views with multi partition columns

40e7ded

Signed-off-by: shuming.li <[email protected]>

fix comments

d457fe7

Signed-off-by: shuming.li <[email protected]>

LiShuMing force-pushed the fix/main/support_multi_partition_cols branch from c760b55 to d457fe7 Compare November 12, 2024 02:51

fix bugs

89d502d

Signed-off-by: shuming.li <[email protected]>

satanson approved these changes Nov 13, 2024

Seaven approved these changes Nov 13, 2024

