-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] (Part 1) Support to create materialized views with multi partition columns #52577
base: main
Are you sure you want to change the base?
[Feature] (Part 1) Support to create materialized views with multi partition columns #52577
Conversation
0c4d8e1
to
9667c77
Compare
f5f20ab
to
c760b55
Compare
// ref bae table to partition column slot ref | ||
private transient volatile Optional<Map<Table, SlotRef>> refBaseTablePartitionSlotsOpt = Optional.empty(); | ||
private Optional<Map<Table, List<SlotRef>>> refBaseTablePartitionSlotsOpt = Optional.empty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems that Expr used by partition-by-clause has exactly one SlotRef and the Expr is simple, so get SlotRef is not time-cosuming, why not just use Expr in these data structure, when corresponding SlotRef is required, just obtain it from the Expr directly on-the-fly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have a chance that a single partition-by expr is built from multiple SlotRef? for an example:
base table partition by (Country, Province, City, District), but MV's partitions can be mapped in such a way partition by ws_concat('#', Country, Province, City, District).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems that Expr used by partition-by-clause has exactly one SlotRef and the Expr is simple, so get SlotRef is not time-cosuming, why not just use Expr in these data structure, when corresponding SlotRef is required, just obtain it from the Expr directly on-the-fly.
There are two reasons:
- Even
SlotRef
compute is not time-consuming, we can only compute it once only to avoid repeating ... refBaseTablePartitionColumnsOpt
refersrefBaseTablePartitionExprsOpt
and it's called multi times, so needs to make it ready first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have a chance that a single partition-by expr is built from multiple SlotRef? for an example:
base table partition by (Country, Province, City, District), but MV's partitions can be mapped in such a way partition by ws_concat('#', Country, Province, City, District).
Yes, we can support it, but it's still a huge work. But we may treat ws_concat('#', Country, Province, City, District).
as a generated column to simplify the partition remapping.
We can discuss this later if needed.
@@ -705,11 +717,12 @@ public void setQueryOutputIndices(List<Integer> queryOutputIndices) { | |||
* NOTE: Only one column is supported for now, support more columns in the future. | |||
* @return the partition column of the materialized view | |||
*/ | |||
public Optional<Column> getPartitionColumn() { | |||
public Optional<Column> getRangePartitionColumn() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getRangePartitionFirstColumn
@@ -718,17 +731,17 @@ public Optional<Column> getPartitionColumn() { | |||
* NOTE: only one partition expr is supported for now. | |||
* @return the partition expr of the range partitioned materialized view | |||
*/ | |||
public Expr getPartitionExpr() { | |||
public Expr getRangePartitionExpr() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use Optional instead?
Table table = MvUtils.getTableChecked(tableInfo); | ||
List<MVPartitionExpr> mvPartitionExprs = MvUtils.getMvPartitionExpr(partitionExprMaps, table); | ||
if (CollectionUtils.isEmpty(mvPartitionExprs)) { | ||
LOG.info("Base table {} contains no partition expr, skip", table.getName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would this log be printed each time that mv is refreshed or rewritten?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. analyzeRefBaseTablePartitionExprs
only called in the onReload
phase which should be called in FE Restart.
@@ -186,23 +187,26 @@ public static MvBaseTableUpdateInfo getMvBaseTableUpdateInfo(MaterializedView mv | |||
if (baseUpdatedPartitionNames == null) { | |||
return null; | |||
} | |||
Map<Table, Column> partitionTableAndColumns = mv.getRefBaseTablePartitionColumns(); | |||
if (!partitionTableAndColumns.containsKey(baseTable)) { | |||
Map<Table, List<Column>> refBaseTablePartitionColumns = mv.getRefBaseTablePartitionColumns(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only MV on external tables support multi-column partition-by?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, Native List Partition tables can also support multi partition columns which can be base tables of MV too.
// NOTE: If target partition values contain `null partition`, the generated predicate should | ||
// contain `is null` predicate rather than `in (null) or = null` because the later one is not correct. | ||
if (isContainsNullPartition) { | ||
IsNullPredicate isNullPredicate = new IsNullPredicate(refBaseTablePartitionExpr, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this IsNullPredicate is capable be eliminated, is it eliminated actually?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be eliminated correctly, since generated partition predicates will be optimized again by Optimizer.
Signed-off-by: shuming.li <[email protected]>
Signed-off-by: shuming.li <[email protected]>
c760b55
to
d457fe7
Compare
Signed-off-by: shuming.li <[email protected]>
Quality Gate failedFailed conditions See analysis details on SonarCloud Catch issues before they fail your Quality Gate with our IDE extension SonarLint |
[Java-Extensions Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[FE Incremental Coverage Report]✅ pass : 616 / 702 (87.75%) file detail
|
[BE Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
Why I'm doing:
What I'm doing:
Parser
primaryExpression
in creating mv;CreateMaterializedViewStatement
class member variablesAnalyzer
checkPartitionColumnExprs
)buildPartitionInfo
)multi-table ref base tables
policy can still apply formulti partition columns
.getMVPartitionExprsChecked
iterate each partition column for multi ref base tables and ensure each partition expression must have the same size of ref base tables.MaterializedView
Since
parser
has supportedmulti partition columns
,MaterializedView
should also be changed too.Main Changes:
LinkedHashMap
instead ofMap
forpartitionExprMaps
is to keep mv’s partition expression order as user’s defined order, otherwise we cannot decide the correct order in multi partition columns case.After the variables’ change, we need to refactor all methods which use those variables which will cause a lot of files’ change.
NOTE: I do some refactors here:
refBaseTablePartitionExprsOpt
/refBaseTablePartitionExprsOpt/refBaseTablePartitionColumnsOpt‘s initialization into
onReloadmethod to avoid initialization in each
get` method, because those variables only need to be analyzed once in the create/restart.getPartitionExpr
togetRangePartitionExpr
because this method only can be called in RangeBasedMV.getPartitionColumn
togetRangePartitionColumn
because this method only can be called in RangeBasedMV.MV Refresh
Refresh Partition Predicate
generatePartitionPredicate
needs to consider multi partition columns:dt in (a, b, c)
(dt=a and area='x') or (dt=b and area='x') or (dt=b and area='x'))
iceberg
transform, we cannot use slot ref directly, needs to considertransform
function:identify
, partition predicates:(transform(dt)=a and area='x')
[ConnectorPartitionTraits.java]
getPartitionList
api should return result as the input partition columns’ order to support mv’s partition columns are not the same with the base table.generateMVPartitionName
should respect ref base table which contain multi partition columns.MV Rewrite
[OptCompensator.java]
getExternalTableCompensatePlan
needs to consider compensate partition predicates likeMV Refresh
.Fixes #52576
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: