
[Enhancement] support incremental scan ranges deployment at FE side #50189

Merged
merged 5 commits into StarRocks:main from fe-incremental-scan-ranges
Sep 4, 2024

Conversation

@dirtysalt (Contributor) commented Aug 23, 2024

Why I'm doing:

What I'm doing:

CoordinatorPreprocessor

Added the assignIncrementalScanRangesToFragmentInstances interface, which recalculates the scan ranges of all fragment instances under a fragment.

Currently the node2ScanRanges distribution is supported, but more complex data distributions such as node + driver_seq -> scan range are not yet supported for incremental deployment.
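The node2ScanRanges recalculation can be sketched roughly as below. This is an illustrative simplification, not the actual CoordinatorPreprocessor code: the class, method name, and round-robin policy here are assumptions made for the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: redistribute newly fetched scan ranges over the nodes that
// already host fragment instances, producing the node -> scan-ranges map
// (the "node2ScanRanges" shape). Names here are hypothetical.
public class IncrementalAssignment {
    // Round-robin the incremental scan ranges over the existing node ids.
    static Map<Long, List<String>> assignIncremental(List<Long> nodeIds,
                                                     List<String> scanRanges) {
        Map<Long, List<String>> node2ScanRanges = new HashMap<>();
        for (Long id : nodeIds) {
            node2ScanRanges.put(id, new ArrayList<>());
        }
        for (int i = 0; i < scanRanges.size(); i++) {
            Long node = nodeIds.get(i % nodeIds.size());
            node2ScanRanges.get(node).add(scanRanges.get(i));
        }
        return node2ScanRanges;
    }

    public static void main(String[] args) {
        Map<Long, List<String>> m = assignIncremental(
                List.of(1L, 2L), List.of("r0", "r1", "r2"));
        System.out.println(m.get(1L)); // [r0, r2]
        System.out.println(m.get(2L)); // [r1]
    }
}
```

A node + driver_seq keyed distribution would need a two-level map instead of this flat one, which is why the PR leaves it unsupported for now.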

ExecutionSchedule

We currently support two scheduling methods: all-at-once and phased. The difference is that they schedule different sets of fragment instances.

The code change is that after each set of fragment instances is scheduled, the scheduler immediately checks whether any scan nodes in those fragment instances still have unscheduled scan ranges. If so, scheduling continues.

Subsequent scheduling rounds go through coordinator.assignIncrementalScanRangesToDeployStates(deployer, states).
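The scheduling loop described above can be sketched as follows. This is a hedged illustration of the control flow only; the real ExecutionSchedule API differs, and deployRounds is a hypothetical stand-in for the deploy-then-check cycle.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Sketch: keep deploying batches while the scan node still has
// unscheduled scan ranges; each loop iteration models one round in which
// the coordinator would regenerate and dispatch incremental requests.
public class IncrementalSchedule {
    static int deployRounds(Queue<String> pendingScanRanges, int batchSize) {
        int rounds = 0;
        while (!pendingScanRanges.isEmpty()) {
            List<String> batch = new ArrayList<>();
            for (int i = 0; i < batchSize && !pendingScanRanges.isEmpty(); i++) {
                batch.add(pendingScanRanges.poll()); // schedule this batch
            }
            // The real code would now call something like
            // coordinator.assignIncrementalScanRangesToDeployStates(deployer, states)
            rounds++;
        }
        return rounds;
    }

    public static void main(String[] args) {
        Queue<String> pending = new ArrayDeque<>(List.of("r0", "r1", "r2", "r3", "r4"));
        System.out.println(deployRounds(pending, 2)); // prints 3
    }
}
```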

Coordinator

Adds the assignIncrementalScanRangesToDeployStates interface, which checks whether the scan nodes have more scan ranges, based on the deploy states (i.e., the sets of locally dispatched fragment instances).

  • For each deploy state, examine all of its fragment instances.
  • For those fragment instances, check whether any scan nodes still have scan ranges left.
  • If so, call the coordinator preprocessor to recalculate node2ScanRanges for those fragment instances.
  • Based on the recalculated assignment, regenerate the RPC requests and put them under the deploy state.
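The four steps above can be sketched as a single pass over the deploy states. DeployState, FragmentInstance, and ScanSource here are illustrative stand-ins for the real scheduler classes, and the request payload is a placeholder string.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of assignIncrementalScanRangesToDeployStates: for each deploy
// state, if any of its fragment instances' scan nodes still have scan
// ranges, regenerate an RPC request for that state.
public class AssignToDeployStates {
    interface ScanSource { boolean hasMoreScanRanges(); }

    static class FragmentInstance {
        final ScanSource scanSource;
        FragmentInstance(ScanSource s) { this.scanSource = s; }
    }

    static class DeployState {
        final List<FragmentInstance> instances;
        final List<String> rpcRequests = new ArrayList<>();
        DeployState(List<FragmentInstance> instances) { this.instances = instances; }
    }

    // Returns how many deploy states received a regenerated RPC request.
    static int assignIncremental(List<DeployState> states) {
        int updated = 0;
        for (DeployState state : states) {                        // step 1
            boolean hasMore = state.instances.stream()            // step 2
                    .anyMatch(fi -> fi.scanSource.hasMoreScanRanges());
            if (hasMore) {
                // step 3: the preprocessor would recompute node2ScanRanges here
                state.rpcRequests.add("incremental-deploy-request"); // step 4
                updated++;
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        DeployState hasMore = new DeployState(List.of(new FragmentInstance(() -> true)));
        DeployState done = new DeployState(List.of(new FragmentInstance(() -> false)));
        System.out.println(assignIncremental(List.of(hasMore, done))); // prints 1
    }
}
```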

Arch Diagram

(architecture diagram attached in the original PR)

Fixes #50196

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.3
    • 3.2
    • 3.1
    • 3.0
    • 2.5

@dirtysalt dirtysalt requested a review from a team as a code owner August 23, 2024 07:09
@dirtysalt dirtysalt changed the title [Feature] support incremental scan ranges deployment [Enhancement] support incremental scan ranges deployment Aug 23, 2024
@dirtysalt dirtysalt changed the title [Enhancement] support incremental scan ranges deployment [Refactor] support incremental scan ranges deployment Aug 23, 2024
@dirtysalt dirtysalt enabled auto-merge (squash) August 23, 2024 08:56
@wanpengfei-git wanpengfei-git requested a review from a team August 23, 2024 12:08
@dirtysalt dirtysalt changed the title [Refactor] support incremental scan ranges deployment [Refactor] support incremental scan ranges deployment at FE side Aug 26, 2024
@dirtysalt dirtysalt changed the title [Refactor] support incremental scan ranges deployment at FE side [Enhancement] support incremental scan ranges deployment at FE side Aug 26, 2024
Signed-off-by: yanz <[email protected]>

sonarcloud bot commented Aug 27, 2024


[BE Incremental Coverage Report]

pass : 0 / 0 (0%)


[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)


[FE Incremental Coverage Report]

pass : 158 / 163 (96.93%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 com/starrocks/planner/ScanNode.java 0 1 00.00% [104]
🔵 com/starrocks/qe/scheduler/Coordinator.java 0 1 00.00% [136]
🔵 com/starrocks/qe/HDFSBackendSelector.java 34 36 94.44% [260, 267]
🔵 com/starrocks/qe/DefaultCoordinator.java 44 45 97.78% [700]
🔵 com/starrocks/qe/scheduler/TFragmentInstanceFactory.java 8 8 100.00% []
🔵 com/starrocks/qe/SessionVariable.java 8 8 100.00% []
🔵 com/starrocks/qe/scheduler/dag/FragmentInstanceExecState.java 6 6 100.00% []
🔵 com/starrocks/qe/scheduler/Deployer.java 2 2 100.00% []
🔵 com/starrocks/qe/scheduler/dag/PhasedExecutionSchedule.java 9 9 100.00% []
🔵 com/starrocks/qe/scheduler/dag/AllAtOnceExecutionSchedule.java 10 10 100.00% []
🔵 com/starrocks/qe/CoordinatorPreprocessor.java 6 6 100.00% []
🔵 com/starrocks/qe/scheduler/dag/JobSpec.java 4 4 100.00% []
🔵 com/starrocks/qe/scheduler/assignment/FragmentAssignmentStrategyFactory.java 1 1 100.00% []
🔵 com/starrocks/qe/scheduler/assignment/BackendSelectorFactory.java 7 7 100.00% []
🔵 com/starrocks/qe/scheduler/assignment/LocalFragmentAssignmentStrategy.java 15 15 100.00% []
🔵 com/starrocks/planner/HdfsScanNode.java 4 4 100.00% []

@stephen-shelby (Contributor)

I get this error with the following session variables. (error screenshot attached in the original PR)

@dirtysalt (Contributor, Author)

I get this error with the following session variables.

If you want to test this, use it together with PR #50254 ([Enhancement] support incremental scan ranges deployment at BE side by dirtysalt · StarRocks/starrocks).

Some modifications are needed on the BE side too.

@Youngwb (Contributor)

Youngwb commented Aug 30, 2024

Could you avoid sending empty scan ranges to all BEs? Maybe we can send the incremental scan ranges only to the workers that already have fragment instances.

@Youngwb (Contributor)

Youngwb commented Aug 30, 2024

How do you deal with the incremental partitions in HiveTable.toThrift?

@dirtysalt (Contributor, Author)

How do you deal with the incremental partitions in HiveTable.toThrift?

They are not handled in this PR. For the Hive scan node, there is no need to handle incremental partitions.

@dirtysalt (Contributor, Author)

Could you avoid sending empty scan ranges to all BEs? Maybe we can send the incremental scan ranges only to the workers that already have fragment instances.

Yes. That would be more complicated; we can optimize it in a later PR. We could:

  • add an extra flag to fragment instances indicating that the fragment plan has already been delivered.
  • at the end, when there are no more scan ranges, send an empty scan range to each fragment instance to notify it that "all is done".

Anyway, it's just an optimization.
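The proposed optimization could look roughly like this sketch. It is not implemented in this PR; InstanceState, nextPayload, and the null/empty-list convention are hypothetical names chosen for illustration.

```java
import java.util.List;

// Sketch: track per-instance whether the fragment plan was already
// delivered, and send one final empty batch as the "all is done" signal
// only to instances that were actually started.
public class EmptyRangeSentinel {
    static class InstanceState {
        boolean hasFragmentPlan = false; // the proposed extra flag
    }

    // Returns the payload to send: null means "skip this worker entirely",
    // an empty list is the end-of-stream notification.
    static List<String> nextPayload(InstanceState st, List<String> newRanges) {
        if (!newRanges.isEmpty()) {
            st.hasFragmentPlan = true;   // plan ships with the first batch
            return newRanges;
        }
        // No more ranges: only instances that got a plan need the empty
        // "all is done" batch; others were never started.
        return st.hasFragmentPlan ? List.of() : null;
    }

    public static void main(String[] args) {
        InstanceState started = new InstanceState();
        System.out.println(nextPayload(started, List.of("range-a"))); // [range-a]
        System.out.println(nextPayload(started, List.of()));          // []
        System.out.println(nextPayload(new InstanceState(), List.of())); // null
    }
}
```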

@dirtysalt (Contributor, Author)

@Youngwb

I don't think placeholder is a good name; how about is_empty?

I cannot reply to that comment inline. No problem, I can fix that.

Comment on lines +485 to +489
if (connectContext != null) {
if (connectContext.getSessionVariable().isEnableConnectorIncrementalScanRanges()) {
jobSpec.setIncrementalScanRanges(true);
}
}

Better to move to JobSpec::Builder::commonProperties and not expose setIncrementalScanRanges.

        public Builder commonProperties(ConnectContext context) {
            TWorkGroup newResourceGroup = prepareResourceGroup(
                    context, ResourceGroupClassifier.QueryType.fromTQueryType(instance.queryOptions.getQuery_type()));
            this.resourceGroup(newResourceGroup);

            this.enablePipeline(isEnablePipeline(context, instance.fragments));
            instance.connectContext = context;

            instance.enableQueue = isEnableQueue(context);
            instance.needQueued = needCheckQueue();
            instance.enableGroupLevelQueue = instance.enableQueue && GlobalVariable.isEnableGroupLevelQueryQueue();
            
+           instance.incrementalScanRanges = context.getSessionVariable().isEnableConnectorIncrementalScanRanges();

            return this;
        }

@dirtysalt dirtysalt merged commit 234d20a into StarRocks:main Sep 4, 2024
91 of 92 checks passed
@dirtysalt dirtysalt deleted the fe-incremental-scan-ranges branch September 4, 2024 08:33
renzhimin7 pushed a commit to renzhimin7/starrocks that referenced this pull request Nov 7, 2024

Successfully merging this pull request may close these issues.

To support incremental scan ranges deployment.
6 participants