[Enhancement] support incremental scan ranges deployment at BE side #50254

Merged: dirtysalt merged 20 commits into StarRocks:main from the be-incremental-scan-ranges branch on Sep 19, 2024

dirtysalt (Contributor) commented on Aug 26, 2024

Why I'm doing:

What I'm doing:

Compared to the FE modification, the BE modification is simpler. After receiving a request that carries incremental scan ranges, the BE:

  1. finds the fragment context by query_id + fragment_instance_id;
  2. finds the corresponding morsel queue factory by scan_node_id;
  3. has the morsel queue factory append the incremental scan ranges to its morsel queue.
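
The three steps can be sketched as a two-level lookup. This is a minimal sketch under stated assumptions, not the code in this PR: FragmentContextRegistry, DeployStatus, append_scan_ranges and the simplified FragmentContext / MorselQueueFactory types are hypothetical stand-ins, IDs are plain strings instead of Thrift TUniqueId, and locking is omitted although a real BE must handle concurrent requests.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a scan range (the real BE receives Thrift scan range params).
struct ScanRange {
    std::string path;
};

// Hypothetical, simplified morsel queue factory for a single scan node.
class MorselQueueFactory {
public:
    // Step 3: append the incremental ranges to the queue that scan operators pull from.
    void append_scan_ranges(const std::vector<ScanRange>& ranges) {
        pending_.insert(pending_.end(), ranges.begin(), ranges.end());
    }
    std::size_t num_pending() const { return pending_.size(); }

private:
    std::vector<ScanRange> pending_;
};

// Hypothetical fragment context: one morsel queue factory per scan_node_id.
struct FragmentContext {
    std::map<int, MorselQueueFactory> morsel_queue_factories;
};

enum class DeployStatus { kOk, kFragmentNotFound, kScanNodeNotFound };

// (query_id, fragment_instance_id); plain strings instead of TUniqueId for brevity.
using FragmentKey = std::pair<std::string, std::string>;

// Hypothetical registry of running fragment instances; no locking, unlike a real BE.
class FragmentContextRegistry {
public:
    FragmentContext& register_fragment(const std::string& query_id, const std::string& instance_id) {
        FragmentKey key{query_id, instance_id};
        assert(contexts_.count(key) == 0 && "fragment instance registered twice");
        return contexts_[key];
    }

    DeployStatus deploy_incremental_scan_ranges(const std::string& query_id,
                                                const std::string& instance_id, int scan_node_id,
                                                const std::vector<ScanRange>& ranges) {
        // Step 1: find the fragment context by query_id + fragment_instance_id.
        auto ctx_it = contexts_.find(FragmentKey{query_id, instance_id});
        if (ctx_it == contexts_.end()) return DeployStatus::kFragmentNotFound;

        // Step 2: find the morsel queue factory by scan_node_id.
        auto& factories = ctx_it->second.morsel_queue_factories;
        auto factory_it = factories.find(scan_node_id);
        if (factory_it == factories.end()) return DeployStatus::kScanNodeNotFound;

        // Step 3: hand the new scan ranges to that factory.
        factory_it->second.append_scan_ranges(ranges);
        return DeployStatus::kOk;
    }

private:
    std::map<FragmentKey, FragmentContext> contexts_;
};
```

Returning a status instead of failing hard lets the caller report a request that arrives for a fragment instance or scan node the BE does not know about (for example, one that has already finished).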

Fixes #50196

benchmark

sf100

default with file cache

Query Default X=10 X=50 X=100 X=500
Q01 3407 4464 3216 3274 3241
Q02 354 362 330 339 340
Q03 2126 2362 2031 2096 2048
Q04 1322 1381 1051 1088 1071
Q05 2707 2710 2538 2650 2596
Q06 960 1906 1146 858 1133
Q07 2495 2424 2333 2445 2440
Q08 1562 1720 1572 1637 1597
Q09 3965 4024 3605 3770 3624
Q10 2420 2485 2567 2481 2423
Q11 302 296 336 310 299
Q12 1156 1307 1039 1000 1001
Q13 2310 2463 2291 2301 2445
Q14 947 965 911 967 961
Q15 946 813 1157 924 1099
Q16 1182 925 897 891 895
Q17 879 1291 851 990 870
Q18 7486 9813 7449 7663 7770
Q19 1107 1255 1022 1045 1298
Q20 1147 990 1292 1252 1120
Q21 3369 3763 3455 3441 3518
Q22 435 476 459 446 459
SUM 42584 48195 41548 41868 42248

no file cache

Query Default X=10 X=50 X=100 X=500
Q01 3167 4149 3209 3222 3257
Q02 378 396 378 408 381
Q03 2053 2217 2097 2068 2072
Q04 1030 1418 1058 1071 1101
Q05 2494 2573 2587 2615 2706
Q06 962 822 829 966 963
Q07 2298 2383 2426 2390 2493
Q08 1563 1630 1627 1649 1693
Q09 3503 3734 3713 3612 3775
Q10 2370 2787 2583 2533 2434
Q11 468 324 336 328 323
Q12 1037 1455 1000 1137 1017
Q13 2235 2485 2309 2290 2288
Q14 946 953 1158 1000 1013
Q15 823 1168 1336 986 1245
Q16 901 900 939 954 927
Q17 853 902 881 896 906
Q18 7369 9457 7562 7761 7717
Q19 970 1021 1291 1056 1317
Q20 1201 1218 1229 2011 1204
Q21 3309 3784 3394 3501 3487
Q22 440 487 451 450 460
SUM 40370 46263 42393 42904 42779

sf1000

default with file cache

Query Default X=10 X=50 X=100 X=500
Q01 42902 44235 41353 41115 41215
Q02 2560 2618 2855 2713 2629
Q03 34505 39886 34458 34203 33721
Q04 15230 20644 14710 14893 14707
Q05 44124 46042 44598 44357 43497
Q06 16158 15289 15463 15690 15519
Q07 38524 37951 37997 38174 37767
Q08 32042 36543 31965 32340 31386
Q09 63988 73457 65299 64979 62692
Q10 37215 43144 36606 36745 36222
Q11 4488 5055 4539 4541 4482
Q12 17615 20101 17354 17617 17249
Q13 34092 33440 34711 34352 33010
Q14 19409 20599 18753 18779 18579
Q15 17575 17610 17121 17142 17232
Q16 7379 8015 7543 7560 7335
Q17 14899 21776 14447 14715 14386
Q18 0 0 0 0 0
Q19 22852 21751 22377 22565 22161
Q20 15206 17359 14863 14986 14516
Q21 51007 64570 50953 51491 49645
Q22 5792 8080 5734 5930 5525
SUM 537562 598165 533699 534887 523475

no file cache

Query Default X=10 X=50 X=100 X=500
Q01 42824 43872 41407 41565 41157
Q02 2511 2597 2669 2605 2549
Q03 34444 40297 33837 34215 33627
Q04 15217 20507 14929 14813 14666
Q05 44680 45787 43781 44366 43413
Q06 16201 15202 15397 15609 15556
Q07 38519 38537 37722 38051 37524
Q08 32101 36635 31620 32110 31194
Q09 64030 74037 64050 63648 62721
Q10 36974 41708 36374 36633 36641
Q11 4554 4485 4525 4500 4546
Q12 17664 19880 17299 17396 17320
Q13 33198 32047 34531 34543 32828
Q14 19130 20479 18532 18880 18590
Q15 17597 17930 17154 17336 17038
Q16 7253 7887 7564 7357 7387
Q17 15106 20625 14469 14576 14463
Q18 0 0 0 0 0
Q19 23098 22080 22290 22307 22440
Q20 14905 17010 14969 15094 14459
Q21 50878 64490 50734 50894 49500
Q22 5589 8109 5845 5991 5672
SUM 536473 594201 529698 532489 523291

conclusion

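Compared with the default configuration, X=10 increases total latency (SUM) by roughly 11-15% in all four runs. X=50/100/500 add no latency at sf1000 or at sf100 with file cache (their totals are slightly below Default); at sf100 without file cache they are about 5-6% slower than Default.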
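
The overhead figures can be checked against the SUM rows with a plain relative-change computation. relative_change is an illustrative helper, not part of the PR, and the result is unit-free, so it does not depend on the (unstated) latency unit of the tables.

```cpp
#include <cassert>

// Relative change of a configuration's SUM row versus the Default column's SUM row.
// A positive result means the configuration was slower (higher total latency) than Default.
double relative_change(double total, double baseline) {
    assert(baseline > 0.0);
    return (total - baseline) / baseline;
}
```

Applied to the SUM rows for X=10 versus Default, this gives about +13.2% (sf100, file cache), +14.6% (sf100, no file cache), +11.3% (sf1000, file cache) and +10.8% (sf1000, no file cache).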

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.3
    • 3.2
    • 3.1
    • 3.0
    • 2.5

dirtysalt requested review from a team as code owners on August 26, 2024 07:05
wanpengfei-git requested a review from a team on August 26, 2024 07:06
dirtysalt changed the title from "[Refactor] support incremental scan ranges deployment at BE side" to "[Enhancement] support incremental scan ranges deployment at BE side" on Aug 26, 2024
dirtysalt force-pushed the be-incremental-scan-ranges branch 2 times, most recently from 53247b4 to 1a73d07 on August 28, 2024 14:30
dirtysalt force-pushed the be-incremental-scan-ranges branch 2 times, most recently from e370b23 to b63725d on September 5, 2024 00:16
dirtysalt requested a review from a team as a code owner on September 5, 2024 01:43
dirtysalt force-pushed the be-incremental-scan-ranges branch 2 times, most recently from 979281a to fa031db on September 5, 2024 09:52
stephen-shelby previously approved these changes on Sep 9, 2024
for (const auto& scan_range : scan_ranges) {
    if (scan_range.__isset.empty && scan_range.empty) {
        if (scan_range.__isset.has_more) {
            *has_more_morsel = scan_range.has_more;
Contributor

Is has_more only used for empty scan ranges? If a scan range is not marked empty, will has_more be false?

Contributor Author

Yes, has_more is only used for empty scan ranges.
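
The has_more rule discussed in this thread can be sketched as follows. This is a minimal sketch, assuming a simplified scan-range struct in which std::optional stands in for Thrift's __isset flags; ScanRangeParams, Morsel and build_morsels are hypothetical names, and the real ScanMorsel::build_scan_morsels also takes accept_empty_scan_ranges(), which is omitted here.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a Thrift scan range: std::optional models the __isset flags.
struct ScanRangeParams {
    std::optional<bool> empty;     // set and true => sentinel carrying no data
    std::optional<bool> has_more;  // consulted only on an empty sentinel
    std::string path;              // the data to scan for a non-empty range
};

struct Morsel {
    int node_id;
    std::string path;
};

// Sketch of the rule confirmed in the review (not the actual StarRocks code):
// an empty sentinel yields no morsel and may update *has_more_morsel; for a non-empty
// range has_more is ignored, so the flag keeps whatever value it had before.
void build_morsels(int node_id, const std::vector<ScanRangeParams>& ranges,
                   std::vector<Morsel>* morsels, bool* has_more_morsel) {
    assert(morsels != nullptr && has_more_morsel != nullptr);
    for (const auto& range : ranges) {
        if (range.empty.value_or(false)) {
            if (range.has_more.has_value()) {
                *has_more_morsel = *range.has_more;
            }
            continue;  // nothing to scan in a sentinel
        }
        morsels->push_back(Morsel{node_id, range.path});
    }
}
```

With this shape, the FE can signal "more ranges will follow" or "this was the last batch" through an empty sentinel range without inventing a separate message type.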

[[maybe_unused]] bool has_more_morsel = false;
pipeline::ScanMorsel::build_scan_morsels(node_id, scan_ranges, accept_empty_scan_ranges(), &morsels,
                                         &has_more_morsel);
DCHECK(has_more_morsel == false);
Contributor

Is it always false?

Contributor Author

In the current implementation, yes, because only connector_scan_node.cpp supports incremental scan range delivery.
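
The invariant behind that DCHECK can be sketched with a capability flag on scan nodes. This is a minimal sketch under assumptions: ScanNodeBase, supports_incremental_scan_ranges(), has_more_is_valid and the two *Sketch classes are hypothetical and do not reflect the actual StarRocks class hierarchy.

```cpp
#include <cassert>

// Hypothetical scan-node hierarchy; names are illustrative only.
class ScanNodeBase {
public:
    virtual ~ScanNodeBase() = default;
    // Only nodes that can accept scan ranges after the fragment has started opt in.
    virtual bool supports_incremental_scan_ranges() const { return false; }
};

class OlapScanNodeSketch : public ScanNodeBase {};

class ConnectorScanNodeSketch : public ScanNodeBase {
public:
    bool supports_incremental_scan_ranges() const override { return true; }
};

// A "more ranges are coming" signal is only valid for a node that can receive them.
bool has_more_is_valid(const ScanNodeBase& node, bool has_more_morsel) {
    return node.supports_incremental_scan_ranges() || !has_more_morsel;
}

// Debug-only check in the spirit of DCHECK(has_more_morsel == false) in the excerpt.
void dcheck_has_more(const ScanNodeBase& node, bool has_more_morsel) {
    assert(has_more_is_valid(node, has_more_morsel));
}
```

Under this assumption, a scan node that builds all of its morsels up front can treat has_more == true as a programming error, which is why a debug assertion rather than a runtime error path is enough there.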

sonarcloud bot commented Sep 19, 2024

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

[BE Incremental Coverage Report]

fail : 22 / 79 (27.85%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 be/src/exec/pipeline/fragment_executor.cpp 0 24 00.00% [856, 857, 858, 859, 860, 861, 863, 864, 865, 866, 868, 869, 870, 871, 873, 874, 875, 878, 879, 880, 881, 882, 883, 884]
🔵 be/src/exec/capture_version_node.cpp 0 3 00.00% [29, 30, 31]
🔵 be/src/exec/scan_node.cpp 0 5 00.00% [119, 120, 190, 191, 193]
🔵 be/src/service/internal_service.cpp 0 2 00.00% [427, 428]
🔵 be/src/exec/olap_scan_node.cpp 0 3 00.00% [413, 414, 416]
🔵 be/src/exec/pipeline/scan/connector_scan_operator.cpp 2 8 25.00% [603, 616, 617, 739, 740, 882]
🔵 be/src/exec/pipeline/scan/morsel.cpp 10 22 45.45% [45, 46, 48, 54, 71, 72, 73, 74, 85, 86, 157, 158]
🔵 be/src/exec/pipeline/scan/morsel.h 5 7 71.43% [355, 407]
🔵 be/src/storage/lake/tablet_reader.cpp 1 1 100.00% []
🔵 be/src/connector/connector.cpp 3 3 100.00% []
🔵 be/src/exec/pipeline/scan/scan_operator.cpp 1 1 100.00% []

[FE Incremental Coverage Report]

pass : 43 / 46 (93.48%)

file detail

path covered_line new_line coverage not_covered_line_detail
🔵 com/starrocks/connector/RemoteFileOperations.java 0 3 00.00% [169, 177, 178]
🔵 com/starrocks/connector/AsyncTaskQueue.java 34 34 100.00% []
🔵 com/starrocks/qe/HDFSBackendSelector.java 9 9 100.00% []

dirtysalt merged commit 7f57234 into StarRocks:main on Sep 19, 2024
54 of 59 checks passed
dirtysalt deleted the be-incremental-scan-ranges branch on September 19, 2024 12:03
Linked issue closed by this pull request: To support incremental scan ranges deployment.
9 participants