

To support incremental scan ranges deployment. #50196

Closed
3 tasks done
dirtysalt opened this issue Aug 23, 2024 · 0 comments · Fixed by #50189 or #50254
Labels
type/enhancement Make an enhancement to StarRocks

Comments


dirtysalt commented Aug 23, 2024

Enhancement

In practice, we have found that the FE can spend a long time collecting all the files that need to be scanned before it sends them down to the BE for execution. This is acceptable for native tables, but not ideal for Hive/Iceberg/Hudi/Delta Lake tables.

Ideally, the FE should start execution on the BE once it has collected some of the files, and keep collecting the rest in the meantime. This way the FE and the BE work in parallel, which shortens the overall execution time.
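As a rough illustration of this overlap, here is a minimal Python sketch that models the FE as a producer handing files to the BE in small batches through a queue, and the BE as a consumer that starts scanning as soon as the first batch arrives. It rests on stated assumptions: `fe_list_files`, `be_execute`, and `run_pipelined` are hypothetical names, threads stand in for the FE and BE processes, and "scanning" just records the file name. It is not the StarRocks implementation.

```python
import queue
import threading
from typing import Iterable, List

# Hypothetical sentinel marking that the FE has finished listing files.
_DONE = object()


def fe_list_files(files: Iterable[str], out: "queue.Queue", batch_size: int) -> None:
    """Hypothetical FE side: enumerate files and hand them off in small batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[str] = []
    for f in files:
        batch.append(f)
        if len(batch) == batch_size:
            out.put(batch)  # deliver this batch now instead of waiting for all files
            batch = []
    if batch:
        out.put(batch)  # flush the final, possibly smaller, batch
    out.put(_DONE)


def be_execute(inp: "queue.Queue", scanned: List[str]) -> None:
    """Hypothetical BE side: consume batches as soon as they arrive."""
    while True:
        batch = inp.get()
        if batch is _DONE:
            break
        scanned.extend(batch)  # stand-in for actually reading the files


def run_pipelined(files: List[str], batch_size: int) -> List[str]:
    q: "queue.Queue" = queue.Queue()
    scanned: List[str] = []
    be = threading.Thread(target=be_execute, args=(q, scanned))
    be.start()  # the BE is ready before the FE has listed anything
    fe_list_files(files, q, batch_size)  # the FE keeps listing while the BE consumes
    be.join()
    return scanned
```

Because the BE thread starts before the FE finishes listing, scanning of early batches overlaps with listing of later ones. With a single producer and a FIFO queue, files are scanned in listing order.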

Arch diagram: [image not included]

Two new session variables:

  • enable_connector_incremental_scan_ranges=true (whether to enable incremental scan range deployment)
  • connector_incremental_scan_ranges_size=50 (when enabled, the number of scan ranges delivered in each round)
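One way to read how these two variables shape delivery: with the feature off, every scan range goes to the BE in a single round; with it on, scan ranges go out in rounds of at most `connector_incremental_scan_ranges_size`. The helper `plan_rounds` below is hypothetical and only mirrors the variable names; the actual scheduling logic in StarRocks may differ.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def plan_rounds(scan_ranges: Sequence[T],
                enable_incremental: bool = True,
                batch_size: int = 50) -> List[List[T]]:
    """Hypothetical helper: group scan ranges into delivery rounds.

    enable_incremental mirrors enable_connector_incremental_scan_ranges;
    batch_size mirrors connector_incremental_scan_ranges_size (default 50).
    """
    if not scan_ranges:
        return []
    if not enable_incremental:
        # Feature off: all scan ranges are sent to the BE in one round.
        return [list(scan_ranges)]
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Feature on: each round carries at most batch_size scan ranges.
    return [list(scan_ranges[i:i + batch_size])
            for i in range(0, len(scan_ranges), batch_size)]
```

For example, 120 scan ranges with the default size of 50 would be delivered in three rounds of 50, 50, and 20.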

To achieve this goal, the following tasks need to be done.
