In practice, we have found that the FE can spend a long time collecting all the files that need to be scanned before sending them to the BE for execution. This is acceptable for native tables, but not ideal for Hive, Iceberg, Hudi, and Delta Lake tables.
Ideally, the BE should start executing as soon as the FE has scanned some of the files, while the FE continues to scan the rest. File scanning on the FE and execution on the BE then proceed in parallel, which shortens the overall execution time.
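The overlap described above is a producer/consumer pipeline. The Java sketch below illustrates it under simplifying assumptions: a listing thread stands in for the FE discovering remote files, and the consuming loop stands in for dispatching scan ranges to the BE. The names `IncrementalScanPipeline` and `run` are hypothetical and do not come from the actual codebase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: overlap file listing ("FE") with execution ("BE").
final class IncrementalScanPipeline {
    // Sentinel batch marking the end of listing; compared by identity only.
    private static final List<String> END = new ArrayList<>();

    static List<String> run(List<String> files, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        // Bounded queue: the lister blocks if execution falls behind.
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(4);

        Thread lister = new Thread(() -> {
            try {
                List<String> batch = new ArrayList<>();
                for (String f : files) {
                    batch.add(f); // simulates discovering one remote file
                    if (batch.size() == batchSize) {
                        queue.put(batch); // hand off a batch before listing finishes
                        batch = new ArrayList<>();
                    }
                }
                if (!batch.isEmpty()) queue.put(batch);
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        lister.start();

        List<String> executed = new ArrayList<>();
        try {
            while (true) {
                List<String> batch = queue.take();
                if (batch == END) break;
                executed.addAll(batch); // stands in for dispatching scan ranges to the BE
            }
            lister.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for scan ranges", e);
        }
        return executed;
    }
}
```

The bounded queue also provides back-pressure: if execution falls behind, the listing thread blocks instead of buffering every discovered file in memory.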
Architecture diagram:
Two new session variables:
enable_connector_incremental_scan_ranges=true: whether to enable incremental deployment of scan ranges.
connector_incremental_scan_ranges_size=50: when incremental deployment is enabled, the number of scan ranges delivered in each round.
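As a rough illustration of how the two variables might interact, the sketch below splits a list of scan ranges into delivery rounds. It assumes that each round carries at most `connector_incremental_scan_ranges_size` ranges and that disabling the feature means everything is delivered in a single round. `ScanRangeRounds.plan` is a hypothetical helper, not the real scheduler.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: groups scan ranges into delivery rounds.
final class ScanRangeRounds {
    // incrementalEnabled ~ enable_connector_incremental_scan_ranges (assumed semantics)
    // roundSize          ~ connector_incremental_scan_ranges_size   (assumed semantics)
    static <T> List<List<T>> plan(List<T> scanRanges, boolean incrementalEnabled, int roundSize) {
        List<List<T>> rounds = new ArrayList<>();
        if (scanRanges.isEmpty()) return rounds;
        if (!incrementalEnabled) {
            // Feature off: deliver every scan range at once.
            rounds.add(new ArrayList<>(scanRanges));
            return rounds;
        }
        if (roundSize <= 0) throw new IllegalArgumentException("roundSize must be positive");
        for (int start = 0; start < scanRanges.size(); start += roundSize) {
            int end = Math.min(start + roundSize, scanRanges.size());
            // Copy so later rounds do not alias the caller's list.
            rounds.add(new ArrayList<>(scanRanges.subList(start, end)));
        }
        return rounds;
    }
}
```

With 120 scan ranges and a size of 50, this yields three rounds of 50, 50, and 20 ranges; with the feature disabled, it yields one round of 120.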
To achieve this goal, the following work is needed:
The FE should support incrementally acquiring the files to be scanned.
The corresponding change to `getRemoteFiles` is tracked in #49230.
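One way to make file acquisition incremental is to list partitions lazily, so the FE lists the next partition only after the files already found have been handed out. The Java sketch below assumes a per-partition listing function; `IncrementalFileIterator` and its `listFiles` parameter are hypothetical stand-ins and do not reflect the actual `getRemoteFiles` signature.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical iterator that lists remote files one partition at a time.
final class IncrementalFileIterator implements Iterator<String> {
    private final Iterator<String> partitions;
    private final Function<String, List<String>> listFiles; // hypothetical per-partition listing call
    private final Deque<String> buffered = new ArrayDeque<>();
    int listCalls = 0; // partitions listed so far, exposed to show the laziness

    IncrementalFileIterator(List<String> partitions, Function<String, List<String>> listFiles) {
        this.partitions = partitions.iterator();
        this.listFiles = listFiles;
    }

    @Override
    public boolean hasNext() {
        // List another partition only when no buffered files remain;
        // keep going past partitions that contain no files.
        while (buffered.isEmpty() && partitions.hasNext()) {
            buffered.addAll(listFiles.apply(partitions.next()));
            listCalls++;
        }
        return !buffered.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        return buffered.poll();
    }
}
```

Because each call to `next()` does only as much listing as needed, the caller can start dispatching the first files while later partitions have not been listed yet.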