[opt](scan) unify the local and remote scan bytes stats for all scanners #40493
base: master
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
run buildall
TeamCity be ut coverage result:
TPC-H: Total hot run time: 38319 ms
TPC-DS: Total hot run time: 192956 ms
ClickBench: Total hot run time: 31.92 s
cc42838 to 102b368
64f0607 to a6d8c99
run buildall
TPC-H: Total hot run time: 38143 ms
TeamCity be ut coverage result:
TPC-DS: Total hot run time: 193085 ms
ClickBench: Total hot run time: 31.51 s
f7ce982 to 57e247a
53d8028 to 905ef11
// first need to update the last statistics in _owned_cache_stats
// to the file_cache_stats in the input parameter.
// Then reset _owned_cache_stats
if (io_ctx->file_cache_stats) {
Potential data race?
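For context on the concern above, here is a minimal sketch of a merge-then-reset step that is safe against concurrent updates. It is not the PR's code: `PrefetchBufferSketch`, `FileCacheStatsSketch`, `record_read`, and the field names are hypothetical, and guarding the owned counters with a mutex is only one possible answer to the data-race question.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical counters; the real statistics struct may differ.
struct FileCacheStatsSketch {
    int64_t bytes_read_from_local = 0;
    int64_t bytes_read_from_remote = 0;
};

class PrefetchBufferSketch {
public:
    // Called by the (hypothetical) prefetch thread after each read.
    void record_read(int64_t local, int64_t remote) {
        assert(local >= 0 && remote >= 0);
        std::lock_guard<std::mutex> guard(_mu);
        _owned.bytes_read_from_local += local;
        _owned.bytes_read_from_remote += remote;
    }

    // Merge the owned counters into the caller's stats, then reset them,
    // all under one lock so no update is lost or counted twice.
    void update_and_reset(FileCacheStatsSketch* target) {
        std::lock_guard<std::mutex> guard(_mu);
        if (target != nullptr) {
            target->bytes_read_from_local += _owned.bytes_read_from_local;
            target->bytes_read_from_remote += _owned.bytes_read_from_remote;
        }
        _owned = FileCacheStatsSketch{};
    }

private:
    std::mutex _mu;
    FileCacheStatsSketch _owned;
};
```

Calling `update_and_reset` twice in a row adds the owned counters only once, because the second call sees zeroed counters.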
_prefetch_status = ExecEnv::GetInstance()->buffered_reader_prefetch_thread_pool()->submit_func(
        [buffer_ptr = shared_from_this()]() { buffer_ptr->prefetch_buffer(); });
}

void PrefetchBuffer::_update_and_reset_io_context(const IOContext* io_ctx) {
When is this method called?
905ef11 to 481472b
…6119)
Fix the bug that causes the audit loader to fail.
Related PR: #45167 #40493
The bug causes the audit loader to fail with the following errors in audit.log.
```
2024-12-27 11:47:47,001 [stream_load] |Label=audit_log_20241227_114552_856_127_0_0_1_8030|Db=__internal_schema|Table=audit_log|User=|ClientIp=10.0.1.3|Status=Success|Message=OK|Url=http://10.0.1.4:8040/api/_load_error_log?file=__shard_7/error_log_insert_stmt_c24ed0d941f59867-ec08b8542bc2a4a1_c24ed0d941f59867_ec08b8542bc2a4a1|TotalRows=34|LoadedRows=0|FilteredRows=34|UnselectedRows=0|LoadBytes=6887|StartTime=2024-12-27 11:45:52.858|FinishTime=2024-12-27 11:45:52.888
```
The detailed error is:
```
curl http://10.0.1.4:8040/api/_load_error_log?file=__shard_7/error_log_insert_stmt_c24ed0d941f59867-ec08b8542bc2a4a1_c24ed0d941f59867_ec08b8542bc2a4a1
Reason: actual column number in csv file is more than schema column number.actual number: 29, schema column number: 27; line delimiter: [ ], column separator: [ ], result values:
```
Co-authored-by: derenli <[email protected]>
Previously, only queries on OLAP tables had local and remote bytes-read statistics.
This PR adds these stats for all scanners.
Use `CachedRemoteFileReader` whether `enable_file_cache` is true or false
Previously, if `enable_file_cache` was true, we used `CachedRemoteFileReader`. Otherwise, we used the raw file reader to read data.
To unify the query stats, this PR uses `CachedRemoteFileReader` whether `enable_file_cache` is true or false.
When reading data, if the cache is disabled, `CachedRemoteFileReader` uses the raw file reader inside it directly.
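The delegation described above can be sketched as follows. This is an illustration under stated assumptions, not the Doris implementation: `RawReader`, `CachingReader`, and `ScanStats` are hypothetical stand-ins, the cache is a plain in-memory map keyed by `(offset, length)`, and treating uncached reads as remote bytes is an assumption based on the description.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <utility>

// Hypothetical counters; the names only mirror the PR's wording.
struct ScanStats {
    int64_t local_bytes = 0;
    int64_t remote_bytes = 0;
};

// Stand-in for a raw remote file reader, backed by an in-memory string.
class RawReader {
public:
    explicit RawReader(std::string data) : _data(std::move(data)) {}
    size_t read_at(size_t offset, char* out, size_t len) const {
        if (offset >= _data.size()) return 0;
        size_t n = std::min(len, _data.size() - offset);
        std::memcpy(out, _data.data() + offset, n);
        return n;
    }

private:
    std::string _data;
};

// Always used as the entry point; falls through to the raw reader
// when the cache is disabled, so stats are collected in one place.
class CachingReader {
public:
    CachingReader(RawReader raw, bool cache_enabled)
            : _raw(std::move(raw)), _cache_enabled(cache_enabled) {}

    size_t read_at(size_t offset, char* out, size_t len, ScanStats* stats) {
        assert(stats != nullptr);
        if (_cache_enabled) {
            auto it = _cache.find({offset, len});
            if (it != _cache.end()) {
                // Cache hit: served locally.
                std::memcpy(out, it->second.data(), it->second.size());
                stats->local_bytes += static_cast<int64_t>(it->second.size());
                return it->second.size();
            }
        }
        // Cache miss or cache disabled: read through the raw reader.
        size_t n = _raw.read_at(offset, out, len);
        stats->remote_bytes += static_cast<int64_t>(n);
        if (_cache_enabled) _cache[{offset, len}] = std::string(out, n);
        return n;
    }

private:
    RawReader _raw;
    bool _cache_enabled;
    std::map<std::pair<size_t, size_t>, std::string> _cache;
};
```

With the cache disabled, every read goes to the raw reader and is counted as remote. With it enabled, a repeated read of the same range is counted as local the second time.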
Add `_update_bytes_and_rows_read()` interface in `VScanner`
This method is called after each `get_block()` call.
It updates the scan bytes and rows in the query statistics, so that we can get real-time statistics when querying the system table `backend_active_tasks`.
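A rough sketch of the update-after-each-block idea follows. Only the method name `_update_bytes_and_rows_read()` comes from the description; `ScannerSketch`, `QueryStatsSketch`, the `get_block` signature, and the delta bookkeeping are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical query-level statistics visible to a system table.
struct QueryStatsSketch {
    int64_t scan_rows = 0;
    int64_t local_scan_bytes = 0;
    int64_t remote_scan_bytes = 0;
    // Total scan bytes as the sum of local and remote bytes.
    int64_t scan_bytes() const { return local_scan_bytes + remote_scan_bytes; }
};

class ScannerSketch {
public:
    explicit ScannerSketch(QueryStatsSketch* stats) : _stats(stats) {
        assert(stats != nullptr);
    }

    // Pretend to read one block; the arguments simulate what the reader
    // consumed. The stats are refreshed before returning to the caller.
    void get_block(int64_t rows, int64_t local_bytes, int64_t remote_bytes) {
        _rows_read += rows;
        _local_read += local_bytes;
        _remote_read += remote_bytes;
        _update_bytes_and_rows_read();
    }

private:
    // Push only the delta since the previous call, so repeated calls
    // never double count.
    void _update_bytes_and_rows_read() {
        _stats->scan_rows += _rows_read - _reported_rows;
        _stats->local_scan_bytes += _local_read - _reported_local;
        _stats->remote_scan_bytes += _remote_read - _reported_remote;
        _reported_rows = _rows_read;
        _reported_local = _local_read;
        _reported_remote = _remote_read;
    }

    QueryStatsSketch* _stats;
    int64_t _rows_read = 0, _local_read = 0, _remote_read = 0;
    int64_t _reported_rows = 0, _reported_local = 0, _reported_remote = 0;
};
```

Because the counters are pushed after every block rather than at the end of the scan, a reader of the statistics sees progress while the query is still running.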
Add `REMOTE_SCAN_BYTES` and `LOCAL_SCAN_BYTES` columns in `backend_active_tasks`
`REMOTE_SCAN_BYTES` is the number of bytes read from the remote fs.
`LOCAL_SCAN_BYTES` is the number of bytes read from local disks.
`SCAN_BYTES` is now the sum of `REMOTE_SCAN_BYTES` and `LOCAL_SCAN_BYTES`.
Add new columns for audit log table
local_scan_bytes
remote_scan_bytes
shuffle_bytes
shuffle_rows
cloud_cluster_name