[opt](scan) unify the local and remote scan bytes stats for all scanners #40493

Draft: wants to merge 2 commits into master

@morningman (Contributor) commented on Sep 6, 2024

Previously, only queries on OLAP tables had local and remote bytes-read statistics.
This PR adds these statistics for all scanners.

  1. Use CachedRemoteFileReader regardless of whether enable_file_cache is true or false

    Previously, if enable_file_cache was true, we used CachedRemoteFileReader.
    Otherwise, we used the raw file reader to read data.

    To unify the query statistics, this PR always uses CachedRemoteFileReader,
    regardless of the value of enable_file_cache.

    When reading data with the cache disabled, CachedRemoteFileReader reads
    directly through the raw file reader it wraps.

  2. Add the _update_bytes_and_rows_read() interface to VScanner

    This method is called after each get_block() call.
    It updates the scan bytes and rows in the query statistics,
    so that real-time statistics are available when querying the system table backend_active_tasks.

  3. Add REMOTE_SCAN_BYTES and LOCAL_SCAN_BYTES columns to backend_active_tasks

    REMOTE_SCAN_BYTES is the number of bytes read from the remote file system.
    LOCAL_SCAN_BYTES is the number of bytes read from local disks.

    SCAN_BYTES is now the sum of REMOTE_SCAN_BYTES and LOCAL_SCAN_BYTES.

  4. Add new columns to the audit log table

    • local_scan_bytes
    • remote_scan_bytes
    • shuffle_bytes
    • shuffle_rows
    • cloud_cluster_name
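Item 1 can be illustrated with a small model of a caching reader that always sits in front of the raw reader. This is a sketch under stated assumptions, not Doris's CachedRemoteFileReader: the classes ScanBytesStatsSketch, ReaderSketch, StringRemoteReader and CachedReaderSketch, the 4-byte block cache, and the choice to count bytes as they are returned to the caller are all hypothetical. It shows the accounting idea only: with the cache disabled, reads pass straight through and are counted as remote; with it enabled, cache hits are counted as local and misses as remote.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical counters; not the real Doris FileCacheStatistics.
struct ScanBytesStatsSketch {
    int64_t bytes_read_from_local = 0;   // bytes served from the local cache
    int64_t bytes_read_from_remote = 0;  // bytes fetched from the remote file system
};

// Hypothetical minimal reader interface.
class ReaderSketch {
public:
    virtual ~ReaderSketch() = default;
    // Copies up to `len` bytes starting at `offset` into `out`; returns the count.
    virtual size_t read_at(size_t offset, char* out, size_t len) = 0;
};

// Stand-in for a raw remote reader (e.g. S3/HDFS), backed by a string.
class StringRemoteReader : public ReaderSketch {
public:
    explicit StringRemoteReader(std::string data) : _data(std::move(data)) {}
    size_t read_at(size_t offset, char* out, size_t len) override {
        if (offset >= _data.size()) return 0;
        size_t n = std::min(len, _data.size() - offset);
        std::memcpy(out, _data.data() + offset, n);
        return n;
    }

private:
    std::string _data;
};

// Always wraps the raw reader. With the cache disabled it forwards every read
// and counts it as remote; with the cache enabled, a toy block cache decides
// whether each byte is served locally (hit) or fetched remotely (miss).
class CachedReaderSketch : public ReaderSketch {
public:
    static constexpr size_t kBlockSize = 4;  // tiny block size, for illustration only

    CachedReaderSketch(std::unique_ptr<ReaderSketch> raw, bool enable_cache,
                       ScanBytesStatsSketch* stats)
            : _raw(std::move(raw)), _enable_cache(enable_cache), _stats(stats) {
        assert(_raw != nullptr && _stats != nullptr);
    }

    size_t read_at(size_t offset, char* out, size_t len) override {
        if (!_enable_cache) {  // pass-through path, still counted
            size_t n = _raw->read_at(offset, out, len);
            _stats->bytes_read_from_remote += static_cast<int64_t>(n);
            return n;
        }
        size_t done = 0;
        while (done < len) {
            size_t pos = offset + done;
            size_t block = pos / kBlockSize;
            auto it = _blocks.find(block);
            bool hit = it != _blocks.end();
            if (!hit) {  // miss: fetch the whole block from remote and keep it
                std::string buf(kBlockSize, '\0');
                size_t got = _raw->read_at(block * kBlockSize, &buf[0], kBlockSize);
                if (got == 0) break;  // past end of file
                buf.resize(got);
                it = _blocks.emplace(block, std::move(buf)).first;
            }
            const std::string& data = it->second;
            size_t in_block = pos - block * kBlockSize;
            if (in_block >= data.size()) break;  // past end of a short last block
            size_t n = std::min(len - done, data.size() - in_block);
            std::memcpy(out + done, data.data() + in_block, n);
            int64_t& counter =
                    hit ? _stats->bytes_read_from_local : _stats->bytes_read_from_remote;
            counter += static_cast<int64_t>(n);
            done += n;
        }
        return done;
    }

private:
    std::unique_ptr<ReaderSketch> _raw;
    bool _enable_cache;
    ScanBytesStatsSketch* _stats;
    std::unordered_map<size_t, std::string> _blocks;  // block index -> cached bytes
};
```

Because both paths go through the same wrapper, local plus remote bytes always equals the bytes returned to the caller, whether or not the cache is enabled.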
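Item 2 describes reporting statistics after every get_block() call. A minimal sketch of one way to do this, assuming a hypothetical ScannerSketch that keeps cumulative counters and publishes only the delta since its last report into shared atomics. The method name borrows _update_bytes_and_rows_read from the description, but QueryStatsSketch, ScannerSketch and the method body are illustrative, not VScanner's implementation.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical query-level statistics shared by all scanners of a query and
// read concurrently (e.g. when backend_active_tasks is queried).
struct QueryStatsSketch {
    std::atomic<int64_t> scan_rows{0};
    std::atomic<int64_t> scan_bytes{0};
};

// Hypothetical scanner. It keeps cumulative counters and, after each
// get_block(), publishes only the delta since the previous report.
class ScannerSketch {
public:
    ScannerSketch(std::vector<int64_t> block_rows, int64_t bytes_per_row, QueryStatsSketch* stats)
            : _block_rows(std::move(block_rows)), _bytes_per_row(bytes_per_row), _stats(stats) {
        assert(_stats != nullptr);
    }

    // Produces the next block (only its row count here); returns false at end of input.
    bool get_block(int64_t* rows) {
        if (_next >= _block_rows.size()) return false;
        *rows = _block_rows[_next++];
        _rows_read += *rows;
        _bytes_read += *rows * _bytes_per_row;
        _update_bytes_and_rows_read();  // report right after each block
        return true;
    }

private:
    void _update_bytes_and_rows_read() {
        _stats->scan_rows.fetch_add(_rows_read - _reported_rows, std::memory_order_relaxed);
        _stats->scan_bytes.fetch_add(_bytes_read - _reported_bytes, std::memory_order_relaxed);
        _reported_rows = _rows_read;
        _reported_bytes = _bytes_read;
    }

    std::vector<int64_t> _block_rows;
    int64_t _bytes_per_row;
    QueryStatsSketch* _stats;
    size_t _next = 0;
    int64_t _rows_read = 0, _bytes_read = 0;          // cumulative, scanner-local
    int64_t _reported_rows = 0, _reported_bytes = 0;  // already published
};
```

Publishing deltas lets several scanners of one query add into the same counters without resets, and a concurrent reader sees values that grow block by block. Relaxed ordering is typically enough for monitoring counters that no other data depends on.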
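For item 3, one way to keep the three columns consistent is to derive SCAN_BYTES when a row is produced rather than storing it separately. A sketch with hypothetical types (ScanBytesCountersSketch, ActiveTaskRowSketch, make_row); the actual backend_active_tasks code may differ.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical per-query counters, updated by scanners.
struct ScanBytesCountersSketch {
    std::atomic<int64_t> local_scan_bytes{0};
    std::atomic<int64_t> remote_scan_bytes{0};
};

// Hypothetical row shape for the three backend_active_tasks columns.
struct ActiveTaskRowSketch {
    int64_t scan_bytes = 0;
    int64_t local_scan_bytes = 0;
    int64_t remote_scan_bytes = 0;
};

// SCAN_BYTES is computed from the two loaded values instead of being kept as a
// third counter, so a row never shows SCAN_BYTES != LOCAL + REMOTE, even while
// scanners keep updating the counters.
ActiveTaskRowSketch make_row(const ScanBytesCountersSketch& c) {
    ActiveTaskRowSketch row;
    row.local_scan_bytes = c.local_scan_bytes.load(std::memory_order_relaxed);
    row.remote_scan_bytes = c.remote_scan_bytes.load(std::memory_order_relaxed);
    row.scan_bytes = row.local_scan_bytes + row.remote_scan_bytes;
    return row;
}
```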

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@morningman (Contributor, Author)

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.85% (9381/25459)
Line Coverage: 28.22% (77347/274058)
Region Coverage: 27.63% (39951/144596)
Branch Coverage: 24.26% (20330/83804)
Coverage Report: http://coverage.selectdb-in.cc/coverage/43bed06d610a32f367bace5706bb4db40f91b189_43bed06d610a32f367bace5706bb4db40f91b189/report/index.html

@doris-robot

TPC-H: Total hot run time: 38319 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 43bed06d610a32f367bace5706bb4db40f91b189, data reload: false

------ Round 1 ----------------------------------
query	cold(ms)	hot1(ms)	hot2(ms)	best_hot(ms)
q1	17607	4418	4400	4400
q2	2030	184	190	184
q3	11834	963	1119	963
q4	10507	749	787	749
q5	7746	2879	2865	2865
q6	229	142	138	138
q7	961	624	597	597
q8	9310	2055	2090	2055
q9	7265	6541	6571	6541
q10	6992	2185	2232	2185
q11	451	246	242	242
q12	403	226	233	226
q13	17754	3108	3039	3039
q14	284	237	241	237
q15	541	479	482	479
q16	516	442	417	417
q17	988	646	686	646
q18	7384	6894	6993	6894
q19	1398	1006	1067	1006
q20	666	338	324	324
q21	3893	3155	3110	3110
q22	1112	1037	1022	1022
Total cold run time: 109871 ms
Total hot run time: 38319 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold(ms)	hot1(ms)	hot2(ms)	best_hot(ms)
q1	4458	4281	4339	4281
q2	387	281	275	275
q3	2867	2663	2716	2663
q4	1973	1653	1632	1632
q5	5617	5726	5724	5724
q6	230	141	137	137
q7	2257	1857	1826	1826
q8	3302	3442	3467	3442
q9	8837	8843	8892	8843
q10	3579	3429	3383	3383
q11	654	535	514	514
q12	839	678	667	667
q13	15163	3252	3318	3252
q14	325	287	279	279
q15	559	493	485	485
q16	538	504	511	504
q17	1847	1558	1568	1558
q18	8112	7985	7919	7919
q19	1767	1573	1685	1573
q20	2158	1916	1914	1914
q21	5683	5554	5505	5505
q22	1146	1032	1081	1032
Total cold run time: 72298 ms
Total hot run time: 57408 ms
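In the Round 1 table above, Total cold run time (109871 ms) is the sum of the cold column, and Total hot run time (38319 ms) is the sum of the last column, the faster of the two hot runs per query. A small sketch of that arithmetic, assuming this column layout; QueryTimingSketch, RunTotalsSketch and summarize are illustrative names, not part of the tpch-tools scripts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One row of a result table: a cold run plus two hot runs.
struct QueryTimingSketch {
    std::string query;
    int64_t cold_ms;
    int64_t hot1_ms;
    int64_t hot2_ms;
};

struct RunTotalsSketch {
    int64_t cold_ms = 0;
    int64_t hot_ms = 0;
};

// Total cold = sum of cold runs; total hot = sum of the faster hot run per query.
RunTotalsSketch summarize(const std::vector<QueryTimingSketch>& rows) {
    RunTotalsSketch t;
    for (const auto& r : rows) {
        t.cold_ms += r.cold_ms;
        t.hot_ms += std::min(r.hot1_ms, r.hot2_ms);
    }
    return t;
}
```

Applied to the ClickBench table for commit 43bed06, taking the faster of the two hot runs per query gives the reported 31.92 s total.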

@doris-robot

TPC-DS: Total hot run time: 192956 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 43bed06d610a32f367bace5706bb4db40f91b189, data reload: false

query	cold(ms)	hot1(ms)	hot2(ms)	best_hot(ms)
query1	1243	880	909	880
query2	6428	1918	1918	1918
query3	10675	3958	4118	3958
query4	59913	26613	23106	23106
query5	5326	506	512	506
query6	395	176	161	161
query7	5774	309	300	300
query8	321	236	244	236
query9	9030	2525	2493	2493
query10	490	283	265	265
query11	17432	14927	15479	14927
query12	148	100	99	99
query13	1552	395	403	395
query14	10777	8169	7514	7514
query15	249	189	179	179
query16	7360	529	472	472
query17	1170	647	611	611
query18	1533	327	320	320
query19	293	155	157	155
query20	133	113	113	113
query21	210	107	115	107
query22	4756	4624	4764	4624
query23	34112	33430	33388	33388
query24	5923	2926	2860	2860
query25	523	378	386	378
query26	674	157	162	157
query27	1753	274	284	274
query28	3926	2064	2060	2060
query29	652	411	405	405
query30	229	157	164	157
query31	951	747	759	747
query32	80	51	56	51
query33	478	313	290	290
query34	883	490	480	480
query35	810	726	698	698
query36	1042	930	973	930
query37	139	84	81	81
query38	4036	3883	3823	3823
query39	1458	1388	1387	1387
query40	195	111	110	110
query41	45	49	45	45
query42	116	96	96	96
query43	533	468	473	468
query44	1152	749	756	749
query45	195	164	160	160
query46	1092	771	735	735
query47	1907	1767	1802	1767
query48	373	301	300	300
query49	764	434	434	434
query50	823	422	424	422
query51	7034	6928	6777	6777
query52	98	85	83	83
query53	251	208	180	180
query54	560	451	463	451
query55	74	76	81	76
query56	282	258	261	258
query57	1194	1048	1083	1048
query58	209	222	226	222
query59	3196	2747	2924	2747
query60	299	266	276	266
query61	116	100	97	97
query62	753	645	646	645
query63	220	194	182	182
query64	2689	655	690	655
query65	3246	3129	3142	3129
query66	613	337	341	337
query67	15485	15399	15400	15399
query68	3063	596	588	588
query69	402	290	279	279
query70	1195	1134	1109	1109
query71	334	274	274	274
query72	6110	4020	3951	3951
query73	741	321	327	321
query74	9173	8845	8980	8845
query75	3392	2663	2702	2663
query76	1474	994	1055	994
query77	529	327	321	321
query78	9907	9221	9098	9098
query79	1044	546	526	526
query80	698	509	510	509
query81	459	228	230	228
query82	235	138	134	134
query83	176	153	148	148
query84	260	78	84	78
query85	678	292	282	282
query86	307	299	267	267
query87	4331	4252	4181	4181
query88	3419	2407	2333	2333
query89	382	287	287	287
query90	2032	198	192	192
query91	125	99	99	99
query92	59	50	51	50
query93	1059	532	538	532
query94	668	294	294	294
query95	347	249	248	248
query96	585	265	263	263
query97	3269	3058	3112	3058
query98	211	206	202	202
query99	1594	1278	1259	1259
Total cold run time: 304649 ms
Total hot run time: 192956 ms

@doris-robot

ClickBench: Total hot run time: 31.92 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 43bed06d610a32f367bace5706bb4db40f91b189, data reload: false

query	cold(s)	hot1(s)	hot2(s)
query1	0.04	0.04	0.04
query2	0.08	0.04	0.04
query3	0.23	0.05	0.05
query4	1.68	0.07	0.06
query5	0.51	0.50	0.51
query6	1.13	0.74	0.72
query7	0.02	0.01	0.01
query8	0.05	0.04	0.04
query9	0.53	0.49	0.48
query10	0.53	0.58	0.55
query11	0.15	0.11	0.12
query12	0.15	0.12	0.12
query13	0.59	0.59	0.60
query14	1.40	1.42	1.43
query15	0.83	0.80	0.80
query16	0.37	0.41	0.38
query17	1.06	1.01	1.00
query18	0.21	0.20	0.19
query19	1.95	1.72	1.79
query20	0.01	0.01	0.01
query21	15.41	0.69	0.68
query22	4.13	6.56	2.30
query23	18.27	1.39	1.40
query24	2.06	0.24	0.21
query25	0.16	0.08	0.07
query26	0.28	0.18	0.18
query27	0.09	0.08	0.08
query28	13.28	1.02	0.99
query29	12.69	3.36	3.36
query30	0.25	0.05	0.06
query31	2.88	0.39	0.40
query32	3.25	0.48	0.48
query33	2.98	3.00	3.01
query34	16.95	4.45	4.43
query35	4.52	4.46	4.43
query36	0.66	0.47	0.47
query37	0.18	0.15	0.16
query38	0.16	0.14	0.14
query39	0.05	0.04	0.04
query40	0.16	0.13	0.12
query41	0.08	0.04	0.05
query42	0.06	0.05	0.05
query43	0.04	0.04	0.04
Total cold run time: 110.11 s
Total hot run time: 31.92 s

@morningman changed the title from "[opt](scan) add scanBytesFromRemoteStorage and scanBytesFromLocalStorage for external table query" to "[opt](scan) unify the local and remote scan bytes stats for all scanners" on Sep 7, 2024
@morningman (Contributor, Author)

run buildall

@doris-robot

TPC-H: Total hot run time: 38143 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a6d8c99fde14ebaea4f113f07f8c4c80196348f8, data reload: false

------ Round 1 ----------------------------------
query	cold(ms)	hot1(ms)	hot2(ms)	best_hot(ms)
q1	17633	4439	4309	4309
q2	2028	195	185	185
q3	11705	961	1140	961
q4	10513	814	844	814
q5	7755	2886	2876	2876
q6	227	138	135	135
q7	960	617	600	600
q8	9315	2094	2124	2094
q9	7248	6579	6539	6539
q10	7002	2143	2239	2143
q11	435	241	243	241
q12	386	223	225	223
q13	17764	3094	3052	3052
q14	280	231	234	231
q15	531	496	492	492
q16	530	431	443	431
q17	991	673	766	673
q18	7450	6813	6771	6771
q19	1394	1263	1047	1047
q20	708	351	337	337
q21	3985	3069	2978	2978
q22	1115	1012	1011	1011
Total cold run time: 109955 ms
Total hot run time: 38143 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold(ms)	hot1(ms)	hot2(ms)	best_hot(ms)
q1	4388	4345	4271	4271
q2	368	260	273	260
q3	2951	2651	2643	2643
q4	1989	1681	1657	1657
q5	5644	5679	5722	5679
q6	237	141	142	141
q7	2207	1810	1816	1810
q8	3323	3439	3466	3439
q9	8887	8737	8805	8737
q10	3553	3363	3328	3328
q11	632	510	531	510
q12	823	665	634	634
q13	15686	3225	3304	3225
q14	317	298	278	278
q15	518	494	482	482
q16	528	495	486	486
q17	1847	1548	1587	1548
q18	8108	7838	7825	7825
q19	1742	1588	1613	1588
q20	2136	1904	1922	1904
q21	5689	5543	5434	5434
q22	1116	1041	1065	1041
Total cold run time: 72689 ms
Total hot run time: 56920 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.86% (9383/25459)
Line Coverage: 28.22% (77353/274073)
Region Coverage: 27.64% (39962/144597)
Branch Coverage: 24.27% (20337/83804)
Coverage Report: http://coverage.selectdb-in.cc/coverage/a6d8c99fde14ebaea4f113f07f8c4c80196348f8_a6d8c99fde14ebaea4f113f07f8c4c80196348f8/report/index.html

@doris-robot

TPC-DS: Total hot run time: 193085 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a6d8c99fde14ebaea4f113f07f8c4c80196348f8, data reload: false

query	cold(ms)	hot1(ms)	hot2(ms)	best_hot(ms)
query1	1275	893	870	870
query2	6366	1917	1902	1902
query3	10628	3951	4031	3951
query4	59323	24251	23260	23260
query5	5356	504	514	504
query6	402	167	169	167
query7	5761	302	285	285
query8	303	213	206	206
query9	8684	2449	2469	2449
query10	485	286	263	263
query11	18024	15055	15042	15042
query12	150	102	105	102
query13	1602	397	383	383
query14	11073	7409	7283	7283
query15	253	182	182	182
query16	7243	499	506	499
query17	1087	568	557	557
query18	2000	305	296	296
query19	287	209	144	144
query20	116	112	109	109
query21	204	109	102	102
query22	4960	4664	4586	4586
query23	34156	33388	33383	33383
query24	5995	2959	2810	2810
query25	527	396	381	381
query26	687	162	151	151
query27	1788	274	279	274
query28	3768	2032	2019	2019
query29	662	401	409	401
query30	222	157	155	155
query31	961	717	772	717
query32	86	51	55	51
query33	449	278	288	278
query34	869	480	468	468
query35	856	700	722	700
query36	1062	965	916	916
query37	143	87	84	84
query38	4048	3893	3885	3885
query39	1441	1390	1404	1390
query40	202	117	116	116
query41	49	47	47	47
query42	120	97	99	97
query43	499	474	441	441
query44	1093	768	750	750
query45	201	169	169	169
query46	1102	745	742	742
query47	1919	1789	1821	1789
query48	392	305	328	305
query49	805	458	450	450
query50	826	411	420	411
query51	7011	6818	6847	6818
query52	100	88	88	88
query53	250	178	181	178
query54	574	474	465	465
query55	80	78	85	78
query56	280	264	275	264
query57	1184	1068	1103	1068
query58	234	240	232	232
query59	2915	2965	2771	2771
query60	306	280	288	280
query61	125	123	122	122
query62	764	666	650	650
query63	215	190	186	186
query64	2896	761	764	761
query65	3179	3177	3125	3125
query66	684	350	367	350
query67	15458	15363	15486	15363
query68	3403	588	578	578
query69	406	285	289	285
query70	1192	1126	1087	1087
query71	354	283	279	279
query72	6341	4191	4183	4183
query73	748	324	328	324
query74	9218	8831	8895	8831
query75	3406	2637	2686	2637
query76	1531	1021	962	962
query77	522	370	316	316
query78	9931	9087	9021	9021
query79	2013	526	535	526
query80	986	502	510	502
query81	573	231	229	229
query82	233	132	143	132
query83	179	151	151	151
query84	267	75	80	75
query85	1031	313	278	278
query86	411	294	303	294
query87	4541	4251	4254	4251
query88	3946	2392	2375	2375
query89	396	277	273	273
query90	1879	192	190	190
query91	129	99	105	99
query92	64	49	51	49
query93	2054	538	541	538
query94	802	296	285	285
query95	359	252	257	252
query96	604	266	267	266
query97	3179	3045	3085	3045
query98	220	198	195	195
query99	1591	1271	1256	1256
Total cold run time: 309274 ms
Total hot run time: 193085 ms

@doris-robot

ClickBench: Total hot run time: 31.51 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit a6d8c99fde14ebaea4f113f07f8c4c80196348f8, data reload: false

query	cold(s)	hot1(s)	hot2(s)
query1	0.05	0.05	0.04
query2	0.09	0.05	0.04
query3	0.22	0.05	0.06
query4	1.69	0.08	0.08
query5	0.49	0.50	0.49
query6	1.13	0.74	0.72
query7	0.02	0.01	0.01
query8	0.06	0.05	0.04
query9	0.55	0.50	0.49
query10	0.55	0.56	0.53
query11	0.16	0.11	0.12
query12	0.15	0.12	0.12
query13	0.60	0.59	0.59
query14	1.38	1.43	1.47
query15	0.83	0.81	0.80
query16	0.37	0.37	0.35
query17	0.97	1.04	0.99
query18	0.21	0.19	0.21
query19	1.91	1.79	1.73
query20	0.02	0.01	0.01
query21	15.39	0.65	0.64
query22	4.05	7.42	2.20
query23	18.25	1.29	1.29
query24	2.06	0.21	0.21
query25	0.15	0.09	0.08
query26	0.25	0.17	0.18
query27	0.07	0.07	0.08
query28	13.36	1.01	0.99
query29	12.58	3.34	3.34
query30	0.24	0.06	0.05
query31	2.88	0.39	0.38
query32	3.28	0.48	0.47
query33	2.98	3.01	3.01
query34	16.94	4.37	4.38
query35	4.48	4.39	4.53
query36	0.66	0.47	0.50
query37	0.18	0.16	0.15
query38	0.15	0.15	0.14
query39	0.04	0.04	0.03
query40	0.15	0.14	0.12
query41	0.10	0.04	0.05
query42	0.07	0.05	0.05
query43	0.04	0.04	0.04
Total cold run time: 109.8 s
Total hot run time: 31.51 s

@morningman force-pushed the external_scan_bytes branch 2 times, most recently from f7ce982 to 57e247a on September 12, 2024 04:17
```cpp
// first need to update the last statistics in _owned_cache_stats
// to the file_cache_stats in the input parameter.
// Then reset _owned_cache_stats
if (io_ctx->file_cache_stats) {
```

Contributor:

Potential data race?
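If _owned_cache_stats can be incremented from the prefetch thread while the caller merges it into io_ctx->file_cache_stats and then zeroes it, a plain read-then-reset can lose increments that land in between. One way to close that window, sketched with hypothetical types rather than the actual PrefetchBuffer code, is an atomic exchange per counter; a mutex around both the update and the merge is the alternative when the statistics struct has many non-atomic fields.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical counters owned by a prefetch buffer and incremented on the
// prefetch thread.
struct OwnedCacheStatsSketch {
    std::atomic<int64_t> bytes_read_from_local{0};
    std::atomic<int64_t> bytes_read_from_remote{0};
};

// Hypothetical caller-side statistics, touched only by the reading thread.
struct CallerCacheStatsSketch {
    int64_t bytes_read_from_local = 0;
    int64_t bytes_read_from_remote = 0;
};

// Moves the accumulated values into `dst` and resets `src`. exchange(0) reads
// the old value and stores 0 as one atomic step, so an increment racing with
// this call is either moved now or left for the next call: never lost and
// never counted twice. Each counter is exact on its own; the pair is not
// reset as a single unit.
void merge_and_reset(OwnedCacheStatsSketch& src, CallerCacheStatsSketch* dst) {
    if (dst == nullptr) return;  // mirrors the `if (io_ctx->file_cache_stats)` guard
    dst->bytes_read_from_local += src.bytes_read_from_local.exchange(0, std::memory_order_relaxed);
    dst->bytes_read_from_remote += src.bytes_read_from_remote.exchange(0, std::memory_order_relaxed);
}
```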

```cpp
_prefetch_status = ExecEnv::GetInstance()->buffered_reader_prefetch_thread_pool()->submit_func(
        [buffer_ptr = shared_from_this()]() { buffer_ptr->prefetch_buffer(); });
}

void PrefetchBuffer::_update_and_reset_io_context(const IOContext* io_ctx) {
```

Contributor:

When is this method called?

morningman added a commit to morningman/doris that referenced this pull request Nov 21, 2024
morningman added a commit to morningman/doris that referenced this pull request Dec 9, 2024
yiguolei pushed a commit that referenced this pull request Dec 10, 2024
…ers for 2.1 (#45167)

pick part of #40493

TODO: does not work with the S3 reader yet
morningman pushed a commit that referenced this pull request Dec 30, 2024
…6119)

Fix the bug that causes the audit loader to fail.
Related PR: #45167 #40493

The bug caused the audit loader to fail with the following errors in audit.log:
```
2024-12-27 11:47:47,001 [stream_load] |Label=audit_log_20241227_114552_856_127_0_0_1_8030|Db=__internal_schema|Table=audit_log|User=|ClientIp=10.0.1.3|Status=Success|Message=OK|Url=http://10.0.1.4:8040/api/_load_error_log?file=__shard_7/error_log_insert_stmt_c24ed0d941f59867-ec08b8542bc2a4a1_c24ed0d941f59867_ec08b8542bc2a4a1|TotalRows=34|LoadedRows=0|FilteredRows=34|UnselectedRows=0|LoadBytes=6887|StartTime=2024-12-27 11:45:52.858|FinishTime=2024-12-27 11:45:52.888
```
The detailed error is:
```
curl http://10.0.1.4:8040/api/_load_error_log?file=__shard_7/error_log_insert_stmt_c24ed0d941f59867-ec08b8542bc2a4a1_c24ed0d941f59867_ec08b8542bc2a4a1

Reason: actual column number in csv file is  more than  schema column number.actual number: 29, schema column number: 27; line delimiter: [
], column separator: [  ], result values:
```

Co-authored-by: derenli <[email protected]>
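The audit-loader failure quoted above occurred because each serialized audit row carried 29 values while the audit_log schema had 27 columns, so all 34 rows were filtered (TotalRows=34, FilteredRows=34). A minimal sketch of a pre-load sanity check, assuming rows are serialized as separator-delimited lines. split_fields and find_mismatched_rows are hypothetical helpers, not part of the audit loader (which lives in the FE), and a tab is used as the separator only for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits a line on `sep`, keeping empty fields (so "a<sep><sep>b" has 3 fields).
std::vector<std::string> split_fields(const std::string& line, char sep) {
    std::vector<std::string> fields;
    std::string cur;
    for (char ch : line) {
        if (ch == sep) {
            fields.push_back(cur);
            cur.clear();
        } else {
            cur.push_back(ch);
        }
    }
    fields.push_back(cur);
    return fields;
}

// Returns the indices of lines whose field count differs from the schema.
std::vector<size_t> find_mismatched_rows(const std::vector<std::string>& lines, char sep,
                                         size_t schema_columns) {
    std::vector<size_t> bad;
    for (size_t i = 0; i < lines.size(); ++i) {
        if (split_fields(lines[i], sep).size() != schema_columns) bad.push_back(i);
    }
    return bad;
}
```

This check does not handle separators inside quoted or escaped values; a real loader would need to split the same way the CSV reader does.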