[feat](clone) Speed clone tablet via batch small file downloading #45061

Merged: w41ter merged 2 commits into apache:master from batch_downloading on Dec 9, 2024

@w41ter (Contributor) commented Dec 5, 2024

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Downloading many small files is slow and can cause the clone tablet task to time out. This PR adds a batch download API that fetches small files in batches, which speeds up cloning tablets that contain many of them.
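The description does not spell out how batches are formed. As a rough illustration only, the planner below groups small files so that one request fetches many of them, amortizing the per-request overhead that dominates when files are tiny. `RemoteFile`, `plan_batches`, `small_file_threshold`, `max_batch_bytes`, and `max_batch_files` are hypothetical names chosen for this sketch; they are not the Doris BE API or its config keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteFile:
    name: str
    size: int  # bytes


def plan_batches(files, small_file_threshold, max_batch_bytes, max_batch_files):
    """Group small files into shared requests; large files get their own.

    Hypothetical planner for illustration; not the Doris BE implementation.
    Returns a list of batches, each batch being a list of RemoteFile.
    """
    batches = []
    current, current_bytes = [], 0
    for f in files:
        if f.size >= small_file_threshold:
            # Large files gain little from batching, so fetch each one alone.
            batches.append([f])
            continue
        # Close the open batch if this file would exceed either limit.
        if current and (
            len(current) >= max_batch_files
            or current_bytes + f.size > max_batch_bytes
        ):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(f)
        current_bytes += f.size
    if current:
        batches.append(current)  # flush the last partial batch
    return batches


files = [RemoteFile(f"small_{i}", 1000) for i in range(10)]
files.append(RemoteFile("large", 10_000_000))
batches = plan_batches(files, small_file_threshold=64 * 1024,
                       max_batch_bytes=1 << 20, max_batch_files=4)
print([len(b) for b in batches])  # [4, 4, 1, 2]
```

With these example limits, 11 files need 4 requests instead of 11. Capping both the file count and the byte total in this sketch keeps every response bounded, so batching small files never produces one oversized transfer.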

Before

```
succeed to copy tablet 10088, total file size: 19256126 B, cost: 78674 ms, rate: 0.244758 MB/s
```

After

```
succeed to copy tablet 30157, total files: 20006, total file size: 19311624 B, cost: 4016 ms, rate: 4.80867 MB/s
```
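The rates in the two log lines can be reproduced from the byte counts and durations if MB is taken as 10^6 bytes; this is inferred from matching the numbers, not stated by the log. The sketch also derives the average file size in the After run. The two runs copy different tablets (10088 and 30157) of nearly equal size, so the ratio is an approximate comparison rather than a controlled benchmark.

```python
def rate_mb_per_s(total_bytes: int, cost_ms: int) -> float:
    """Throughput in MB/s, taking 1 MB = 10**6 bytes."""
    return (total_bytes / 1e6) / (cost_ms / 1e3)


before = rate_mb_per_s(19256126, 78674)  # tablet 10088
after = rate_mb_per_s(19311624, 4016)    # tablet 30157, 20006 files

print(f"before: {before:.6f} MB/s")                        # before: 0.244758 MB/s
print(f"after:  {after:.5f} MB/s")                         # after:  4.80867 MB/s
print(f"wall-time ratio: {78674 / 4016:.1f}x")             # wall-time ratio: 19.6x
print(f"avg file size (after): {19311624 / 20006:.0f} B")  # avg file size (after): 965 B
```

An average of about 965 B per file confirms that the After run is dominated by per-file overhead rather than by bytes transferred.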

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

PR: apache/doris-website#1476

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@w41ter (Contributor Author) commented Dec 5, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 41039 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a3fc48b4e87d0afbae4e4ebdd84735678c2fb097, data reload: false

------ Round 1 ----------------------------------
Query	Cold (ms)	Hot 1 (ms)	Hot 2 (ms)	Best hot (ms)
q1	17600	7497	7256	7256
q2	2210	1309	1277	1277
q3	9969	1154	1173	1154
q4	10227	721	677	677
q5	7621	2893	2664	2664
q6	238	146	147	146
q7	1002	636	606	606
q8	9273	1938	1930	1930
q9	6819	6618	6457	6457
q10	7043	2283	2344	2283
q11	479	266	265	265
q12	415	216	219	216
q13	17792	2997	3024	2997
q14	244	212	210	210
q15	567	537	513	513
q16	647	589	589	589
q17	1001	514	546	514
q18	7618	6686	6593	6593
q19	1348	960	1076	960
q20	473	184	183	183
q21	4331	3330	3238	3238
q22	388	311	321	311
Total cold run time: 107305 ms
Total hot run time: 41039 ms

----- Round 2, with runtime_filter_mode=off -----
Query	Cold (ms)	Hot 1 (ms)	Hot 2 (ms)	Best hot (ms)
q1	7230	7199	7248	7199
q2	333	231	227	227
q3	2940	3069	2980	2980
q4	2089	1836	1825	1825
q5	5735	5614	5694	5614
q6	235	149	150	149
q7	2226	1802	1785	1785
q8	3443	3588	3532	3532
q9	9046	9137	9244	9137
q10	3621	3613	3572	3572
q11	609	495	503	495
q12	837	653	628	628
q13	13040	3277	3244	3244
q14	305	276	288	276
q15	564	562	510	510
q16	685	642	649	642
q17	1875	1603	1587	1587
q18	8338	7764	7810	7764
q19	4538	1615	1512	1512
q20	2123	1871	1937	1871
q21	5829	5465	5297	5297
q22	629	563	617	563
Total cold run time: 76270 ms
Total hot run time: 60409 ms
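The bot output does not label its columns. Checking the numbers, each TPC-H and TPC-DS row reads as query, cold run, hot run 1, hot run 2, and best hot run (the minimum of the two hot runs); the totals are the sums of the cold column and the best-hot column. The ClickBench tables omit the best column, but their hot total likewise matches the sum of each query's faster hot run. The sketch below reproduces the Round 1 totals of the first TPC-H report (commit a3fc48b) from its rows; this column interpretation is inferred from the arithmetic, not documented by the bot.

```python
# (query, cold_ms, hot1_ms, hot2_ms) copied from the first TPC-H Round 1 table
ROUND_1 = [
    ("q1", 17600, 7497, 7256),
    ("q2", 2210, 1309, 1277),
    ("q3", 9969, 1154, 1173),
    ("q4", 10227, 721, 677),
    ("q5", 7621, 2893, 2664),
    ("q6", 238, 146, 147),
    ("q7", 1002, 636, 606),
    ("q8", 9273, 1938, 1930),
    ("q9", 6819, 6618, 6457),
    ("q10", 7043, 2283, 2344),
    ("q11", 479, 266, 265),
    ("q12", 415, 216, 219),
    ("q13", 17792, 2997, 3024),
    ("q14", 244, 212, 210),
    ("q15", 567, 537, 513),
    ("q16", 647, 589, 589),
    ("q17", 1001, 514, 546),
    ("q18", 7618, 6686, 6593),
    ("q19", 1348, 960, 1076),
    ("q20", 473, 184, 183),
    ("q21", 4331, 3330, 3238),
    ("q22", 388, 311, 321),
]


def summarize(rows):
    """Return (total cold ms, total best-hot ms) for a benchmark round."""
    total_cold = sum(cold for _, cold, _, _ in rows)
    # The best hot run per query is the faster of the two hot runs.
    total_hot = sum(min(hot1, hot2) for _, _, hot1, hot2 in rows)
    return total_cold, total_hot


print(summarize(ROUND_1))  # (107305, 41039)
```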

@doris-robot

TPC-DS: Total hot run time: 196525 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a3fc48b4e87d0afbae4e4ebdd84735678c2fb097, data reload: false

Query	Cold (ms)	Hot 1 (ms)	Hot 2 (ms)	Best hot (ms)
query1	1408	797	777	777
query2	6251	2019	1987	1987
query3	10691	4509	4342	4342
query4	66189	28974	23694	23694
query5	4919	484	456	456
query6	433	172	170	170
query7	5593	304	301	301
query8	306	226	223	223
query9	9574	2736	2730	2730
query10	458	252	246	246
query11	17585	15374	16080	15374
query12	156	107	107	107
query13	1620	463	418	418
query14	9821	6558	6674	6558
query15	226	210	188	188
query16	7098	486	467	467
query17	1127	566	594	566
query18	1936	300	297	297
query19	227	163	152	152
query20	114	107	109	107
query21	207	100	104	100
query22	4749	4383	4498	4383
query23	35277	35154	34853	34853
query24	5677	2526	2557	2526
query25	509	429	418	418
query26	661	154	155	154
query27	2345	311	299	299
query28	4599	2496	2465	2465
query29	660	440	468	440
query30	203	145	141	141
query31	967	791	820	791
query32	68	52	53	52
query33	405	278	295	278
query34	895	515	531	515
query35	866	749	741	741
query36	1079	981	936	936
query37	122	71	75	71
query38	4519	4275	4271	4271
query39	1482	1386	1419	1386
query40	197	97	96	96
query41	46	43	42	42
query42	109	95	102	95
query43	536	483	486	483
query44	1184	807	820	807
query45	188	166	162	162
query46	1155	717	772	717
query47	1962	1848	1826	1826
query48	411	319	306	306
query49	722	383	376	376
query50	841	390	395	390
query51	7329	7067	7099	7067
query52	97	88	87	87
query53	258	183	184	183
query54	511	390	391	390
query55	79	76	76	76
query56	266	234	227	227
query57	1236	1117	1138	1117
query58	206	200	231	200
query59	3271	3106	2883	2883
query60	265	272	235	235
query61	110	130	134	130
query62	815	697	674	674
query63	216	195	199	195
query64	1388	692	616	616
query65	3259	3201	3190	3190
query66	708	306	359	306
query67	16061	15746	15550	15550
query68	4122	564	575	564
query69	432	250	257	250
query70	1137	1132	1101	1101
query71	352	257	254	254
query72	6113	4085	4142	4085
query73	811	359	369	359
query74	10282	8997	9007	8997
query75	3410	2678	2672	2672
query76	2125	1176	1069	1069
query77	491	280	273	273
query78	10590	9565	9414	9414
query79	1476	611	606	606
query80	897	426	425	425
query81	495	231	231	231
query82	1253	120	118	118
query83	244	154	146	146
query84	280	76	75	75
query85	906	297	294	294
query86	353	309	306	306
query87	4680	4600	4548	4548
query88	3764	2236	2196	2196
query89	431	292	304	292
query90	2017	191	189	189
query91	136	102	105	102
query92	66	50	53	50
query93	1964	544	543	543
query94	822	296	289	289
query95	355	256	254	254
query96	623	277	280	277
query97	2851	2682	2649	2649
query98	213	208	191	191
query99	1611	1322	1300	1300
Total cold run time: 319750 ms
Total hot run time: 196525 ms

@doris-robot

ClickBench: Total hot run time: 32.68 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit a3fc48b4e87d0afbae4e4ebdd84735678c2fb097, data reload: false

Query	Cold (s)	Hot 1 (s)	Hot 2 (s)
query1	0.04	0.03	0.02
query2	0.07	0.03	0.03
query3	0.24	0.07	0.07
query4	1.62	0.10	0.11
query5	0.41	0.41	0.42
query6	1.15	0.68	0.68
query7	0.02	0.02	0.01
query8	0.04	0.04	0.03
query9	0.60	0.53	0.51
query10	0.56	0.57	0.58
query11	0.14	0.11	0.11
query12	0.13	0.11	0.11
query13	0.63	0.62	0.61
query14	2.84	2.71	2.72
query15	0.93	0.85	0.83
query16	0.39	0.37	0.40
query17	1.08	1.02	1.00
query18	0.21	0.21	0.21
query19	1.91	1.87	2.05
query20	0.01	0.01	0.01
query21	15.37	0.60	0.59
query22	2.84	2.29	2.10
query23	17.10	0.91	0.80
query24	3.25	0.47	2.33
query25	0.23	0.19	0.15
query26	0.43	0.14	0.15
query27	0.04	0.03	0.03
query28	9.85	1.14	1.07
query29	12.55	3.26	3.23
query30	0.24	0.07	0.06
query31	2.88	0.40	0.39
query32	3.23	0.49	0.47
query33	3.02	3.09	3.02
query34	16.98	4.54	4.56
query35	4.51	4.61	4.56
query36	0.68	0.49	0.50
query37	0.09	0.06	0.06
query38	0.05	0.03	0.04
query39	0.03	0.02	0.02
query40	0.17	0.13	0.13
query41	0.08	0.02	0.02
query42	0.03	0.03	0.02
query43	0.03	0.03	0.03
Total cold run time: 106.7 s
Total hot run time: 32.68 s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.53% (10031/26032)
Line Coverage: 29.56% (84206/284891)
Region Coverage: 28.65% (43252/150973)
Branch Coverage: 25.23% (21978/87110)
Coverage Report: http://coverage.selectdb-in.cc/coverage/a3fc48b4e87d0afbae4e4ebdd84735678c2fb097_a3fc48b4e87d0afbae4e4ebdd84735678c2fb097/report/index.html

w41ter force-pushed the batch_downloading branch from a3fc48b to c9d5081 on December 9, 2024 02:25
@w41ter (Contributor Author) commented Dec 9, 2024

run buildall

@dataroaring (Contributor) left a comment

LGTM

github-actions bot commented Dec 9, 2024

PR approved by at least one committer and no changes requested.

github-actions bot added the approved and reviewed labels Dec 9, 2024

github-actions bot commented Dec 9, 2024

PR approved by anyone and no changes requested.

@doris-robot

TPC-H: Total hot run time: 40248 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c9d508164472166f1b4dd11021519cfb47bb2e63, data reload: false

------ Round 1 ----------------------------------
Query	Cold (ms)	Hot 1 (ms)	Hot 2 (ms)	Best hot (ms)
q1	17675	7627	7364	7364
q2	2046	184	163	163
q3	10646	1087	1187	1087
q4	10479	793	708	708
q5	7584	2741	2681	2681
q6	235	150	149	149
q7	1020	627	598	598
q8	9234	1900	1934	1900
q9	6720	6528	6543	6528
q10	7004	2339	2364	2339
q11	471	273	255	255
q12	432	229	224	224
q13	17774	3039	3066	3039
q14	249	214	214	214
q15	563	535	517	517
q16	680	591	570	570
q17	988	666	544	544
q18	7304	6658	6843	6658
q19	1402	1022	1063	1022
q20	472	191	186	186
q21	4046	3304	3186	3186
q22	377	316	322	316
Total cold run time: 107401 ms
Total hot run time: 40248 ms

----- Round 2, with runtime_filter_mode=off -----
Query	Cold (ms)	Hot 1 (ms)	Hot 2 (ms)	Best hot (ms)
q1	7316	7243	7261	7243
q2	331	233	238	233
q3	2967	2962	2902	2902
q4	2079	1849	1880	1849
q5	5671	5709	5706	5706
q6	243	147	142	142
q7	2308	1816	1822	1816
q8	3385	3533	3570	3533
q9	8906	9035	9067	9035
q10	3620	3568	3586	3568
q11	621	511	501	501
q12	822	655	631	631
q13	10605	3209	3259	3209
q14	303	296	289	289
q15	583	524	525	524
q16	692	663	646	646
q17	1885	1650	1634	1634
q18	8453	7803	7580	7580
q19	1734	1555	1462	1462
q20	2148	1856	1895	1856
q21	5558	5522	5458	5458
q22	677	579	567	567
Total cold run time: 70907 ms
Total hot run time: 60384 ms

@doris-robot

TPC-DS: Total hot run time: 197451 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c9d508164472166f1b4dd11021519cfb47bb2e63, data reload: false

Query	Cold (ms)	Hot 1 (ms)	Hot 2 (ms)	Best hot (ms)
query1	1259	956	960	956
query2	6256	2031	2062	2031
query3	10940	4245	4334	4245
query4	67536	29737	23542	23542
query5	4963	478	487	478
query6	424	183	170	170
query7	5539	299	305	299
query8	319	241	235	235
query9	8713	2667	2670	2667
query10	424	244	251	244
query11	17071	15230	15899	15230
query12	150	102	106	102
query13	1448	420	413	413
query14	10115	7622	7315	7315
query15	211	201	181	181
query16	6957	488	495	488
query17	1066	560	581	560
query18	1527	309	312	309
query19	218	156	145	145
query20	118	110	108	108
query21	240	106	97	97
query22	4743	4654	4769	4654
query23	35115	34391	34426	34391
query24	5873	2505	2501	2501
query25	503	414	418	414
query26	648	159	159	159
query27	1853	294	294	294
query28	4657	2498	2469	2469
query29	708	469	444	444
query30	209	157	151	151
query31	1028	875	848	848
query32	64	55	54	54
query33	427	300	323	300
query34	942	508	525	508
query35	891	764	759	759
query36	1082	970	989	970
query37	123	72	75	72
query38	4499	4420	4383	4383
query39	1526	1472	1495	1472
query40	200	105	100	100
query41	46	44	45	44
query42	112	99	124	99
query43	554	512	503	503
query44	1212	823	829	823
query45	185	172	177	172
query46	1196	750	750	750
query47	2089	1998	1937	1937
query48	425	331	311	311
query49	754	384	405	384
query50	877	402	412	402
query51	7439	7215	7103	7103
query52	98	88	92	88
query53	259	187	185	185
query54	515	433	392	392
query55	79	76	82	76
query56	276	248	235	235
query57	1249	1143	1085	1085
query58	223	216	207	207
query59	3172	3076	3011	3011
query60	270	242	237	237
query61	109	107	106	106
query62	770	678	680	678
query63	218	190	182	182
query64	1298	681	643	643
query65	3296	3205	3212	3205
query66	688	317	316	316
query67	15985	15591	15696	15591
query68	3798	599	681	599
query69	418	251	252	251
query70	1184	1035	1142	1035
query71	355	257	246	246
query72	6242	4109	4037	4037
query73	792	369	350	350
query74	10057	8965	9018	8965
query75	3405	2698	2696	2696
query76	1902	1029	1040	1029
query77	450	279	292	279
query78	10455	9450	9562	9450
query79	1684	611	602	602
query80	1210	444	441	441
query81	488	230	236	230
query82	1238	123	121	121
query83	224	146	146	146
query84	270	73	71	71
query85	905	309	303	303
query86	383	296	300	296
query87	4834	4527	4493	4493
query88	3732	2161	2132	2132
query89	426	293	298	293
query90	1948	187	188	187
query91	138	105	102	102
query92	62	49	57	49
query93	1948	559	559	559
query94	793	299	290	290
query95	359	255	263	255
query96	619	276	288	276
query97	2871	2656	2669	2656
query98	214	192	194	192
query99	1618	1345	1297	1297
Total cold run time: 319001 ms
Total hot run time: 197451 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.53% (10031/26031)
Line Coverage: 29.54% (84168/284976)
Region Coverage: 28.63% (43245/151031)
Branch Coverage: 25.21% (21971/87140)
Coverage Report: http://coverage.selectdb-in.cc/coverage/c9d508164472166f1b4dd11021519cfb47bb2e63_c9d508164472166f1b4dd11021519cfb47bb2e63/report/index.html

@doris-robot

ClickBench: Total hot run time: 32.11 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c9d508164472166f1b4dd11021519cfb47bb2e63, data reload: false

Query	Cold (s)	Hot 1 (s)	Hot 2 (s)
query1	0.03	0.03	0.03
query2	0.07	0.04	0.03
query3	0.24	0.07	0.07
query4	1.61	0.10	0.10
query5	0.44	0.38	0.40
query6	1.19	0.65	0.65
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.57	0.52	0.50
query10	0.56	0.56	0.55
query11	0.15	0.11	0.11
query12	0.14	0.12	0.11
query13	0.62	0.59	0.60
query14	2.73	2.86	2.72
query15	0.90	0.81	0.82
query16	0.39	0.38	0.37
query17	1.04	1.03	1.06
query18	0.23	0.20	0.21
query19	1.87	1.87	2.06
query20	0.02	0.01	0.01
query21	15.36	0.58	0.56
query22	2.98	2.14	1.55
query23	17.11	0.84	0.76
query24	2.98	1.43	0.90
query25	0.22	0.16	0.12
query26	0.44	0.14	0.14
query27	0.04	0.04	0.04
query28	10.51	1.09	1.08
query29	12.55	3.23	3.21
query30	0.25	0.06	0.06
query31	2.86	0.38	0.37
query32	3.27	0.47	0.47
query33	3.00	2.99	3.07
query34	17.07	4.51	4.44
query35	4.55	4.44	4.55
query36	0.65	0.49	0.49
query37	0.09	0.06	0.06
query38	0.05	0.04	0.04
query39	0.04	0.02	0.03
query40	0.16	0.14	0.13
query41	0.07	0.02	0.02
query42	0.03	0.02	0.02
query43	0.03	0.04	0.02
Total cold run time: 107.17 s
Total hot run time: 32.11 s

@deardeng (Contributor) left a comment

LGTM

w41ter merged commit 63cc1af into apache:master Dec 9, 2024
25 of 28 checks passed
w41ter deleted the batch_downloading branch December 9, 2024 07:16
github-actions bot pushed a commit that referenced this pull request Dec 9, 2024
w41ter added a commit to w41ter/incubator-doris that referenced this pull request Dec 10, 2024
Labels: approved, dev/2.1.x, dev/3.0.4-merged, reviewed
5 participants