[Enhancement](compaction) Optimize compaction task permit allocation #45197


Yukang-Lian
Collaborator

@Yukang-Lian Yukang-Lian commented Dec 9, 2024

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Problem

Compaction task submission currently reserves permits before a task starts executing, which can waste resources: tasks that are still waiting in the thread pool queue already hold permits and can block other tasks from being executed.

Solution

Change total_permits_for_compaction_score to 1,000,000, which effectively removes the limit on total permits. Total permits were originally introduced to bound the memory used by compaction tasks, but memory is now controlled by each individual compaction task, so the total-permit limit no longer serves a purpose. If no memory issues arise in the next two versions after this change, we will remove the permits mechanism.
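The effect described above can be illustrated with a minimal sketch. This is not the Doris BE code: `Task`, `admit`, the task names, and the per-task permit counts are hypothetical, and the smaller limit of 10,000 is only an illustrative value. The sketch assumes permits are reserved at submission time, as the Problem section describes.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    permits: int  # hypothetical cost of the task, e.g. derived from a compaction score


def admit(tasks, total_permits):
    """Reserve permits at submission time and return the names of admitted tasks.

    A task is admitted only if its permits still fit under total_permits.
    Permits stay reserved while the task sits in the queue, so a queued task
    can keep a later task out even though nothing is running yet.
    """
    used = 0
    admitted = []
    for task in tasks:
        if used + task.permits <= total_permits:
            used += task.permits
            admitted.append(task.name)
    return admitted


# Two tasks that are only queued, followed by one that could run right away.
tasks = [Task("queued-1", 4000), Task("queued-2", 5000), Task("ready", 3000)]

print(admit(tasks, 10_000))     # illustrative small limit: "ready" does not fit
print(admit(tasks, 1_000_000))  # value from this PR: all three tasks are admitted
```

With the limit at 1,000,000 the check practically never fails for realistic permit counts, which is why the description calls the limit effectively removed while the mechanism itself stays in place.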

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Yukang-Lian
Collaborator Author

run buildall

Contributor

github-actions bot commented Dec 9, 2024

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 40046 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 159599a604213c5f0f9e1863b14d3fd7a4530262, data reload: false

------ Round 1 ----------------------------------
q1	17585	7523	7291	7291
q2	2046	179	165	165
q3	10553	1175	1156	1156
q4	10227	737	765	737
q5	7627	2766	2686	2686
q6	240	147	149	147
q7	1012	623	606	606
q8	9235	1854	1956	1854
q9	6618	6563	6522	6522
q10	7078	2273	2359	2273
q11	466	260	258	258
q12	432	228	223	223
q13	17764	3006	3055	3006
q14	239	209	210	209
q15	586	531	529	529
q16	655	591	573	573
q17	997	513	547	513
q18	7331	6859	6626	6626
q19	1340	988	1011	988
q20	480	177	181	177
q21	4058	3191	3205	3191
q22	384	324	316	316
Total cold run time: 106953 ms
Total hot run time: 40046 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7232	7276	7228	7228
q2	328	229	231	229
q3	2910	2809	3064	2809
q4	2124	1784	1836	1784
q5	5670	5733	5608	5608
q6	230	139	136	136
q7	2187	1790	1818	1790
q8	3372	3525	3518	3518
q9	9052	9033	9063	9033
q10	3594	3570	3527	3527
q11	599	506	507	506
q12	822	628	602	602
q13	12165	3178	3229	3178
q14	298	280	277	277
q15	564	520	511	511
q16	713	643	638	638
q17	1831	1597	1559	1559
q18	7909	7472	7564	7472
q19	1650	1521	1511	1511
q20	2073	1806	1821	1806
q21	5493	5161	5236	5161
q22	640	545	576	545
Total cold run time: 71456 ms
Total hot run time: 59428 ms
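The bot output does not label the result columns. One reading, inferred here rather than documented by the bot, is: query, cold run, two hot runs, and the faster of the two hot runs. The sketch below checks that reading against the Round 1 rows above; `ROUND1` and `totals` are names introduced only for this illustration.

```python
# Round 1 rows from the TPC-H result above: query, cold run, hot run 1, hot run 2 (ms).
ROUND1 = """
q1 17585 7523 7291
q2 2046 179 165
q3 10553 1175 1156
q4 10227 737 765
q5 7627 2766 2686
q6 240 147 149
q7 1012 623 606
q8 9235 1854 1956
q9 6618 6563 6522
q10 7078 2273 2359
q11 466 260 258
q12 432 228 223
q13 17764 3006 3055
q14 239 209 210
q15 586 531 529
q16 655 591 573
q17 997 513 547
q18 7331 6859 6626
q19 1340 988 1011
q20 480 177 181
q21 4058 3191 3205
q22 384 324 316
"""


def totals(text):
    """Return (sum of cold runs, sum of the faster hot run per query)."""
    cold_total = 0
    hot_total = 0
    for line in text.split("\n"):
        if not line.strip():
            continue
        _query, cold, hot1, hot2 = line.split()
        cold_total += int(cold)
        hot_total += min(int(hot1), int(hot2))
    return cold_total, hot_total


print(totals(ROUND1))  # compare with the reported cold and hot totals for Round 1
```

Under this reading the sums reproduce the reported Round 1 totals of 106953 ms cold and 40046 ms hot.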

dataroaring
dataroaring previously approved these changes Dec 9, 2024
Contributor

@dataroaring dataroaring left a comment


LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Dec 9, 2024
Contributor

github-actions bot commented Dec 9, 2024

PR approved by at least one committer and no changes requested.

Contributor

github-actions bot commented Dec 9, 2024

PR approved by anyone and no changes requested.

@doris-robot

TPC-DS: Total hot run time: 190574 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 159599a604213c5f0f9e1863b14d3fd7a4530262, data reload: false

query1	950	401	369	369
query2	6512	2135	2074	2074
query3	6705	221	217	217
query4	34099	23568	23490	23490
query5	4359	463	445	445
query6	300	211	198	198
query7	4623	306	304	304
query8	302	242	245	242
query9	9492	2713	2717	2713
query10	473	250	260	250
query11	18325	15119	15396	15119
query12	157	111	105	105
query13	1664	402	400	400
query14	8973	6608	8233	6608
query15	285	185	192	185
query16	8226	434	502	434
query17	1817	585	579	579
query18	2150	303	304	303
query19	411	149	145	145
query20	120	110	111	110
query21	203	104	107	104
query22	4388	4175	4175	4175
query23	35257	34915	34223	34223
query24	11593	2465	2414	2414
query25	684	375	377	375
query26	1803	149	152	149
query27	2873	269	266	266
query28	7961	2451	2417	2417
query29	1054	419	401	401
query30	297	158	154	154
query31	1037	800	800	800
query32	96	55	56	55
query33	775	292	301	292
query34	997	502	514	502
query35	891	726	743	726
query36	1103	931	947	931
query37	274	73	72	72
query38	4321	4324	4224	4224
query39	1459	1449	1443	1443
query40	287	96	99	96
query41	47	43	44	43
query42	111	98	96	96
query43	536	493	507	493
query44	1306	795	799	795
query45	189	164	159	159
query46	1183	686	717	686
query47	1968	1846	1869	1846
query48	400	305	304	304
query49	1280	391	399	391
query50	790	383	372	372
query51	7343	7198	7060	7060
query52	100	86	87	86
query53	248	177	180	177
query54	1247	422	399	399
query55	79	75	77	75
query56	250	241	260	241
query57	1252	1128	1127	1127
query58	228	205	217	205
query59	3258	3062	3063	3062
query60	276	264	250	250
query61	106	103	101	101
query62	882	662	684	662
query63	211	185	179	179
query64	5040	660	619	619
query65	3303	3209	3263	3209
query66	1428	357	307	307
query67	16190	15749	15566	15566
query68	5073	552	531	531
query69	410	245	251	245
query70	1150	1134	1143	1134
query71	338	249	246	246
query72	6316	4198	4002	4002
query73	779	353	352	352
query74	10530	8993	8963	8963
query75	3401	2683	2647	2647
query76	3071	1118	1163	1118
query77	526	264	278	264
query78	10375	9492	9457	9457
query79	1095	604	588	588
query80	739	412	480	412
query81	508	235	231	231
query82	358	124	118	118
query83	172	148	139	139
query84	239	77	72	72
query85	1447	305	290	290
query86	445	308	292	292
query87	4702	4559	4637	4559
query88	4073	2174	2207	2174
query89	400	299	304	299
query90	2180	186	178	178
query91	136	103	103	103
query92	63	51	51	51
query93	1492	541	533	533
query94	1038	291	271	271
query95	354	243	247	243
query96	604	274	282	274
query97	2872	2755	2665	2665
query98	238	204	190	190
query99	1536	1324	1314	1314
Total cold run time: 303207 ms
Total hot run time: 190574 ms

@doris-robot

ClickBench: Total hot run time: 32.61 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 159599a604213c5f0f9e1863b14d3fd7a4530262, data reload: false

query1	0.05	0.03	0.03
query2	0.06	0.04	0.02
query3	0.24	0.08	0.06
query4	1.61	0.11	0.11
query5	0.44	0.43	0.42
query6	1.17	0.65	0.67
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.58	0.52	0.51
query10	0.57	0.56	0.57
query11	0.14	0.10	0.12
query12	0.14	0.12	0.11
query13	0.61	0.61	0.59
query14	2.83	2.76	2.82
query15	0.90	0.84	0.84
query16	0.38	0.40	0.38
query17	1.07	1.06	1.05
query18	0.23	0.21	0.21
query19	1.94	1.86	2.03
query20	0.02	0.01	0.01
query21	15.35	0.63	0.60
query22	2.48	2.40	1.55
query23	17.03	1.22	0.76
query24	3.33	1.27	1.99
query25	0.30	0.09	0.15
query26	0.48	0.13	0.12
query27	0.05	0.04	0.04
query28	9.55	1.10	1.07
query29	12.54	3.20	3.18
query30	0.26	0.06	0.07
query31	2.88	0.37	0.38
query32	3.27	0.46	0.46
query33	3.01	2.97	3.06
query34	17.22	4.46	4.45
query35	4.49	4.54	4.50
query36	0.65	0.49	0.50
query37	0.08	0.06	0.06
query38	0.05	0.04	0.03
query39	0.03	0.02	0.02
query40	0.17	0.13	0.13
query41	0.08	0.02	0.02
query42	0.04	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 106.42 s
Total hot run time: 32.61 s

@Yukang-Lian Yukang-Lian force-pushed the Optimize_Compaction_Task_Permit_Allocation branch from 159599a to 1084df5 Compare December 9, 2024 11:24
@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Dec 9, 2024
Contributor

github-actions bot commented Dec 9, 2024

clang-tidy review says "All clean, LGTM! 👍"

@Yukang-Lian
Collaborator Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.80% (10103/26037)
Line Coverage: 29.69% (84685/285194)
Region Coverage: 28.77% (43487/151145)
Branch Coverage: 25.32% (22090/87228)
Coverage Report: http://coverage.selectdb-in.cc/coverage/1084df50d9f01420ec9bb1be037012e453fc8e5e_1084df50d9f01420ec9bb1be037012e453fc8e5e/report/index.html

@doris-robot

TPC-H: Total hot run time: 39867 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1084df50d9f01420ec9bb1be037012e453fc8e5e, data reload: false

------ Round 1 ----------------------------------
q1	17602	7542	7323	7323
q2	2050	182	168	168
q3	10625	1102	1154	1102
q4	10574	743	739	739
q5	7610	2724	2593	2593
q6	239	149	147	147
q7	1002	630	606	606
q8	9261	1859	1927	1859
q9	6639	6469	6461	6461
q10	7026	2321	2348	2321
q11	466	255	253	253
q12	431	232	224	224
q13	17769	3065	3054	3054
q14	246	208	206	206
q15	578	552	526	526
q16	653	591	591	591
q17	969	573	565	565
q18	7236	6673	6760	6673
q19	1345	1057	975	975
q20	467	192	184	184
q21	4132	3143	2976	2976
q22	390	322	321	321
Total cold run time: 107310 ms
Total hot run time: 39867 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7223	7262	7239	7239
q2	340	233	229	229
q3	2917	2785	2945	2785
q4	2057	1814	1804	1804
q5	5663	5653	5638	5638
q6	229	147	141	141
q7	2241	1765	1796	1765
q8	3379	3570	3483	3483
q9	9080	9038	9073	9038
q10	3595	3551	3565	3551
q11	601	501	507	501
q12	791	603	604	603
q13	10159	3245	3193	3193
q14	321	270	267	267
q15	575	534	535	534
q16	668	666	635	635
q17	1878	1622	1629	1622
q18	8226	7688	7473	7473
q19	1705	1563	1588	1563
q20	2119	1909	1864	1864
q21	5717	5546	5423	5423
q22	683	573	560	560
Total cold run time: 70167 ms
Total hot run time: 59911 ms

@doris-robot

TPC-DS: Total hot run time: 197678 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1084df50d9f01420ec9bb1be037012e453fc8e5e, data reload: false

query1	1237	969	950	950
query2	6250	2025	1965	1965
query3	10967	4585	4669	4585
query4	33584	23682	23431	23431
query5	3502	473	448	448
query6	293	198	179	179
query7	3984	298	313	298
query8	297	241	249	241
query9	9596	2731	2731	2731
query10	478	253	253	253
query11	17730	15379	15263	15263
query12	156	102	102	102
query13	1556	423	400	400
query14	9506	7289	7369	7289
query15	251	212	189	189
query16	8006	470	474	470
query17	1538	590	597	590
query18	2173	326	303	303
query19	358	159	159	159
query20	117	109	112	109
query21	212	111	103	103
query22	4973	4365	4319	4319
query23	35906	34506	34770	34506
query24	10964	2539	2433	2433
query25	638	427	420	420
query26	1247	207	157	157
query27	2437	302	292	292
query28	7484	2505	2449	2449
query29	836	418	407	407
query30	237	153	152	152
query31	1041	878	818	818
query32	100	55	54	54
query33	775	292	301	292
query34	1049	536	552	536
query35	875	768	781	768
query36	1129	964	965	964
query37	149	81	74	74
query38	4501	4497	4462	4462
query39	1513	1469	1482	1469
query40	207	108	104	104
query41	48	44	43	43
query42	117	100	95	95
query43	539	491	504	491
query44	1322	849	837	837
query45	198	170	172	170
query46	1200	728	750	728
query47	2057	1913	1954	1913
query48	419	320	327	320
query49	927	423	397	397
query50	842	416	404	404
query51	7679	7254	7251	7251
query52	105	93	91	91
query53	259	188	198	188
query54	1207	408	427	408
query55	86	85	77	77
query56	291	245	244	244
query57	1237	1091	1109	1091
query58	231	205	199	199
query59	3120	2976	2950	2950
query60	268	246	238	238
query61	110	110	105	105
query62	849	658	670	658
query63	214	198	187	187
query64	3964	661	629	629
query65	3302	3259	3237	3237
query66	735	309	299	299
query67	16037	15765	15503	15503
query68	4519	555	555	555
query69	415	250	255	250
query70	1183	1162	1130	1130
query71	371	252	250	250
query72	6379	4162	4086	4086
query73	779	365	366	365
query74	10216	8999	9085	8999
query75	3352	2691	2691	2691
query76	2561	1112	1050	1050
query77	419	294	288	288
query78	10374	9541	9423	9423
query79	1114	596	613	596
query80	883	437	442	437
query81	557	234	238	234
query82	544	122	117	117
query83	241	145	149	145
query84	231	75	70	70
query85	1229	303	372	303
query86	371	287	293	287
query87	4713	4716	4533	4533
query88	3271	2241	2195	2195
query89	424	299	292	292
query90	1879	192	188	188
query91	138	104	101	101
query92	61	51	51	51
query93	1679	551	536	536
query94	689	298	299	298
query95	345	252	254	252
query96	629	279	288	279
query97	2871	2672	2701	2672
query98	211	202	201	201
query99	1539	1317	1329	1317
Total cold run time: 299402 ms
Total hot run time: 197678 ms

@doris-robot

ClickBench: Total hot run time: 32.72 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 1084df50d9f01420ec9bb1be037012e453fc8e5e, data reload: false

query1	0.03	0.03	0.03
query2	0.07	0.03	0.04
query3	0.23	0.07	0.07
query4	1.62	0.10	0.11
query5	0.43	0.40	0.40
query6	1.16	0.66	0.65
query7	0.02	0.02	0.01
query8	0.05	0.04	0.02
query9	0.57	0.52	0.51
query10	0.55	0.54	0.54
query11	0.14	0.10	0.11
query12	0.14	0.11	0.12
query13	0.61	0.61	0.60
query14	2.72	2.87	2.76
query15	0.92	0.83	0.83
query16	0.38	0.38	0.39
query17	1.06	1.03	1.05
query18	0.23	0.21	0.21
query19	1.96	1.88	2.02
query20	0.01	0.01	0.01
query21	15.36	0.58	0.58
query22	2.51	1.64	1.47
query23	17.07	0.94	0.84
query24	3.08	1.51	1.39
query25	0.23	0.18	0.07
query26	0.52	0.13	0.13
query27	0.05	0.05	0.04
query28	10.33	1.10	1.08
query29	12.55	3.20	3.18
query30	0.25	0.06	0.07
query31	2.87	0.39	0.37
query32	3.26	0.48	0.46
query33	3.01	3.04	3.04
query34	16.81	4.49	4.45
query35	4.52	4.49	4.51
query36	0.65	0.49	0.48
query37	0.09	0.06	0.06
query38	0.04	0.03	0.03
query39	0.03	0.03	0.03
query40	0.17	0.13	0.13
query41	0.08	0.02	0.02
query42	0.04	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 106.46 s
Total hot run time: 32.72 s
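For a quick comparison of the two benchmark rounds, the reported hot-run totals for commit 159599a and commit 1084df5 can be put side by side. This is only arithmetic on the totals printed by the bot; `hot_totals` and `percent_change` are names made up for the sketch, and single runs like these do not by themselves show what caused a difference.

```python
# Reported "Total hot run time" per benchmark: (commit 159599a, commit 1084df5).
hot_totals = {
    "TPC-H (ms)": (40046, 39867),
    "TPC-DS (ms)": (190574, 197678),
    "ClickBench (s)": (32.61, 32.72),
}


def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


for name, (before, after) in hot_totals.items():
    change = percent_change(before, after)
    print(f"{name}: {before} -> {after} ({change:+.2f}%)")
```

TPC-H and ClickBench move by less than half a percent. Most of the TPC-DS difference comes from two queries: query3 (217 ms to 4585 ms) and query1 (369 ms to 950 ms).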

Contributor

@dataroaring dataroaring left a comment


LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Dec 10, 2024
Contributor

PR approved by at least one committer and no changes requested.

@dataroaring dataroaring merged commit 82d799a into apache:master Dec 10, 2024
26 of 28 checks passed
github-actions bot pushed a commit that referenced this pull request Dec 10, 2024
…45197)

The current implementation of compaction task submission reserves
permits before task execution, which can lead to inefficient resource
utilization. Tasks waiting in the thread pool queue may hold permits,
potentially blocking other tasks from being executed.

## Solution

Change total_permits_for_compaction_score to 1,000,000, which will
effectively remove the limit on total permits. The original purpose of
total permits was to control the memory of compaction tasks, but
currently, memory is controlled by individual compaction tasks, so total
permits are no longer serving any purpose. If no memory issues arise in
the next two versions after making this change, we will remove the
permits mechanism.
yiguolei pushed a commit that referenced this pull request Dec 11, 2024
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.8-merged dev/3.0.4-merged reviewed
5 participants