[improve](move-memtable) reduce memory usage by sharing flush tokens #45813

Closed
wants to merge 1 commit into from


kaijchen
Contributor

What problem does this PR solve?

Fix an OOM caused by creating too many flush tokens.
Downstream flush tokens are now shared by index, so fewer tokens are created.
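
A minimal sketch of the sharing idea, assuming "by index" means that all tablet streams belonging to one index hold the same token set. `FlushTokens` below is a hypothetical stand-in (the real type lives in the Doris BE and is not shown on this page), and `IndexStream`, `tablet()`, and `g_live_token_sets` are illustrative names only:

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Counts live token sets so the effect of sharing is observable (illustration only).
int g_live_token_sets = 0;

// Hypothetical stand-in for a set of flush tokens; NOT the real Doris type.
struct FlushTokens {
    explicit FlushTokens(std::size_t n) : slots(n) { ++g_live_token_sets; }
    ~FlushTokens() { --g_live_token_sets; }
    FlushTokens(const FlushTokens&) = delete;
    FlushTokens& operator=(const FlushTokens&) = delete;
    std::vector<int> slots; // placeholder for per-token state
};

// Each tablet stream keeps a reference to a token set instead of owning its own.
class TabletStream {
public:
    TabletStream(std::int64_t id, std::shared_ptr<FlushTokens> tokens)
            : _id(id), _flush_tokens(std::move(tokens)) {}
    const std::shared_ptr<FlushTokens>& flush_tokens() const { return _flush_tokens; }

private:
    std::int64_t _id;
    std::shared_ptr<FlushTokens> _flush_tokens;
};

// Assumed owner of the shared set: one token set per index, handed to every tablet.
class IndexStream {
public:
    explicit IndexStream(std::size_t num_tokens)
            : _flush_tokens(std::make_shared<FlushTokens>(num_tokens)) {}

    // Returns the stream for tablet_id, creating it on first use with the shared set.
    TabletStream& tablet(std::int64_t tablet_id) {
        auto it = _tablets.find(tablet_id);
        if (it == _tablets.end()) {
            it = _tablets.emplace(tablet_id, TabletStream(tablet_id, _flush_tokens)).first;
        }
        return it->second;
    }

private:
    std::shared_ptr<FlushTokens> _flush_tokens;
    std::unordered_map<std::int64_t, TabletStream> _tablets;
};
```

In this sketch, opening 1000 tablets on one `IndexStream` leaves `g_live_token_sets` at 1, because every `TabletStream` copies the `shared_ptr` rather than constructing a new set.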

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Dec 23, 2024

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@kaijchen
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 40024 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b36e1449c0c0b27d0e27246fca043bfbdf26d965, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17622	7622	7350	7350
q2	2054	184	186	184
q3	10522	1157	1167	1157
q4	10234	688	700	688
q5	7611	2771	2694	2694
q6	244	150	153	150
q7	991	629	624	624
q8	9248	1877	1931	1877
q9	6683	6544	6468	6468
q10	7060	2293	2388	2293
q11	474	266	274	266
q12	425	221	222	221
q13	17823	2983	2966	2966
q14	252	208	215	208
q15	555	498	487	487
q16	662	587	583	583
q17	987	554	557	554
q18	7412	6837	6544	6544
q19	1345	1049	1078	1049
q20	466	197	185	185
q21	4362	3166	3231	3166
q22	385	315	310	310
Total cold run time: 107417 ms
Total hot run time: 40024 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	7236	7229	7220	7220
q2	332	230	231	230
q3	2915	2820	3110	2820
q4	2106	1787	1807	1787
q5	5626	5644	5610	5610
q6	223	135	133	133
q7	2165	1767	1810	1767
q8	3363	3559	3477	3477
q9	8939	8931	8975	8931
q10	3583	3593	3542	3542
q11	610	498	504	498
q12	844	642	595	595
q13	12845	3135	3135	3135
q14	311	273	277	273
q15	551	514	507	507
q16	686	645	627	627
q17	1836	1559	1564	1559
q18	7879	7537	7359	7359
q19	1685	1519	1549	1519
q20	2026	1848	1810	1810
q21	5424	5357	5293	5293
q22	632	568	550	550
Total cold run time: 71817 ms
Total hot run time: 59242 ms

@doris-robot

TPC-DS: Total hot run time: 189851 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b36e1449c0c0b27d0e27246fca043bfbdf26d965, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	967	373	384	373
query2	6530	2459	2413	2413
query3	6707	214	211	211
query4	34044	23359	23623	23359
query5	4307	507	490	490
query6	276	202	205	202
query7	4620	320	319	319
query8	304	249	238	238
query9	9571	2781	2752	2752
query10	479	270	256	256
query11	17998	15100	15289	15100
query12	167	109	106	106
query13	1683	433	446	433
query14	11316	6687	6685	6685
query15	305	174	199	174
query16	8238	410	470	410
query17	1764	587	586	586
query18	2161	328	293	293
query19	362	151	151	151
query20	113	110	109	109
query21	211	104	102	102
query22	4575	4223	4023	4023
query23	34249	33328	33197	33197
query24	11254	2421	2445	2421
query25	665	383	386	383
query26	1793	167	150	150
query27	2894	326	331	326
query28	7900	2450	2449	2449
query29	1024	402	413	402
query30	301	152	149	149
query31	1044	806	814	806
query32	102	59	57	57
query33	791	315	298	298
query34	962	518	511	511
query35	925	745	739	739
query36	1124	943	963	943
query37	286	77	80	77
query38	4223	4128	4085	4085
query39	1493	1441	1419	1419
query40	286	108	110	108
query41	48	43	44	43
query42	115	98	103	98
query43	542	493	480	480
query44	1219	808	802	802
query45	181	167	178	167
query46	1164	698	714	698
query47	1949	1817	1843	1817
query48	408	335	328	328
query49	1285	393	385	385
query50	822	374	395	374
query51	7165	7106	7148	7106
query52	106	101	98	98
query53	256	181	180	180
query54	1274	417	397	397
query55	86	79	81	79
query56	253	240	244	240
query57	1243	1109	1145	1109
query58	250	221	229	221
query59	3349	3071	3077	3071
query60	265	238	258	238
query61	112	105	108	105
query62	885	695	672	672
query63	214	185	201	185
query64	5040	688	644	644
query65	3272	3173	3254	3173
query66	1410	310	319	310
query67	15847	15550	15534	15534
query68	5605	561	569	561
query69	437	257	253	253
query70	1171	1121	1128	1121
query71	458	260	256	256
query72	6430	4078	4203	4078
query73	789	361	368	361
query74	10571	8772	9006	8772
query75	3394	2625	2627	2625
query76	3363	1249	1193	1193
query77	547	281	363	281
query78	10308	9518	11564	9518
query79	1280	604	604	604
query80	816	437	447	437
query81	523	226	232	226
query82	222	123	121	121
query83	238	151	157	151
query84	240	70	73	70
query85	1651	333	303	303
query86	497	301	302	301
query87	4521	4337	4461	4337
query88	3581	2241	2228	2228
query89	398	299	286	286
query90	2132	194	194	194
query91	145	105	116	105
query92	67	53	53	53
query93	1072	551	562	551
query94	995	289	285	285
query95	363	254	256	254
query96	619	291	287	287
query97	2846	2688	2662	2662
query98	213	197	219	197
query99	1562	1344	1321	1321
Total cold run time: 303828 ms
Total hot run time: 189851 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.79% (10092/26015)
Line Coverage: 29.77% (85092/285851)
Region Coverage: 28.89% (43448/150396)
Branch Coverage: 25.42% (22143/87114)
Coverage Report: http://coverage.selectdb-in.cc/coverage/b36e1449c0c0b27d0e27246fca043bfbdf26d965_b36e1449c0c0b27d0e27246fca043bfbdf26d965/report/index.html

@doris-robot

ClickBench: Total hot run time: 31.63 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b36e1449c0c0b27d0e27246fca043bfbdf26d965, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.03	0.03	0.04
query2	0.06	0.04	0.03
query3	0.24	0.08	0.07
query4	1.60	0.10	0.11
query5	0.43	0.41	0.44
query6	1.14	0.67	0.65
query7	0.03	0.02	0.02
query8	0.04	0.03	0.02
query9	0.57	0.52	0.51
query10	0.55	0.60	0.57
query11	0.14	0.10	0.10
query12	0.14	0.11	0.10
query13	0.60	0.61	0.60
query14	2.74	2.86	2.77
query15	0.90	0.84	0.82
query16	0.38	0.38	0.40
query17	1.07	0.99	1.06
query18	0.23	0.22	0.20
query19	1.96	1.81	2.04
query20	0.01	0.01	0.01
query21	15.37	0.63	0.59
query22	2.71	1.96	1.38
query23	16.95	0.90	0.96
query24	3.00	0.27	1.36
query25	0.24	0.11	0.05
query26	0.44	0.14	0.14
query27	0.05	0.04	0.05
query28	10.98	1.11	1.07
query29	12.56	3.27	3.28
query30	0.24	0.07	0.06
query31	2.85	0.38	0.39
query32	3.23	0.47	0.47
query33	3.04	3.14	3.11
query34	16.96	4.46	4.47
query35	4.50	4.47	4.46
query36	0.71	0.49	0.50
query37	0.10	0.07	0.06
query38	0.04	0.04	0.03
query39	0.04	0.02	0.02
query40	0.17	0.12	0.13
query41	0.07	0.02	0.02
query42	0.04	0.02	0.03
query43	0.04	0.03	0.03
Total cold run time: 107.19 s
Total hot run time: 31.63 s

@kaijchen closed this Dec 26, 2024
@@ -79,6 +79,7 @@ class TabletStream {
     RuntimeProfile::Counter* _add_segment_timer = nullptr;
     RuntimeProfile::Counter* _close_wait_timer = nullptr;
     LoadStreamMgr* _load_stream_mgr = nullptr;
+    std::shared_ptr<FlushTokens> _flush_tokens;
Contributor

It would be better to use one flush token for one tablet.
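
To compare this suggestion with the PR's approach, the sketch below counts tokens under three schemes. It assumes (not confirmed on this page) that before the PR every tablet stream built its own token set; `LoadShape` and all three function names are hypothetical, and the numbers in the note after the code are made up for illustration:

```cpp
#include <cstddef>

// Hypothetical description of a load: how many indexes, tablets, and tokens per set.
struct LoadShape {
    std::size_t indexes;
    std::size_t tablets_per_index;
    std::size_t tokens_per_set;
};

// Assumed pre-PR behavior: every tablet stream owns a full token set.
std::size_t tokens_with_set_per_tablet(const LoadShape& s) {
    return s.indexes * s.tablets_per_index * s.tokens_per_set;
}

// This PR, as read here: one token set per index, shared by its tablet streams.
std::size_t tokens_shared_by_index(const LoadShape& s) {
    return s.indexes * s.tokens_per_set;
}

// The reviewer's suggestion: exactly one token for each tablet.
std::size_t tokens_one_per_tablet(const LoadShape& s) {
    return s.indexes * s.tablets_per_index;
}
```

For a made-up load of 2 indexes, 1000 tablets per index, and 10 tokens per set, the three functions return 20000, 20, and 2000. Under these assumptions, one token per tablet still grows with the tablet count, but by a factor of `tokens_per_set` less than a full set per tablet.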

4 participants