[improvement](segment) reduce memory usage when open segments #46570

Merged (2 commits) on Jan 9, 2025

jacktengg
Contributor

@jacktengg jacktengg commented Jan 7, 2025

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:
When a rowset contains many segments, opening all of them at once consumes a lot of memory. This PR opens the segments one by one and releases each Segment object immediately unless it needs to be kept for later use, which reduces the memory footprint dramatically.
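A minimal sketch of the idea, assuming a simplified model rather than the actual Doris code: Segment, OpenResult, open_segments_one_by_one and the open callback are hypothetical names. Every segment is opened once so its row count can be recorded, but only segments inside the scanned range [seg_start, seg_end) stay referenced; the others are freed at the end of each loop iteration, so at most one unneeded segment is alive at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for segment_v2::Segment. A real segment also holds
// footer, index and column-reader state, which is what makes it expensive.
struct Segment {
    int64_t num_rows;
};
using SegmentSharedPtr = std::shared_ptr<Segment>;

// Hypothetical loader callback: opens segment `seg_id` and returns it.
using OpenFn = std::function<SegmentSharedPtr(std::size_t seg_id)>;

struct OpenResult {
    std::vector<SegmentSharedPtr> segments;  // non-null only inside [seg_start, seg_end)
    std::vector<int64_t> num_rows;           // row count of every segment
};

// Opens segments one at a time. Each segment's row count is recorded; the
// Segment object itself is kept only if it lies in the scanned range.
OpenResult open_segments_one_by_one(std::size_t segment_count, std::size_t seg_start,
                                    std::size_t seg_end, const OpenFn& open) {
    OpenResult result;
    result.segments.resize(segment_count);
    result.num_rows.resize(segment_count);
    for (std::size_t i = 0; i < segment_count; ++i) {
        SegmentSharedPtr seg = open(i);
        assert(seg != nullptr);  // the loader is assumed to succeed in this sketch
        result.num_rows[i] = seg->num_rows;
        if (i >= seg_start && i < seg_end) {
            result.segments[i] = std::move(seg);  // needed by the scan: keep it
        }
        // Otherwise `seg` holds the last reference (unless the loader kept one),
        // so the Segment is freed at the end of this iteration, before the next
        // segment is opened.
    }
    return result;
}
```

Keeping only the row counts in a plain vector is what lets the reader drop the Segment objects themselves; the loop shape mirrors the `_segments_rows[i] = ...; if (i >= seg_start && i < seg_end)` snippet quoted in the review below.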

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@jacktengg jacktengg force-pushed the 0107-opt-open-segments branch from e971b1a to 36370c7 on January 7, 2025 14:56
@jacktengg
Contributor Author

run buildall

@jacktengg jacktengg force-pushed the 0107-opt-open-segments branch 2 times, most recently from 061c170 to a2e2bcf on January 7, 2025 15:04
@jacktengg
Contributor Author

run buildall

@jacktengg jacktengg force-pushed the 0107-opt-open-segments branch from a2e2bcf to 6921b73 on January 7, 2025 15:06
@jacktengg
Contributor Author

run buildall

@jacktengg jacktengg force-pushed the 0107-opt-open-segments branch from 6921b73 to 842ce50 on January 7, 2025 15:14
@jacktengg
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 34231 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 842ce50923648cfaa4368993d68f68f27b336cb0, data reload: false

------ Round 1 ----------------------------------
query	cold	hot 1	hot 2	best hot (ms)
q1	17576	6701	6551	6551
q2	2049	313	168	168
q3	10409	1443	791	791
q4	10211	963	465	465
q5	7614	2423	2260	2260
q6	231	202	156	156
q7	1010	823	609	609
q8	9264	1530	1320	1320
q9	5726	5211	5389	5211
q10	6829	2391	1886	1886
q11	522	284	264	264
q12	360	396	233	233
q13	17789	3919	3137	3137
q14	252	244	222	222
q15	619	518	506	506
q16	629	623	590	590
q17	607	908	327	327
q18	7192	6491	6446	6446
q19	2426	990	595	595
q20	303	316	187	187
q21	2855	2201	1992	1992
q22	359	348	315	315
Total cold run time: 104832 ms
Total hot run time: 34231 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold	hot 1	hot 2	best hot (ms)
q1	6769	6606	6669	6606
q2	234	327	229	229
q3	2302	2680	2326	2326
q4	1395	1810	1370	1370
q5	4390	5041	4900	4900
q6	195	174	144	144
q7	2152	2043	1843	1843
q8	2646	2833	2718	2718
q9	7416	7384	7420	7384
q10	3030	3363	2914	2914
q11	583	511	514	511
q12	687	768	627	627
q13	3538	3897	3322	3322
q14	283	318	287	287
q15	587	510	494	494
q16	654	695	650	650
q17	1229	1776	1254	1254
q18	7930	7429	7466	7429
q19	878	1205	1112	1112
q20	2018	2056	1892	1892
q21	5606	5175	5106	5106
q22	645	618	584	584
Total cold run time: 55167 ms
Total hot run time: 53702 ms
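The bot prints no column headers. Reading each row as query, cold run, hot run 1, hot run 2, best hot run (ms) is consistent with its totals: in Round 1 above, the last column equals min(hot run 1, hot run 2), and the two totals are the sums of the cold column and of that best-hot column. A small C++ check of that reading (QueryTiming, best_hot, total_cold and total_hot are illustrative names, not part of the Doris benchmark tooling):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One benchmark row as printed by the bot: a cold run and two hot runs (ms).
struct QueryTiming {
    int64_t cold, hot1, hot2;
};

// The last printed column matches the faster of the two hot runs.
int64_t best_hot(const QueryTiming& q) { return std::min(q.hot1, q.hot2); }

// "Total cold run time": sum of the cold column.
int64_t total_cold(const std::vector<QueryTiming>& rows) {
    int64_t sum = 0;
    for (const auto& q : rows) sum += q.cold;
    return sum;
}

// "Total hot run time": sum of the best hot run of every query.
int64_t total_hot(const std::vector<QueryTiming>& rows) {
    int64_t sum = 0;
    for (const auto& q : rows) sum += best_hot(q);
    return sum;
}
```

Feeding it the 22 Round 1 rows for commit 842ce50 reproduces the reported 104832 ms cold and 34231 ms hot totals.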

if (seg_start == seg_end) {
seg_start = 0;
seg_end = segments.size();
_segments_rows));
Contributor

Please remove this _segment_rows member. It is actually used only once, here, and I'm worried that caching it could cause problems.

if (seg_start == seg_end) {
seg_start = 0;
seg_end = segments.size();
_segments_rows));
Contributor

We should also check rowid_conversion != nullptr here to avoid a core dump.
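One way to read this suggestion, sketched with hypothetical types (RowIdConversionSketch and maybe_init_rowid_map are not Doris APIs, and the one-slot-per-row layout is an assumption): initialize the per-segment rowid map only when row ids are requested and the conversion object is non-null, so a missing object is skipped instead of dereferenced.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the rowid-conversion object. Assumption: it keeps
// one destination slot per source row of every segment, so it needs the
// per-segment row counts up front.
struct RowIdConversionSketch {
    std::vector<std::vector<int64_t>> segment_maps;

    void init_segment_map(const std::vector<int64_t>& segment_num_rows) {
        segment_maps.clear();
        for (int64_t rows : segment_num_rows) {
            // -1 marks "not mapped yet".
            segment_maps.emplace_back(static_cast<std::size_t>(rows), int64_t{-1});
        }
    }
};

// Returns true if the map was initialized. Both conditions are checked, so a
// null conversion object is skipped rather than dereferenced (which would crash).
bool maybe_init_rowid_map(bool record_rowids, RowIdConversionSketch* rowid_conversion,
                          const std::vector<int64_t>& segment_num_rows) {
    if (!record_rowids || rowid_conversion == nullptr) {
        return false;
    }
    rowid_conversion->init_segment_map(segment_num_rows);
    return true;
}
```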

@@ -40,7 +40,7 @@ struct RowSetSplits {
// if segment_offsets is not empty, means we only scan
// [pair.first, pair.second) segment in rs_reader, only effective in dup key
// and pipeline
-std::pair<int, int> segment_offsets;
+std::pair<int64_t, int64_t> segment_offsets;
Contributor

Please remove the get_segment_num_rows method from this class.

const auto& tmp_segments = segment_cache_handle.get_segments();
_segments_rows[i] = tmp_segments[0]->num_rows();
if (i >= seg_start && i < seg_end) {
segments[i] = tmp_segments[0];
Contributor

What does this line do?

}
if (_read_context->record_rowids) {
_segments_rows.resize(segment_count);
Contributor

Please add a comment here explaining when this code path is reached.

auto segment_count = _rowset->num_segments();
std::vector<segment_v2::SegmentSharedPtr> segments(segment_count);
auto [seg_start, seg_end] = _segment_offsets;
if (seg_start == seg_end) {
Contributor

Please add a comment here too, explaining why the end is set to segment_count when the two offsets are equal.
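For reference, the convention the snippet above relies on, as a standalone sketch (effective_segment_range is a hypothetical helper, not a Doris function): per the RowSetSplits comment, segment_offsets names a half-open range [first, second) of segments to scan, and equal offsets mean "no range given", so the whole rowset is scanned.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper. segment_offsets is a half-open range [first, second);
// equal offsets mean "no range given", i.e. scan every segment in the rowset.
std::pair<int64_t, int64_t> effective_segment_range(std::pair<int64_t, int64_t> segment_offsets,
                                                    int64_t segment_count) {
    auto [seg_start, seg_end] = segment_offsets;
    if (seg_start == seg_end) {
        seg_start = 0;
        seg_end = segment_count;
    }
    return {seg_start, seg_end};
}
```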

@doris-robot

TPC-DS: Total hot run time: 198259 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 842ce50923648cfaa4368993d68f68f27b336cb0, data reload: false

query	cold	hot 1	hot 2	best hot (ms)
query1	1317	970	920	920
query2	6376	2487	2516	2487
query3	11081	4776	4827	4776
query4	33105	26098	23611	23611
query5	3694	613	467	467
query6	282	195	203	195
query7	3995	495	309	309
query8	326	237	219	219
query9	9148	2674	2663	2663
query10	446	320	243	243
query11	17987	15334	15216	15216
query12	166	107	101	101
query13	1557	525	404	404
query14	10387	7436	7998	7436
query15	238	212	195	195
query16	8490	582	460	460
query17	1579	765	568	568
query18	2049	418	309	309
query19	196	195	159	159
query20	136	117	123	117
query21	214	121	106	106
query22	4681	4757	4537	4537
query23	34323	33912	33783	33783
query24	6456	2346	2339	2339
query25	467	490	416	416
query26	795	302	161	161
query27	1975	464	339	339
query28	5780	2505	2497	2497
query29	590	552	434	434
query30	207	176	151	151
query31	1001	952	866	866
query32	75	57	56	56
query33	481	361	295	295
query34	788	838	526	526
query35	816	823	773	773
query36	1022	1055	971	971
query37	114	96	72	72
query38	4279	4430	4148	4148
query39	1511	1489	1452	1452
query40	202	117	103	103
query41	51	43	46	43
query42	121	103	102	102
query43	533	551	523	523
query44	1367	853	831	831
query45	187	176	176	176
query46	897	1098	669	669
query47	2092	2008	1982	1982
query48	392	431	319	319
query49	716	480	382	382
query50	639	692	396	396
query51	7072	6920	6974	6920
query52	106	107	96	96
query53	230	254	183	183
query54	492	526	436	436
query55	84	79	90	79
query56	259	254	248	248
query57	1247	1275	1166	1166
query58	259	236	234	234
query59	3197	3648	3419	3419
query60	280	263	245	245
query61	110	110	110	110
query62	884	826	785	785
query63	238	192	199	192
query64	2838	1039	655	655
query65	3382	3286	3346	3286
query66	835	419	310	310
query67	16325	16062	15599	15599
query68	7772	699	516	516
query69	495	294	245	245
query70	1196	1152	1124	1124
query71	396	285	248	248
query72	6367	3983	3865	3865
query73	648	758	375	375
query74	10120	9160	9167	9160
query75	3213	3168	2678	2678
query76	3267	1206	789	789
query77	469	382	269	269
query78	11106	10080	9495	9495
query79	1946	781	604	604
query80	596	540	442	442
query81	496	265	225	225
query82	205	148	122	122
query83	160	162	141	141
query84	236	91	70	70
query85	729	401	303	303
query86	379	309	294	294
query87	4553	4427	4510	4427
query88	4008	2190	2185	2185
query89	397	319	296	296
query90	1884	184	183	183
query91	128	139	111	111
query92	72	54	51	51
query93	1054	712	537	537
query94	632	382	284	284
query95	331	261	250	250
query96	484	617	272	272
query97	2920	2943	2807	2807
query98	234	204	198	198
query99	1499	1542	1427	1427
Total cold run time: 291157 ms
Total hot run time: 198259 ms

@doris-robot

ClickBench: Total hot run time: 31.54 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 842ce50923648cfaa4368993d68f68f27b336cb0, data reload: false

query	cold	hot 1	hot 2 (s)
query1	0.03	0.03	0.04
query2	0.07	0.03	0.04
query3	0.23	0.07	0.07
query4	1.62	0.10	0.10
query5	0.41	0.43	0.42
query6	1.15	0.66	0.65
query7	0.03	0.02	0.02
query8	0.04	0.03	0.03
query9	0.60	0.50	0.51
query10	0.56	0.58	0.57
query11	0.14	0.10	0.10
query12	0.15	0.11	0.11
query13	0.61	0.60	0.60
query14	2.70	2.87	2.72
query15	0.89	0.83	0.81
query16	0.39	0.39	0.39
query17	1.07	1.03	0.98
query18	0.23	0.21	0.22
query19	1.96	1.82	2.03
query20	0.01	0.01	0.01
query21	15.38	0.93	0.60
query22	0.76	0.84	0.83
query23	15.09	1.39	0.57
query24	3.23	0.96	1.66
query25	0.12	0.24	0.12
query26	0.19	0.16	0.14
query27	0.06	0.06	0.05
query28	13.74	1.49	1.04
query29	12.62	3.97	3.28
query30	0.25	0.10	0.07
query31	2.82	0.59	0.37
query32	3.23	0.53	0.47
query33	3.08	3.11	3.07
query34	16.91	5.10	4.51
query35	4.55	4.49	4.56
query36	0.66	0.49	0.48
query37	0.09	0.06	0.06
query38	0.05	0.04	0.04
query39	0.03	0.02	0.02
query40	0.17	0.14	0.13
query41	0.08	0.02	0.02
query42	0.04	0.02	0.02
query43	0.04	0.04	0.03
Total cold run time: 106.08 s
Total hot run time: 31.54 s

@jacktengg jacktengg force-pushed the 0107-opt-open-segments branch from aedcfb8 to bfdaa6f on January 7, 2025 16:42
@jacktengg
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 32928 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit bfdaa6f0f6667458b4e22156adf515e3bdf533b9, data reload: false

------ Round 1 ----------------------------------
query	cold	hot 1	hot 2	best hot (ms)
q1	17593	6199	6108	6108
q2	2041	300	166	166
q3	10420	1257	725	725
q4	10208	898	445	445
q5	7533	2257	1996	1996
q6	208	181	152	152
q7	879	754	615	615
q8	9239	1372	1177	1177
q9	5069	4917	4946	4917
q10	6745	2305	1907	1907
q11	490	282	262	262
q12	356	372	227	227
q13	17769	3701	3029	3029
q14	251	251	212	212
q15	545	506	488	488
q16	630	628	590	590
q17	578	851	334	334
q18	7119	6586	6528	6528
q19	1240	958	556	556
q20	315	324	190	190
q21	2867	2222	1992	1992
q22	370	328	312	312
Total cold run time: 102465 ms
Total hot run time: 32928 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold	hot 1	hot 2	best hot (ms)
q1	6269	6300	6282	6282
q2	244	324	229	229
q3	2268	2672	2361	2361
q4	1418	1820	1370	1370
q5	4385	4746	4826	4746
q6	183	178	146	146
q7	2093	2004	1863	1863
q8	2606	2792	2667	2667
q9	7341	7253	7320	7253
q10	3057	3369	2841	2841
q11	590	527	501	501
q12	689	745	589	589
q13	3397	3913	3265	3265
q14	284	295	281	281
q15	545	521	498	498
q16	639	692	626	626
q17	1221	1735	1268	1268
q18	7623	7444	7163	7163
q19	836	996	1100	996
q20	1888	1970	1786	1786
q21	5442	5012	4792	4792
q22	616	612	575	575
Total cold run time: 53634 ms
Total hot run time: 52098 ms

@doris-robot

TPC-DS: Total hot run time: 189023 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit bfdaa6f0f6667458b4e22156adf515e3bdf533b9, data reload: false

query	cold	hot 1	hot 2	best hot (ms)
query1	979	374	376	374
query2	6525	2407	2476	2407
query3	6717	213	218	213
query4	33772	23487	23318	23318
query5	4265	599	446	446
query6	288	211	204	204
query7	4621	509	298	298
query8	315	255	242	242
query9	9570	2670	2658	2658
query10	472	314	261	261
query11	17950	15414	15345	15345
query12	158	110	106	106
query13	1647	539	408	408
query14	10884	6875	7792	6875
query15	267	208	183	183
query16	8227	605	415	415
query17	1589	758	554	554
query18	2105	391	288	288
query19	213	182	148	148
query20	116	112	110	110
query21	209	121	101	101
query22	4150	4158	4394	4158
query23	35815	33814	33316	33316
query24	6436	2273	2354	2273
query25	473	475	412	412
query26	1063	306	147	147
query27	1972	470	345	345
query28	5410	2448	2434	2434
query29	541	524	409	409
query30	234	186	150	150
query31	981	962	809	809
query32	91	61	59	59
query33	505	337	295	295
query34	751	852	523	523
query35	826	819	721	721
query36	997	1065	939	939
query37	118	98	76	76
query38	4148	4153	4058	4058
query39	1498	1441	1463	1441
query40	203	124	105	105
query41	48	47	44	44
query42	120	105	110	105
query43	546	548	505	505
query44	1340	818	803	803
query45	181	175	174	174
query46	868	1048	650	650
query47	1898	1947	1878	1878
query48	387	406	333	333
query49	774	479	393	393
query50	642	653	384	384
query51	6974	6990	6964	6964
query52	100	102	90	90
query53	234	259	183	183
query54	486	485	397	397
query55	82	82	80	80
query56	246	276	238	238
query57	1198	1197	1144	1144
query58	242	224	232	224
query59	3111	3323	2988	2988
query60	278	264	247	247
query61	107	110	110	110
query62	904	807	730	730
query63	235	200	189	189
query64	4548	1047	646	646
query65	3299	3206	3224	3206
query66	826	433	306	306
query67	15838	15740	15397	15397
query68	7653	718	530	530
query69	449	293	261	261
query70	1233	1179	1138	1138
query71	396	286	254	254
query72	6089	2727	3960	2727
query73	649	758	355	355
query74	10095	9164	9264	9164
query75	3272	3175	2666	2666
query76	3462	1200	804	804
query77	635	385	284	284
query78	10211	10082	9539	9539
query79	4032	807	594	594
query80	757	520	448	448
query81	485	273	230	230
query82	649	160	121	121
query83	162	167	224	167
query84	242	96	78	78
query85	755	355	296	296
query86	406	321	305	305
query87	4426	4661	4346	4346
query88	5017	2186	2169	2169
query89	418	341	298	298
query90	1813	192	188	188
query91	135	138	104	104
query92	65	58	53	53
query93	2579	908	543	543
query94	685	405	290	290
query95	330	266	249	249
query96	486	678	275	275
query97	2854	2994	2848	2848
query98	221	203	196	196
query99	1727	1559	1471	1471
Total cold run time: 295001 ms
Total hot run time: 189023 ms

@doris-robot

ClickBench: Total hot run time: 31.11 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit bfdaa6f0f6667458b4e22156adf515e3bdf533b9, data reload: false

query	cold	hot 1	hot 2 (s)
query1	0.04	0.03	0.03
query2	0.07	0.04	0.03
query3	0.24	0.07	0.07
query4	1.60	0.11	0.11
query5	0.43	0.41	0.39
query6	1.15	0.66	0.65
query7	0.02	0.01	0.02
query8	0.03	0.04	0.03
query9	0.58	0.50	0.51
query10	0.55	0.56	0.54
query11	0.15	0.11	0.10
query12	0.14	0.12	0.11
query13	0.60	0.60	0.60
query14	2.74	2.72	2.85
query15	0.91	0.83	0.81
query16	0.38	0.39	0.37
query17	1.02	1.05	1.05
query18	0.22	0.20	0.21
query19	1.87	1.87	1.94
query20	0.02	0.01	0.02
query21	15.37	0.97	0.59
query22	0.74	0.92	0.67
query23	15.16	1.44	0.61
query24	3.20	1.71	0.63
query25	0.16	0.07	0.12
query26	0.32	0.14	0.14
query27	0.08	0.05	0.04
query28	13.91	1.49	1.05
query29	12.56	3.94	3.28
query30	0.27	0.09	0.07
query31	2.83	0.59	0.38
query32	3.23	0.53	0.47
query33	3.08	3.07	3.12
query34	16.98	5.11	4.53
query35	4.53	4.51	4.52
query36	0.62	0.49	0.50
query37	0.10	0.06	0.06
query38	0.05	0.04	0.03
query39	0.04	0.03	0.02
query40	0.17	0.14	0.13
query41	0.08	0.02	0.03
query42	0.03	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 106.31 s
Total hot run time: 31.11 s

@jacktengg jacktengg force-pushed the 0107-opt-open-segments branch from bfdaa6f to df41457 on January 8, 2025 09:03
@jacktengg
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 32999 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit df41457c1f14371a549a4bf812d6b4fe284e11ec, data reload: false

------ Round 1 ----------------------------------
query	cold	hot 1	hot 2	best hot (ms)
q1	17577	6321	6129	6129
q2	2050	304	168	168
q3	10415	1283	787	787
q4	10213	886	450	450
q5	7591	2237	2027	2027
q6	206	184	145	145
q7	906	787	604	604
q8	9220	1377	1226	1226
q9	5325	4991	4933	4933
q10	6826	2327	1857	1857
q11	499	280	253	253
q12	362	370	220	220
q13	17757	3664	3129	3129
q14	231	229	206	206
q15	577	501	491	491
q16	625	625	598	598
q17	584	862	319	319
q18	7063	6422	6336	6336
q19	2331	987	589	589
q20	315	314	181	181
q21	2932	2340	2045	2045
q22	365	334	306	306
Total cold run time: 103970 ms
Total hot run time: 32999 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold	hot 1	hot 2	best hot (ms)
q1	6400	6224	6238	6224
q2	237	325	229	229
q3	2270	2674	2276	2276
q4	1433	1850	1405	1405
q5	4400	4789	4837	4789
q6	185	180	142	142
q7	2069	2005	1817	1817
q8	2663	2819	2720	2720
q9	7425	7285	7345	7285
q10	3091	3335	2782	2782
q11	582	496	485	485
q12	623	723	606	606
q13	3552	3843	3222	3222
q14	300	319	272	272
q15	588	513	511	511
q16	646	665	668	665
q17	1293	1728	1250	1250
q18	7791	7574	7382	7382
q19	899	1198	1130	1130
q20	1995	2003	1872	1872
q21	5769	5282	4903	4903
q22	615	621	577	577
Total cold run time: 54826 ms
Total hot run time: 52544 ms

@doris-robot

TPC-DS: Total hot run time: 195802 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit df41457c1f14371a549a4bf812d6b4fe284e11ec, data reload: false

query	cold	hot 1	hot 2	best hot (ms)
query1	1274	945	923	923
query2	6455	2270	2330	2270
query3	11106	4755	4951	4755
query4	32902	23450	23205	23205
query5	3683	622	447	447
query6	253	187	190	187
query7	3985	485	301	301
query8	292	249	224	224
query9	9443	2772	2765	2765
query10	472	313	253	253
query11	18135	15299	15206	15206
query12	163	104	106	104
query13	1577	541	423	423
query14	8892	7845	7155	7155
query15	247	211	203	203
query16	7904	652	475	475
query17	1539	750	553	553
query18	2029	405	307	307
query19	209	185	169	169
query20	123	117	113	113
query21	213	126	106	106
query22	4701	4483	4421	4421
query23	34152	33261	33273	33261
query24	6450	2329	2313	2313
query25	484	495	387	387
query26	727	278	156	156
query27	1988	462	340	340
query28	5183	2533	2490	2490
query29	575	586	435	435
query30	202	189	152	152
query31	970	904	800	800
query32	76	61	56	56
query33	472	352	283	283
query34	773	841	540	540
query35	796	808	744	744
query36	1026	1046	964	964
query37	126	103	80	80
query38	4064	4170	4239	4170
query39	1472	1438	1453	1438
query40	206	117	108	108
query41	48	46	44	44
query42	117	104	105	104
query43	509	510	483	483
query44	1338	837	844	837
query45	183	202	178	178
query46	892	1075	663	663
query47	1897	1913	1844	1844
query48	394	425	324	324
query49	719	480	392	392
query50	637	674	411	411
query51	7064	7076	6996	6996
query52	104	102	92	92
query53	237	265	192	192
query54	479	507	411	411
query55	82	81	78	78
query56	287	275	261	261
query57	1193	1218	1121	1121
query58	248	258	235	235
query59	3152	3303	3057	3057
query60	297	283	261	261
query61	157	131	148	131
query62	865	802	761	761
query63	234	195	241	195
query64	2831	1021	653	653
query65	3291	3218	3241	3218
query66	834	407	305	305
query67	16683	15690	15477	15477
query68	9040	684	516	516
query69	474	289	253	253
query70	1222	1129	1146	1129
query71	428	295	254	254
query72	6291	3893	3763	3763
query73	649	749	367	367
query74	9986	9341	9113	9113
query75	4418	3103	2654	2654
query76	3882	1201	777	777
query77	755	373	281	281
query78	10008	10020	9540	9540
query79	3718	805	585	585
query80	701	508	524	508
query81	473	260	227	227
query82	367	148	121	121
query83	193	166	143	143
query84	288	93	71	71
query85	744	350	309	309
query86	359	282	289	282
query87	4301	4339	4361	4339
query88	3110	2245	2216	2216
query89	437	327	300	300
query90	2128	187	185	185
query91	133	131	103	103
query92	114	54	52	52
query93	2187	909	527	527
query94	661	393	293	293
query95	336	267	252	252
query96	500	611	279	279
query97	2845	2906	2783	2783
query98	225	199	197	197
query99	1592	1515	1377	1377
Total cold run time: 292676 ms
Total hot run time: 195802 ms

@doris-robot

ClickBench: Total hot run time: 32.19 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit df41457c1f14371a549a4bf812d6b4fe284e11ec, data reload: false

query	cold	hot 1	hot 2 (s)
query1	0.04	0.03	0.02
query2	0.07	0.04	0.03
query3	0.24	0.07	0.06
query4	1.62	0.10	0.10
query5	0.41	0.42	0.41
query6	1.14	0.66	0.66
query7	0.02	0.02	0.02
query8	0.05	0.04	0.03
query9	0.62	0.51	0.50
query10	0.57	0.57	0.54
query11	0.15	0.10	0.10
query12	0.14	0.11	0.11
query13	0.60	0.61	0.60
query14	2.72	2.74	2.74
query15	0.89	0.83	0.82
query16	0.37	0.38	0.39
query17	1.03	1.04	1.02
query18	0.22	0.20	0.21
query19	1.91	1.80	2.02
query20	0.02	0.00	0.02
query21	15.41	0.89	0.57
query22	0.76	0.79	0.77
query23	15.22	1.40	0.61
query24	3.09	1.72	2.03
query25	0.26	0.19	0.05
query26	0.23	0.14	0.13
query27	0.06	0.05	0.05
query28	14.45	1.48	1.03
query29	12.59	3.98	3.26
query30	0.25	0.08	0.06
query31	2.82	0.59	0.38
query32	3.25	0.55	0.47
query33	3.09	3.11	3.15
query34	16.71	5.11	4.55
query35	4.51	4.49	4.51
query36	0.65	0.49	0.48
query37	0.09	0.07	0.06
query38	0.05	0.04	0.03
query39	0.02	0.03	0.02
query40	0.16	0.14	0.13
query41	0.08	0.03	0.03
query42	0.03	0.02	0.02
query43	0.03	0.04	0.03
Total cold run time: 106.64 s
Total hot run time: 32.19 s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.88% (10132/26062)
Line Coverage: 29.92% (85730/286507)
Region Coverage: 29.03% (43754/150727)
Branch Coverage: 25.56% (22330/87352)
Coverage Report: http://coverage.selectdb-in.cc/coverage/df41457c1f14371a549a4bf812d6b4fe284e11ec_df41457c1f14371a549a4bf812d6b4fe284e11ec/report/index.html

@jacktengg jacktengg force-pushed the 0107-opt-open-segments branch from df41457 to 9688288 on January 8, 2025 16:06
@jacktengg
Contributor Author

run buildall

// the segments in this rowset will be loaded by calling load_segments() explicitly.
auto segment_count = num_segments();
Contributor

Don't change it here; changing it here is problematic.
// do not use cache to load index
// because the index file may conflict
// and the cached fd may be invalid
RETURN_IF_ERROR(org_rowset->load(false));
Some backup/restore code paths call this function, and this change would then cause a lot of unnecessary I/O.
Let's add a lock to get_segment_num_rows, have it do the loading, and then return.
Also, this do_load function is probably no longer useful; please delete it while you're at it. Its previous implementations were all empty.
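A minimal sketch of the suggested shape, assuming the row counts are cached after the first call (RowsetRowCounts and its loader callback are hypothetical; the real get_segment_num_rows signature may differ): the method takes a lock, loads the counts on first use, and returns them, so load() itself no longer has to open every segment and backup/restore callers of load() pay no extra I/O.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical rowset wrapper: segment row counts are loaded lazily, under a
// mutex, on the first call instead of eagerly in load().
class RowsetRowCounts {
public:
    // Hypothetical loader: opens segment `seg_id` and returns its row count.
    using Loader = std::function<int64_t(std::size_t seg_id)>;

    RowsetRowCounts(std::size_t num_segments, Loader loader)
            : _num_segments(num_segments), _loader(std::move(loader)) {}

    std::vector<int64_t> get_segment_num_rows() {
        std::lock_guard<std::mutex> guard(_mutex);
        if (!_loaded) {
            _rows.reserve(_num_segments);
            for (std::size_t i = 0; i < _num_segments; ++i) {
                _rows.push_back(_loader(i));  // only the count is retained
            }
            _loaded = true;
        }
        return _rows;  // copied while the lock is held
    }

private:
    std::mutex _mutex;
    std::size_t _num_segments;
    Loader _loader;
    bool _loaded = false;
    std::vector<int64_t> _rows;
};
```

Whether to cache at all is a design choice; the earlier review comment about _segment_rows shows the reviewers were wary of cached state, so a variant that reloads on every call under the same lock is equally plausible.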


DCHECK(seg_id >= 0);
auto seg_path = DORIS_TRY(segment_path(seg_id));
io::FileReaderOptions reader_options {
Contributor

Here, wouldn't it be equivalent to call the segment loader's load segment method directly with the cache set to false?

RETURN_IF_ERROR(_read_context->rowid_conversion->init_segment_map(rowset()->rowset_id(),
segment_num_rows));
}
const bool use_lazy_init_iterators = !_is_merge_iterator();
Contributor

Why isn't lazy initialization used here?

@@ -86,12 +86,24 @@ std::string file_cache_key_str(const std::string& seg_path) {
return file_cache_key_from_path(seg_path).to_string();
}

Status Segment::get_num_rows(io::FileSystemSPtr fs, const std::string& path, int64_t tablet_id,
Contributor

This is pointless.

@doris-robot

TPC-H: Total hot run time: 9985 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 9688288b38c74a7ea0989ef65152646e4f1e9678, data reload: false

------ Round 1 ----------------------------------
query	cold	hot 1	hot 2	best hot (ms)
q1	2686	1356	1329	1329
q2	192	108	100	100
q3	1091	867	891	867
q4	1062	494	528	494
q5	123	127	104	104
q6	475	540	467	467
q7	140	174	132	132
q8	145	148	146	146
q9	100	102	98	98
q10	81	87	75	75
q11	90	71	67	67
q12	393	344	331	331
q13	876	740	751	740
q14	314	324	304	304
q15	880	845	827	827
q16	181	170	164	164
q17	752	192	195	192
q18	2841	1843	1805	1805
q19	287	219	220	219
q20	64	313	250	250
q21	431	884	1642	884
q22	1020	390	428	390
Total cold run time: 14224 ms
Total hot run time: 9985 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold	hot 1	hot 2	best hot (ms)
q1	1692	1368	1311	1311
q2	57	52	54	52
q3	562	548	533	533
q4	240	247	243	243
q5	147	146	142	142
q6	443	476	449	449
q7	77	82	83	82
q8	133	349	338	338
q9	730	574	572	572
q10	84	112	145	112
q11	100	58	55	55
q12	377	370	360	360
q13	778	762	743	743
q14	320	327	319	319
q15	749	759	722	722
q16	163	155	156	155
q17	255	249	253	249
q18	1253	1207	1285	1207
q19	202	209	198	198
q20	73	210	312	210
q21	411	239	178	178
q22	277	209	217	209
Total cold run time: 9123 ms
Total hot run time: 8439 ms

@jacktengg jacktengg force-pushed the 0107-opt-open-segments branch from 9688288 to ca2ea85 on January 9, 2025 03:11
@jacktengg jacktengg force-pushed the 0107-opt-open-segments branch from ca2ea85 to edb84dc on January 9, 2025 03:13
@jacktengg
Contributor Author

run buildall

Contributor

@yiguolei yiguolei left a comment

LGTM

Contributor

github-actions bot commented Jan 9, 2025

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added approved Indicates a PR has been approved by one committer. reviewed labels Jan 9, 2025
Contributor

github-actions bot commented Jan 9, 2025

PR approved by anyone and no changes requested.

@doris-robot

TPC-H: Total hot run time: 32583 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3c0e077f93497bc80e02703c57bbb99579d606d0, data reload: false

------ Round 1 ----------------------------------
query	cold	hot 1	hot 2	best hot (ms)
q1	17646	6218	6014	6014
q2	2047	310	170	170
q3	10410	1308	718	718
q4	10205	886	442	442
q5	7533	2200	1982	1982
q6	203	176	142	142
q7	905	766	611	611
q8	9236	1403	1228	1228
q9	5162	4835	4931	4835
q10	6716	2284	1832	1832
q11	479	293	253	253
q12	344	361	226	226
q13	17775	3695	3103	3103
q14	229	235	209	209
q15	564	517	490	490
q16	652	627	568	568
q17	577	860	327	327
q18	7144	6392	6457	6392
q19	2678	989	571	571
q20	300	305	189	189
q21	2778	2295	1964	1964
q22	358	329	317	317
Total cold run time: 103941 ms
Total hot run time: 32583 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold	hot 1	hot 2	best hot (ms)
q1	6335	6191	6243	6191
q2	236	328	227	227
q3	2255	2617	2325	2325
q4	1421	1834	1362	1362
q5	4309	4779	4851	4779
q6	180	172	139	139
q7	2056	2035	1827	1827
q8	2675	2737	2634	2634
q9	7216	7145	7214	7145
q10	3020	3321	2874	2874
q11	575	524	495	495
q12	667	781	577	577
q13	3515	3828	3314	3314
q14	279	301	283	283
q15	569	496	504	496
q16	648	704	658	658
q17	1219	1725	1236	1236
q18	7705	7390	7089	7089
q19	821	1165	1044	1044
q20	1909	2000	1837	1837
q21	5553	5028	4939	4939
q22	590	615	592	592
Total cold run time: 53753 ms
Total hot run time: 52063 ms

@doris-robot

TPC-DS: Total hot run time: 188004 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3c0e077f93497bc80e02703c57bbb99579d606d0, data reload: false

query	cold	hot 1	hot 2	best hot (ms)
query1	975	381	355	355
query2	6515	2358	2441	2358
query3	6708	209	207	207
query4	33325	24013	23300	23300
query5	4365	615	468	468
query6	282	198	184	184
query7	4629	504	306	306
query8	306	249	233	233
query9	9426	2688	2651	2651
query10	440	322	239	239
query11	18046	15294	15125	15125
query12	164	109	104	104
query13	1679	531	400	400
query14	10230	7266	7187	7187
query15	250	205	184	184
query16	8680	570	420	420
query17	1642	733	545	545
query18	2145	390	302	302
query19	215	169	148	148
query20	113	107	111	107
query21	209	123	101	101
query22	4141	4190	4074	4074
query23	34189	32617	33157	32617
query24	6299	2327	2303	2303
query25	497	448	378	378
query26	1167	266	161	161
query27	2022	455	329	329
query28	5088	2439	2403	2403
query29	749	538	448	448
query30	234	182	148	148
query31	961	881	796	796
query32	89	60	65	60
query33	513	342	282	282
query34	766	835	510	510
query35	802	804	708	708
query36	994	1025	930	930
query37	127	103	72	72
query38	4049	4100	3952	3952
query39	1454	1401	1427	1401
query40	205	112	98	98
query41	49	47	47	47
query42	120	102	102	102
query43	530	531	484	484
query44	1334	819	818	818
query45	181	169	155	155
query46	860	1016	642	642
query47	1810	1785	1787	1785
query48	392	403	306	306
query49	765	470	382	382
query50	620	670	383	383
query51	6922	6995	6949	6949
query52	103	104	95	95
query53	227	259	190	190
query54	474	491	405	405
query55	80	75	76	75
query56	266	264	244	244
query57	1146	1157	1075	1075
query58	233	222	257	222
query59	2962	3127	3076	3076
query60	264	262	245	245
query61	110	110	102	102
query62	824	764	718	718
query63	235	194	191	191
query64	4200	982	649	649
query65	3253	3186	3178	3178
query66	1048	432	307	307
query67	15814	15690	15472	15472
query68	8813	711	520	520
query69	459	295	245	245
query70	1196	1137	1114	1114
query71	422	305	266	266
query72	6142	3810	3847	3810
query73	684	762	369	369
query74	9974	8855	8682	8682
query75	4337	3145	2653	2653
query76	4089	1160	776	776
query77	874	369	271	271
query78	10221	9922	9362	9362
query79	3544	831	598	598
query80	703	517	444	444
query81	457	268	224	224
query82	648	153	122	122
query83	199	163	143	143
query84	329	93	77	77
query85	734	352	300	300
query86	349	317	294	294
query87	4619	4454	4274	4274
query88	4183	2230	2205	2205
query89	411	316	291	291
query90	1940	187	193	187
query91	138	134	106	106
query92	65	55	54	54
query93	1064	852	520	520
query94	655	397	292	292
query95	344	266	254	254
query96	485	596	281	281
query97	2903	2954	2827	2827
query98	226	205	202	202
query99	1635	1471	1350	1350
Total cold run time: 292354 ms
Total hot run time: 188004 ms

@doris-robot

ClickBench: Total hot run time: 31.55 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 3c0e077f93497bc80e02703c57bbb99579d606d0, data reload: false

query	cold	hot 1	hot 2 (s)
query1	0.03	0.03	0.04
query2	0.07	0.04	0.03
query3	0.24	0.06	0.08
query4	1.61	0.10	0.10
query5	0.42	0.39	0.41
query6	1.14	0.65	0.65
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.57	0.54	0.49
query10	0.56	0.57	0.55
query11	0.15	0.10	0.10
query12	0.14	0.11	0.11
query13	0.62	0.60	0.58
query14	2.72	2.75	2.84
query15	0.90	0.83	0.84
query16	0.38	0.39	0.40
query17	0.97	1.01	1.04
query18	0.22	0.21	0.20
query19	1.96	1.89	2.07
query20	0.01	0.01	0.01
query21	15.36	0.94	0.58
query22	0.74	0.81	0.73
query23	15.22	1.43	0.51
query24	2.91	1.16	1.33
query25	0.17	0.21	0.16
query26	0.24	0.14	0.14
query27	0.07	0.05	0.04
query28	13.57	1.54	1.05
query29	12.57	3.91	3.27
query30	0.27	0.09	0.07
query31	2.80	0.60	0.39
query32	3.22	0.54	0.46
query33	3.05	3.05	3.21
query34	16.70	5.08	4.45
query35	4.48	4.47	4.48
query36	0.73	0.49	0.49
query37	0.09	0.06	0.06
query38	0.05	0.03	0.03
query39	0.03	0.02	0.03
query40	0.17	0.14	0.13
query41	0.08	0.03	0.02
query42	0.04	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 105.37 s
Total hot run time: 31.55 s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.87% (10128/26056)
Line Coverage: 29.93% (85712/286337)
Region Coverage: 29.02% (43710/150640)
Branch Coverage: 25.55% (22306/87290)
Coverage Report: http://coverage.selectdb-in.cc/coverage/3c0e077f93497bc80e02703c57bbb99579d606d0_3c0e077f93497bc80e02703c57bbb99579d606d0/report/index.html

Member

@mrhhsg mrhhsg left a comment

LGTM

@yiguolei yiguolei merged commit 7d3d36e into apache:master Jan 9, 2025
23 of 26 checks passed
jacktengg added a commit that referenced this pull request Jan 13, 2025
Labels
approved Indicates a PR has been approved by one committer. not-merge/2.1 not-merge/3.0 reviewed
5 participants