[feat](mtmv)Unified external table interface supporting partition refresh and partition pruning #44673

Merged: morrySnow merged 2 commits into apache:master on Nov 28, 2024

Conversation

zddr
Contributor

@zddr zddr commented Nov 27, 2024

What problem does this PR solve?

  • Add MvccTable to represent a table that supports querying a specified version of its data.
  • Add the MvccSnapshot interface to hold the MVCC snapshot information of a table at a given moment.
  • Add an MvccSnapshot parameter to the methods of the MTMVRelatedTableIf interface so that they return data for the specified version.
  • Add the MvccSnapshot parameter to the partition-pruning methods as well, so that they return partition information for the specified version.
  • Load the snapshot of each MvccTable at the beginning of query planning and store it in StatementContext.
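The list above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: only the names MvccTable, MvccSnapshot and StatementContext come from the description, while the method signatures and every `Demo*` class are assumptions made up for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Marker for table state frozen at one moment (assumed here to have no methods). */
interface MvccSnapshot {}

/** A table that can be read at a specified version (method name is an assumption). */
interface MvccTable {
    MvccSnapshot loadSnapshot();
}

/** Hypothetical snapshot: a version id plus the partitions visible at that version. */
final class DemoSnapshot implements MvccSnapshot {
    final long version;
    final List<String> partitions;

    DemoSnapshot(long version, List<String> partitions) {
        this.version = version;
        this.partitions = List.copyOf(partitions);
    }
}

/** Hypothetical external table whose partition list changes between versions. */
final class DemoExternalTable implements MvccTable {
    private long version = 1;
    private List<String> partitions = List.of("p20241126");

    /** Simulates an external commit that replaces the partition list. */
    void commit(List<String> newPartitions) {
        version++;
        partitions = List.copyOf(newPartitions);
    }

    @Override
    public MvccSnapshot loadSnapshot() {
        return new DemoSnapshot(version, partitions);
    }

    /** Snapshot-aware lookup: a pinned snapshot wins over the live state. */
    List<String> getPartitionNames(Optional<MvccSnapshot> snapshot) {
        if (snapshot.isPresent() && snapshot.get() instanceof DemoSnapshot) {
            return ((DemoSnapshot) snapshot.get()).partitions;
        }
        return partitions;
    }
}

/** Hypothetical stand-in for StatementContext: one snapshot per table per statement. */
final class DemoStatementContext {
    private final Map<String, MvccSnapshot> snapshots = new HashMap<>();

    /** Called once at the start of planning; later calls keep the first snapshot. */
    void loadSnapshots(Map<String, ? extends MvccTable> tables) {
        for (Map.Entry<String, ? extends MvccTable> e : tables.entrySet()) {
            snapshots.computeIfAbsent(e.getKey(), k -> e.getValue().loadSnapshot());
        }
    }

    Optional<MvccSnapshot> getSnapshot(String tableName) {
        return Optional.ofNullable(snapshots.get(tableName));
    }
}

final class MvccSketchDemo {
    public static void main(String[] args) {
        DemoExternalTable table = new DemoExternalTable();
        DemoStatementContext ctx = new DemoStatementContext();
        ctx.loadSnapshots(Map.of("t", table));                 // start of planning
        table.commit(List.of("p20241126", "p20241127"));       // external change mid-query
        System.out.println(table.getPartitionNames(ctx.getSnapshot("t")));
        System.out.println(table.getPartitionNames(Optional.empty()));
    }
}
```

In this sketch the snapshot is taken once per statement, so a commit that lands on the external table while the query is being planned does not change the partition list the planner sees.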

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:
Unified external table interface supporting partition refresh and partition pruning

Release note

Unified external table interface supporting partition refresh and partition pruning

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@zddr
Contributor Author

zddr commented Nov 27, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 40130 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f49ea8361eb7a2c02c9a153f093aa1a6243bd0dd, data reload: false

------ Round 1 ----------------------------------
q1	17619	7439	7328	7328
q2	2053	179	169	169
q3	10595	1103	1207	1103
q4	10558	781	759	759
q5	7630	2725	2736	2725
q6	238	147	147	147
q7	984	640	606	606
q8	9245	1914	1978	1914
q9	6511	6350	6421	6350
q10	6957	2283	2319	2283
q11	460	262	262	262
q12	427	218	220	218
q13	17806	3038	3055	3038
q14	250	210	232	210
q15	560	526	523	523
q16	659	577	586	577
q17	969	644	532	532
q18	7271	6661	6787	6661
q19	1334	941	1044	941
q20	485	181	184	181
q21	3964	3321	3291	3291
q22	384	312	321	312
Total cold run time: 106959 ms
Total hot run time: 40130 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7260	7278	7275	7275
q2	328	230	225	225
q3	2854	2858	2976	2858
q4	2091	1863	1859	1859
q5	5704	5681	5687	5681
q6	227	145	144	144
q7	2245	1862	1821	1821
q8	3441	3602	3529	3529
q9	8901	8921	8855	8855
q10	3601	3595	3534	3534
q11	602	525	514	514
q12	811	627	612	612
q13	14155	3187	3241	3187
q14	304	275	269	269
q15	579	526	522	522
q16	712	668	644	644
q17	1873	1676	1651	1651
q18	8228	7771	7511	7511
q19	1691	1420	1581	1420
q20	2117	1851	1907	1851
q21	5498	5267	5363	5267
q22	633	583	574	574
Total cold run time: 73855 ms
Total hot run time: 59803 ms

@zddr
Contributor Author

zddr commented Nov 27, 2024

run cloud_p0

@zddr
Contributor Author

zddr commented Nov 27, 2024

run external

@doris-robot

TPC-DS: Total hot run time: 198108 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f49ea8361eb7a2c02c9a153f093aa1a6243bd0dd, data reload: false

query1	1267	931	954	931
query2	6226	2103	2046	2046
query3	10928	3953	4068	3953
query4	67456	29202	23745	23745
query5	5010	473	476	473
query6	430	200	187	187
query7	5626	305	297	297
query8	309	215	212	212
query9	9150	2748	2747	2747
query10	443	255	253	253
query11	17349	15282	15875	15282
query12	156	104	101	101
query13	1532	397	449	397
query14	9916	7915	7827	7827
query15	214	186	197	186
query16	7249	441	473	441
query17	1259	573	592	573
query18	1884	302	294	294
query19	200	157	192	157
query20	119	108	107	107
query21	210	102	99	99
query22	4710	4563	4501	4501
query23	34975	34564	34473	34473
query24	5406	2450	2507	2450
query25	488	406	427	406
query26	646	157	158	157
query27	1817	290	302	290
query28	4497	2512	2508	2508
query29	689	429	398	398
query30	225	149	150	149
query31	1020	795	876	795
query32	69	56	59	56
query33	417	297	287	287
query34	973	510	544	510
query35	881	759	763	759
query36	1107	955	976	955
query37	134	79	71	71
query38	4547	4553	4317	4317
query39	1509	1499	1464	1464
query40	214	103	103	103
query41	51	48	48	48
query42	113	105	101	101
query43	533	508	501	501
query44	1193	849	853	849
query45	195	174	168	168
query46	1157	733	702	702
query47	2053	1925	1953	1925
query48	441	332	366	332
query49	743	407	417	407
query50	851	405	414	405
query51	7410	7231	7199	7199
query52	91	89	89	89
query53	250	180	179	179
query54	513	388	390	388
query55	79	74	79	74
query56	249	257	233	233
query57	1322	1179	1176	1176
query58	221	211	213	211
query59	3194	3068	3024	3024
query60	275	254	247	247
query61	111	111	107	107
query62	803	659	672	659
query63	211	185	190	185
query64	1380	665	633	633
query65	3315	3196	3233	3196
query66	756	288	308	288
query67	16110	15913	15676	15676
query68	3859	564	570	564
query69	420	249	260	249
query70	1214	1095	1159	1095
query71	350	249	256	249
query72	6415	4048	3971	3971
query73	779	419	375	375
query74	10243	9109	9051	9051
query75	3412	2742	2640	2640
query76	1922	1136	1090	1090
query77	474	271	271	271
query78	10489	9441	9417	9417
query79	1535	596	602	596
query80	849	428	432	428
query81	515	241	233	233
query82	1337	114	117	114
query83	268	145	154	145
query84	288	73	71	71
query85	912	328	300	300
query86	343	306	301	301
query87	4756	4643	4628	4628
query88	3546	2228	2202	2202
query89	415	306	290	290
query90	2001	184	218	184
query91	129	104	99	99
query92	68	51	49	49
query93	1933	535	539	535
query94	778	302	297	297
query95	352	256	248	248
query96	613	287	277	277
query97	2852	2666	2647	2647
query98	216	193	192	192
query99	1606	1341	1347	1341
Total cold run time: 319766 ms
Total hot run time: 198108 ms

@doris-robot

ClickBench: Total hot run time: 31.63 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f49ea8361eb7a2c02c9a153f093aa1a6243bd0dd, data reload: false

query1	0.03	0.03	0.03
query2	0.06	0.04	0.04
query3	0.23	0.07	0.07
query4	1.63	0.10	0.10
query5	0.45	0.41	0.41
query6	1.13	0.66	0.66
query7	0.02	0.02	0.02
query8	0.04	0.04	0.03
query9	0.56	0.52	0.49
query10	0.54	0.54	0.56
query11	0.14	0.11	0.10
query12	0.14	0.11	0.12
query13	0.62	0.60	0.60
query14	2.76	2.84	2.76
query15	0.90	0.82	0.82
query16	0.39	0.38	0.38
query17	1.06	1.03	1.05
query18	0.21	0.20	0.20
query19	1.99	1.85	2.05
query20	0.01	0.01	0.01
query21	15.36	0.60	0.60
query22	2.80	1.67	1.77
query23	16.93	0.94	0.77
query24	3.18	1.11	0.14
query25	0.14	0.06	0.04
query26	0.48	0.14	0.14
query27	0.05	0.04	0.05
query28	11.59	1.08	1.07
query29	12.56	3.26	3.28
query30	0.25	0.06	0.06
query31	2.86	0.39	0.39
query32	3.27	0.46	0.46
query33	3.01	3.02	3.02
query34	17.14	4.50	4.46
query35	4.45	4.48	4.47
query36	0.67	0.49	0.48
query37	0.09	0.06	0.07
query38	0.05	0.04	0.04
query39	0.03	0.02	0.02
query40	0.17	0.13	0.12
query41	0.08	0.03	0.02
query42	0.03	0.02	0.02
query43	0.04	0.03	0.04
Total cold run time: 108.14 s
Total hot run time: 31.63 s

@zddr
Contributor Author

zddr commented Nov 27, 2024

run cloud_p0

@morrySnow
Contributor

run cloud_p0

morrySnow previously approved these changes Nov 27, 2024
github-actions bot added the approved label Nov 27, 2024
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

github-actions bot removed the approved label Nov 27, 2024
@zddr
Contributor Author

zddr commented Nov 27, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 39631 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 274ea59c44cdac4398aec3dd624041832f5a6346, data reload: false

------ Round 1 ----------------------------------
q1	17588	7458	7307	7307
q2	2045	178	162	162
q3	10554	1141	1194	1141
q4	10229	725	751	725
q5	7605	2657	2670	2657
q6	240	153	154	153
q7	988	617	600	600
q8	9227	1828	1912	1828
q9	6524	6359	6383	6359
q10	6933	2276	2274	2274
q11	461	259	263	259
q12	401	214	211	211
q13	17765	3011	3008	3008
q14	241	221	210	210
q15	567	526	521	521
q16	623	566	582	566
q17	969	671	616	616
q18	7331	6498	6664	6498
q19	1351	989	939	939
q20	481	181	185	181
q21	3920	3105	3280	3105
q22	372	311	320	311
Total cold run time: 106415 ms
Total hot run time: 39631 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7305	7269	7254	7254
q2	319	228	228	228
q3	2894	2746	2886	2746
q4	2117	1838	1766	1766
q5	5537	5686	5616	5616
q6	223	138	139	138
q7	2221	1797	1786	1786
q8	3355	3506	3478	3478
q9	8821	8829	8790	8790
q10	3613	3536	3589	3536
q11	591	511	512	511
q12	817	608	621	608
q13	13818	3199	3212	3199
q14	297	267	262	262
q15	568	535	528	528
q16	667	634	616	616
q17	1820	1649	1592	1592
q18	7796	7376	7390	7376
q19	1655	1544	1551	1544
q20	2054	1830	1795	1795
q21	5315	5171	5293	5171
q22	647	552	591	552
Total cold run time: 72450 ms
Total hot run time: 59092 ms

@doris-robot

TPC-DS: Total hot run time: 191171 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 274ea59c44cdac4398aec3dd624041832f5a6346, data reload: false

query1	976	395	378	378
query2	6513	2154	2057	2057
query3	6714	216	210	210
query4	33997	23588	23650	23588
query5	4295	455	448	448
query6	293	191	193	191
query7	4625	291	300	291
query8	289	228	231	228
query9	9434	2716	2707	2707
query10	474	251	268	251
query11	18033	15203	15116	15116
query12	152	102	102	102
query13	1650	417	450	417
query14	9657	7648	7540	7540
query15	295	189	180	180
query16	8064	448	471	448
query17	1846	583	564	564
query18	2107	294	304	294
query19	370	191	142	142
query20	116	111	108	108
query21	210	100	107	100
query22	4616	4135	4188	4135
query23	35134	34040	34044	34040
query24	11424	2447	2468	2447
query25	669	374	373	373
query26	1826	147	146	146
query27	2783	274	278	274
query28	7943	2424	2425	2424
query29	1019	406	404	404
query30	295	149	148	148
query31	1040	801	821	801
query32	92	56	83	56
query33	786	280	278	278
query34	1041	507	526	507
query35	889	728	720	720
query36	1128	951	929	929
query37	265	73	75	73
query38	4398	4176	4166	4166
query39	1482	1424	1435	1424
query40	280	98	97	97
query41	48	43	42	42
query42	108	101	99	99
query43	542	494	501	494
query44	1261	797	812	797
query45	189	167	165	165
query46	1124	684	682	682
query47	1947	1855	1861	1855
query48	408	310	328	310
query49	1289	373	389	373
query50	800	389	379	379
query51	7367	7068	7151	7068
query52	98	88	88	88
query53	254	175	179	175
query54	1223	391	410	391
query55	88	79	78	78
query56	254	241	249	241
query57	1292	1116	1142	1116
query58	236	207	221	207
query59	3299	2988	2966	2966
query60	278	250	262	250
query61	141	113	109	109
query62	870	669	684	669
query63	206	192	189	189
query64	5130	658	626	626
query65	3306	3184	3214	3184
query66	1454	319	358	319
query67	16118	15632	15630	15630
query68	4510	565	553	553
query69	404	250	261	250
query70	1225	1128	1098	1098
query71	335	243	252	243
query72	6379	4062	4172	4062
query73	770	354	358	354
query74	9482	8998	9124	8998
query75	3474	2654	2643	2643
query76	2676	1064	1111	1064
query77	499	269	283	269
query78	10280	9415	9315	9315
query79	2331	593	605	593
query80	1211	426	431	426
query81	549	229	224	224
query82	950	117	118	117
query83	245	153	175	153
query84	243	71	70	70
query85	1482	299	295	295
query86	442	298	302	298
query87	4770	4546	4594	4546
query88	3881	2205	2174	2174
query89	409	290	298	290
query90	2108	186	182	182
query91	131	107	104	104
query92	65	53	51	51
query93	1756	542	532	532
query94	1129	289	290	289
query95	357	248	251	248
query96	628	276	275	275
query97	2877	2655	2694	2655
query98	212	199	190	190
query99	1679	1339	1306	1306
Total cold run time: 304039 ms
Total hot run time: 191171 ms

@doris-robot

ClickBench: Total hot run time: 32.47 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 274ea59c44cdac4398aec3dd624041832f5a6346, data reload: false

query1	0.04	0.03	0.02
query2	0.07	0.03	0.03
query3	0.23	0.08	0.07
query4	1.59	0.10	0.10
query5	0.42	0.42	0.42
query6	1.16	0.65	0.65
query7	0.02	0.01	0.02
query8	0.04	0.02	0.03
query9	0.59	0.52	0.51
query10	0.56	0.56	0.58
query11	0.15	0.10	0.10
query12	0.14	0.11	0.11
query13	0.62	0.61	0.61
query14	2.72	2.75	2.87
query15	0.91	0.84	0.82
query16	0.38	0.37	0.38
query17	1.07	1.05	1.00
query18	0.22	0.20	0.20
query19	1.85	1.88	1.97
query20	0.01	0.01	0.01
query21	15.35	0.58	0.58
query22	2.76	1.94	1.97
query23	17.07	0.93	0.75
query24	2.94	0.54	1.87
query25	0.21	0.24	0.16
query26	0.41	0.14	0.15
query27	0.04	0.04	0.05
query28	10.34	1.11	1.07
query29	12.55	3.31	3.29
query30	0.25	0.07	0.06
query31	2.86	0.39	0.38
query32	3.27	0.47	0.47
query33	3.01	2.98	3.04
query34	17.12	4.52	4.52
query35	4.54	4.53	4.54
query36	0.68	0.48	0.50
query37	0.10	0.06	0.07
query38	0.04	0.04	0.03
query39	0.03	0.03	0.02
query40	0.16	0.12	0.12
query41	0.07	0.02	0.02
query42	0.04	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 106.66 s
Total hot run time: 32.47 s

Contributor

@Jibing-Li Jibing-Li left a comment

LGTM

github-actions bot added the approved label Nov 27, 2024
Contributor

PR approved by at least one committer and no changes requested.

@morrySnow morrySnow merged commit 913cda6 into apache:master Nov 28, 2024
25 of 26 checks passed
zddr added a commit to zddr/incubator-doris that referenced this pull request Nov 29, 2024
…resh and partition pruning (apache#44673)

- Add `MvccTable` to represent a table that supports querying specified
version data
- Add the `MvccSnapshot` interface to store snapshot information of mvcc
at a certain moment in time
- Add the `MvccSnapshot` parameter to the method of the
`MTMVRelatedTableIf` interface to retrieve data of a specified version
- Partition pruning related methods combined with the `MvccSnapshot`
parameter are used to obtain partition information for a specified
version
- Load the snapshot information of mvccTable at the beginning of the
query plan and store it in StatementContext

Unified external table interface supporting partition refresh and
partition pruning
zddr added a commit to zddr/incubator-doris that referenced this pull request Dec 2, 2024
…resh and partition pruning (apache#44673)
morrySnow pushed a commit that referenced this pull request Dec 11, 2024
The previous PR obtained a snapshot of the table at the beginning of the
query and stored it in the StatementContext.
This PR ensures that the same metadata is used throughout the query:
callers of the relevant interfaces now fetch the snapshot from the
StatementContext and pass it to the method as a parameter.

Related PR: #44911 #44673
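
Why the snapshot has to be passed explicitly can be shown with a small, self-contained sketch. All class and method names below are hypothetical stand-ins, not Doris classes; the only idea taken from the commit message is that every step of a query receives the same pinned snapshot as a parameter instead of re-reading the latest table state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical immutable view of a table's partitions at one version. */
final class TableVersion {
    final List<String> partitions;

    TableVersion(List<String> partitions) {
        this.partitions = List.copyOf(partitions);
    }
}

/** Hypothetical table whose metadata may change while a query is running. */
final class LiveTable {
    private volatile TableVersion current = new TableVersion(List.of("p1", "p2"));

    void addPartition(String name) {
        List<String> next = new ArrayList<>(current.partitions);
        next.add(name);
        current = new TableVersion(next);
    }

    TableVersion snapshot() {
        return current;
    }

    /** Callers pass the snapshot held for the statement; empty means "read latest". */
    List<String> listPartitions(Optional<TableVersion> pinned) {
        return pinned.orElse(current).partitions;
    }
}

final class QuerySteps {
    /** Runs two steps of one query and reports whether both saw the same partitions. */
    static boolean stepsAgree(LiveTable table, Optional<TableVersion> pinned,
                              Runnable concurrentChange) {
        List<String> seenByFirstStep = table.listPartitions(pinned);
        concurrentChange.run(); // simulates a commit landing between the two steps
        List<String> seenBySecondStep = table.listPartitions(pinned);
        return seenByFirstStep.equals(seenBySecondStep);
    }
}
```

With a pinned snapshot, `stepsAgree` returns true even if a partition is added between the two steps; with `Optional.empty()` the second step reads the new state and the two steps disagree.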
Labels: approved, dev/2.1.8-merged, dev/3.0.4-merged, reviewed

6 participants