[enhance](nereids) SqlParser support load data into temporary partition #45025

Merged 2 commits on Dec 10, 2024

@zxealous zxealous commented Dec 5, 2024

Change-Id: Id977545450b5d71da5ae932bd0b52a5dfdda8600

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:
Currently, SQL is parsed with DorisParser.g4, which does not support specifying a temporary partition in a LOAD statement.

MySQL [test]> LOAD LABEL db.label ( DATA INFILE("hdfs://hdfs:9000/user/partition/partition_type") INTO TABLE `tb` TEMPORARY PARTITION (partition_g)) WITH BROKER "ahdfs" ( "username" = "xxx",  "password" = "") PROPERTIES( "max_filter_ratio"="0.5" );
ERROR 1105 (HY000): errCode = 2, detailMessage =
mismatched input 'TEMPORARY' expecting {')', ','}(line 1, pos xxx)

This PR adds parser support for specifying a temporary partition in a LOAD statement. Because the load operation falls back to the old optimizer, parser support alone is enough for the data to be loaded into the temporary partition successfully.
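The effect of the grammar change can be illustrated with a toy sketch. The Python snippet below is not Doris code: it assumes a simplified partition clause of the form `TEMPORARY? PARTITION (name, ...)`, and the function name `parse_partition_clause` and both regular expressions are hypothetical stand-ins for the old and new grammar rules. It only shows why the old rule rejects the `TEMPORARY` keyword while the new one accepts it.

```python
import re

# Old rule, simplified: (PARTITION partition=identifierList)?
OLD_RULE = re.compile(r"^\s*PARTITION\s*\(([^)]*)\)\s*$", re.IGNORECASE)

# New rule, simplified assumption: an optional TEMPORARY keyword before PARTITION.
NEW_RULE = re.compile(
    r"^\s*(TEMPORARY\s+)?PARTITION\s*\(([^)]*)\)\s*$", re.IGNORECASE
)


def parse_partition_clause(clause, allow_temporary=True):
    """Return (is_temporary, partition_names) or raise ValueError.

    allow_temporary=False mimics the old rule, which has no TEMPORARY keyword.
    """
    if allow_temporary:
        match = NEW_RULE.match(clause)
        if match is None:
            raise ValueError("cannot parse partition clause: " + clause)
        is_temporary = match.group(1) is not None
        raw_names = match.group(2)
    else:
        match = OLD_RULE.match(clause)
        if match is None:
            raise ValueError("cannot parse partition clause: " + clause)
        is_temporary = False
        raw_names = match.group(1)

    # Split the identifier list and drop surrounding whitespace and backticks.
    names = [n.strip().strip("`") for n in raw_names.split(",") if n.strip()]
    return is_temporary, names
```

With `allow_temporary=False`, the clause `TEMPORARY PARTITION (partition_g)` raises `ValueError`, which loosely mirrors the `mismatched input 'TEMPORARY'` error above. With the default setting, the same clause is accepted and flagged as temporary.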

TODO:
1. Nereids needs to support loading data into a temporary partition.
2. Nereids needs to support exporting data from a temporary partition.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Change-Id: Id977545450b5d71da5ae932bd0b52a5dfdda8600
@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Change-Id: I37e080786b56f5a12c7eac3a84e9301de66c6812
zxealous force-pushed the support-load-tmp-partition branch from 9790fe8 to cdd856c on December 5, 2024 05:37
zxealous commented Dec 5, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 39809 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit cdd856c21687966561fbbf399766e0d3741be04d, data reload: false

------ Round 1 ----------------------------------
q1	17603	7450	7209	7209
q2	2069	177	166	166
q3	10868	1061	1167	1061
q4	10553	724	830	724
q5	7588	2704	2612	2612
q6	234	148	149	148
q7	989	639	592	592
q8	9233	1838	1934	1838
q9	6659	6555	6554	6554
q10	7024	2353	2319	2319
q11	470	259	256	256
q12	423	226	223	223
q13	17782	3018	3064	3018
q14	240	228	206	206
q15	572	532	508	508
q16	675	595	575	575
q17	983	593	548	548
q18	7165	6745	6578	6578
q19	1337	1032	983	983
q20	465	185	195	185
q21	4044	3351	3187	3187
q22	370	319	320	319
Total cold run time: 107346 ms
Total hot run time: 39809 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7219	7180	7206	7180
q2	317	234	224	224
q3	2865	2786	2939	2786
q4	2105	1832	1829	1829
q5	5657	5641	5621	5621
q6	221	147	143	143
q7	2224	1776	1835	1776
q8	3365	3534	3463	3463
q9	8949	9056	9043	9043
q10	3614	3582	3533	3533
q11	605	500	527	500
q12	828	647	649	647
q13	13309	3268	3190	3190
q14	307	278	278	278
q15	567	528	533	528
q16	703	646	651	646
q17	2015	1622	1617	1617
q18	8316	7751	7788	7751
q19	1645	1509	1567	1509
q20	2080	1854	1882	1854
q21	5714	5495	5457	5457
q22	619	559	598	559
Total cold run time: 73244 ms
Total hot run time: 60134 ms

@doris-robot

TPC-DS: Total hot run time: 197805 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit cdd856c21687966561fbbf399766e0d3741be04d, data reload: false

query1	1505	963	944	944
query2	6247	2073	2079	2073
query3	11021	4497	4329	4329
query4	67437	28668	23605	23605
query5	4908	474	473	473
query6	412	193	192	192
query7	5459	299	289	289
query8	313	231	231	231
query9	8517	2727	2709	2709
query10	424	252	258	252
query11	17037	15264	15965	15264
query12	158	104	102	102
query13	1419	445	442	442
query14	10473	7410	6899	6899
query15	227	196	189	189
query16	7026	474	541	474
query17	1079	585	590	585
query18	1109	313	317	313
query19	214	164	166	164
query20	123	111	116	111
query21	218	105	107	105
query22	4497	4543	4357	4357
query23	34844	34659	34473	34473
query24	5491	2578	2546	2546
query25	500	410	416	410
query26	660	156	157	156
query27	1915	308	294	294
query28	4192	2467	2498	2467
query29	669	434	412	412
query30	211	150	163	150
query31	1009	870	875	870
query32	66	58	60	58
query33	452	294	316	294
query34	931	507	561	507
query35	926	758	784	758
query36	1115	972	984	972
query37	128	72	78	72
query38	4552	4487	4267	4267
query39	1517	1492	1469	1469
query40	198	105	103	103
query41	47	41	43	41
query42	114	110	114	110
query43	531	501	501	501
query44	1176	833	824	824
query45	189	169	167	167
query46	1179	726	732	726
query47	1995	1960	1916	1916
query48	417	310	313	310
query49	723	403	387	387
query50	855	398	403	398
query51	7384	7245	7201	7201
query52	102	85	94	85
query53	269	181	182	181
query54	525	408	417	408
query55	79	73	76	73
query56	276	243	238	238
query57	1229	1124	1126	1124
query58	215	240	231	231
query59	3205	3061	3139	3061
query60	273	250	252	250
query61	108	108	109	108
query62	794	690	680	680
query63	205	191	183	183
query64	1424	670	644	644
query65	3280	3233	3247	3233
query66	646	297	303	297
query67	15862	15686	15741	15686
query68	3856	594	591	591
query69	424	263	252	252
query70	1233	1163	1096	1096
query71	370	254	248	248
query72	6382	4157	4061	4061
query73	767	360	372	360
query74	10206	9050	9022	9022
query75	3382	2648	2646	2646
query76	1995	1096	1143	1096
query77	512	280	284	280
query78	10359	9946	9575	9575
query79	1135	627	673	627
query80	822	448	439	439
query81	483	238	236	236
query82	923	122	119	119
query83	166	142	146	142
query84	275	69	69	69
query85	868	312	300	300
query86	342	280	302	280
query87	4696	4574	4492	4492
query88	3396	2224	2235	2224
query89	411	294	291	291
query90	2013	188	188	188
query91	135	103	100	100
query92	62	52	52	52
query93	1397	547	544	544
query94	781	305	306	305
query95	353	254	249	249
query96	614	275	281	275
query97	2834	2688	2714	2688
query98	221	240	191	191
query99	1835	1324	1337	1324
Total cold run time: 315549 ms
Total hot run time: 197805 ms

@doris-robot

ClickBench: Total hot run time: 33.03 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit cdd856c21687966561fbbf399766e0d3741be04d, data reload: false

query1	0.03	0.03	0.03
query2	0.06	0.04	0.03
query3	0.23	0.08	0.07
query4	1.61	0.11	0.10
query5	0.44	0.40	0.42
query6	1.17	0.67	0.66
query7	0.02	0.02	0.01
query8	0.04	0.03	0.03
query9	0.56	0.51	0.49
query10	0.55	0.56	0.56
query11	0.13	0.10	0.11
query12	0.14	0.11	0.12
query13	0.60	0.60	0.60
query14	2.76	2.85	2.69
query15	0.91	0.82	0.82
query16	0.38	0.39	0.39
query17	1.03	1.02	1.01
query18	0.22	0.21	0.22
query19	1.99	1.87	2.00
query20	0.01	0.01	0.01
query21	15.37	0.58	0.57
query22	2.49	2.19	1.58
query23	17.33	0.83	0.78
query24	3.24	1.52	1.67
query25	0.26	0.17	0.14
query26	0.52	0.14	0.14
query27	0.05	0.04	0.04
query28	9.92	1.09	1.08
query29	12.54	3.29	3.27
query30	0.24	0.06	0.07
query31	2.85	0.38	0.38
query32	3.27	0.46	0.46
query33	3.00	3.01	3.02
query34	16.58	4.46	4.50
query35	4.62	4.57	4.58
query36	0.65	0.48	0.51
query37	0.09	0.07	0.06
query38	0.05	0.04	0.03
query39	0.03	0.02	0.03
query40	0.16	0.12	0.13
query41	0.08	0.02	0.02
query42	0.04	0.02	0.03
query43	0.03	0.03	0.03
Total cold run time: 106.29 s
Total hot run time: 33.03 s

zxealous commented Dec 5, 2024

run p0

@@ -927,7 +927,7 @@ identityOrFunction
 dataDesc
     : ((WITH)? mergeType)? DATA INFILE LEFT_PAREN filePaths+=STRING_LITERAL (COMMA filePath+=STRING_LITERAL)* RIGHT_PAREN
         INTO TABLE targetTableName=identifier
-        (PARTITION partition=identifierList)?
+        (partitionSpec)?
Contributor

Should the from-data rule at L945 also be changed?

Contributor Author

done

Contributor Author

will change it later

@@ -927,7 +927,7 @@ identityOrFunction
 dataDesc
     : ((WITH)? mergeType)? DATA INFILE LEFT_PAREN filePaths+=STRING_LITERAL (COMMA filePath+=STRING_LITERAL)* RIGHT_PAREN
         INTO TABLE targetTableName=identifier
-        (PARTITION partition=identifierList)?
+        (partitionSpec)?
Contributor

There are two occurrences of (PARTITION partition=identifierList)?; please replace both with (partitionSpec)? and continue the work.

Contributor Author

done

zxealous commented Dec 5, 2024

run buildall

zxealous commented Dec 5, 2024

run buildall

zxealous force-pushed the support-load-tmp-partition branch from 5c694a6 to cdd856c on December 6, 2024 08:20
github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) on Dec 6, 2024
github-actions bot commented Dec 6, 2024

PR approved by at least one committer and no changes requested.

github-actions bot commented Dec 6, 2024

PR approved by anyone and no changes requested.

zxealous changed the title from "[enhance](nereids) Support load data into temporary partition" to "[enhance](nereids) SqlParser support load data into temporary partition" on Dec 7, 2024
@morrySnow morrySnow merged commit 8a14454 into apache:master Dec 10, 2024
38 of 50 checks passed
github-actions bot pushed a commit that referenced this pull request Dec 10, 2024
…45025)

Problem Summary:
Currently, SQL is parsed with DorisParser.g4, which does not support
specifying a temporary partition in a LOAD statement.

LOAD LABEL db.label ( DATA INFILE("hdfs://hdfs:9000/user/partition/partition_type") INTO TABLE `tb` TEMPORARY PARTITION (partition_g)) WITH BROKER "ahdfs" ( "username" = "xxx",  "password" = "") PROPERTIES( "max_filter_ratio"="0.5" )

ERROR 1105 (HY000): errCode = 2, detailMessage =
mismatched input 'TEMPORARY' expecting {')', ','}(line 1, pos xxx)

This PR adds parser support for specifying a temporary partition in a
LOAD statement. Because the load operation falls back to the old
optimizer, parser support alone is enough for the data to be loaded
into the temporary partition successfully.

TODO:
1. Nereids needs to support loading data into a temporary partition.
2. Nereids needs to support exporting data from a temporary partition.
github-actions bot pushed a commit that referenced this pull request Dec 10, 2024
yiguolei pushed a commit that referenced this pull request Dec 11, 2024
dataroaring pushed a commit that referenced this pull request Dec 19, 2024
Labels
approved (Indicates a PR has been approved by one committer.), dev/2.1.8-merged, dev/3.0.4-merged, reviewed
7 participants