[fix](inverted index) Content Check for Tokenize Function Parser #44465

Merged
merged 1 commit into from
Nov 25, 2024

Conversation

Contributor

@zzzxl1993 zzzxl1993 commented Nov 22, 2024

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

  1. Check the content of the parser setting passed to the tokenize function, so that users are not misled into assuming that other tokenizers exist.
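
For illustration only, a content check of this kind could look roughly like the sketch below. It is not the code from this PR: the function name `is_known_parser` and the list of parser names are assumptions made for the example, and the real set of supported parsers is defined in the Doris source.

```cpp
#include <algorithm>
#include <array>
#include <string_view>

// Hypothetical whitelist of parser names, assumed for this example.
// The authoritative list lives in the Doris source and may differ.
constexpr std::array<std::string_view, 4> kKnownParsers = {
        "none", "english", "chinese", "unicode"};

// Returns true only when `parser` exactly matches one of the known names.
// The comparison is case-sensitive; any other value is treated as invalid
// instead of being silently accepted.
bool is_known_parser(std::string_view parser) {
    return std::find(kKnownParsers.begin(), kKnownParsers.end(), parser) !=
           kKnownParsers.end();
}
```

With a check like this, a caller can reject an unrecognized parser value with an explicit error, rather than letting the query run as if a tokenizer with that name existed.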

Release note

None


@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@zzzxl1993
Contributor Author

run buildall

Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.31% (9980/26048)
Line Coverage: 29.43% (83520/283811)
Region Coverage: 28.58% (42978/150357)
Branch Coverage: 25.18% (21838/86744)
Coverage Report: http://coverage.selectdb-in.cc/coverage/6587818b6322347b523bb5b03b3d92798fb0b37c_6587818b6322347b523bb5b03b3d92798fb0b37c/report/index.html

@doris-robot

TPC-H: Total hot run time: 39839 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 6587818b6322347b523bb5b03b3d92798fb0b37c, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	17592	7402	7284	7284
q2	2047	178	173	173
q3	10676	1087	1172	1087
q4	10544	719	795	719
q5	7595	2709	2684	2684
q6	245	152	149	149
q7	977	613	599	599
q8	9250	1790	1933	1790
q9	6573	6451	6363	6363
q10	6992	2317	2322	2317
q11	459	268	257	257
q12	420	219	214	214
q13	17793	3050	3048	3048
q14	248	222	217	217
q15	575	549	511	511
q16	687	578	609	578
q17	981	495	570	495
q18	7262	6805	6686	6686
q19	1331	1049	969	969
q20	452	184	176	176
q21	4021	3213	3244	3213
q22	385	323	310	310
Total cold run time: 107105 ms
Total hot run time: 39839 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	7231	7211	7215	7211
q2	324	229	231	229
q3	2866	2827	2901	2827
q4	2065	1804	1812	1804
q5	5738	5684	5630	5630
q6	235	143	144	143
q7	2271	1843	1823	1823
q8	3439	3556	3498	3498
q9	8887	8931	8932	8931
q10	3601	3538	3519	3519
q11	598	496	520	496
q12	828	625	612	612
q13	10428	3263	3219	3219
q14	322	285	292	285
q15	568	517	527	517
q16	692	639	655	639
q17	1843	1636	1607	1607
q18	8330	7801	7493	7493
q19	1709	1521	1464	1464
q20	2100	1936	1877	1877
q21	5608	5571	5373	5373
q22	660	557	549	549
Total cold run time: 70343 ms
Total hot run time: 59746 ms

@doris-robot

TPC-DS: Total hot run time: 197563 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 6587818b6322347b523bb5b03b3d92798fb0b37c, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
query1	1231	941	926	926
query2	6249	2213	2066	2066
query3	10937	4100	4098	4098
query4	67788	28993	23796	23796
query5	4933	465	438	438
query6	424	202	177	177
query7	5596	311	287	287
query8	307	222	212	212
query9	9373	2674	2652	2652
query10	472	246	246	246
query11	17435	15481	16004	15481
query12	160	107	105	105
query13	1520	440	444	440
query14	10592	7450	6859	6859
query15	212	183	204	183
query16	7264	465	460	460
query17	1196	598	607	598
query18	1873	319	317	317
query19	229	169	153	153
query20	120	114	117	114
query21	217	110	108	108
query22	4815	4547	4474	4474
query23	34964	34618	34340	34340
query24	5501	2506	2587	2506
query25	506	399	384	384
query26	637	147	150	147
query27	1833	289	290	289
query28	4394	2513	2480	2480
query29	671	427	419	419
query30	221	145	150	145
query31	1009	838	842	838
query32	86	56	56	56
query33	413	296	290	290
query34	944	512	519	512
query35	852	733	732	732
query36	1104	959	987	959
query37	122	74	75	74
query38	4555	4386	4488	4386
query39	1512	1486	1458	1458
query40	208	98	102	98
query41	44	41	43	41
query42	107	99	95	95
query43	557	515	521	515
query44	1208	856	856	856
query45	187	176	183	176
query46	1152	693	706	693
query47	2010	1961	1941	1941
query48	435	326	347	326
query49	750	397	436	397
query50	851	397	398	397
query51	7514	7354	7043	7043
query52	98	93	90	90
query53	250	178	176	176
query54	520	388	407	388
query55	75	73	78	73
query56	240	230	246	230
query57	1286	1152	1132	1132
query58	213	228	217	217
query59	3280	3063	3189	3063
query60	259	253	247	247
query61	117	104	109	104
query62	781	663	674	663
query63	208	188	189	188
query64	1345	658	682	658
query65	3320	3193	3219	3193
query66	696	297	302	297
query67	16143	15924	15823	15823
query68	4053	569	559	559
query69	416	250	261	250
query70	1134	1132	1136	1132
query71	347	241	260	241
query72	6397	4166	3975	3975
query73	767	359	364	359
query74	10226	9091	9086	9086
query75	3398	2688	2702	2688
query76	1883	1197	1172	1172
query77	496	268	278	268
query78	10579	9461	9419	9419
query79	1491	616	607	607
query80	877	430	428	428
query81	518	230	227	227
query82	1366	123	118	118
query83	173	147	150	147
query84	291	73	75	73
query85	873	305	299	299
query86	350	302	289	289
query87	4895	4597	4525	4525
query88	3805	2289	2221	2221
query89	423	293	291	291
query90	1994	186	198	186
query91	137	99	103	99
query92	66	54	50	50
query93	1914	543	548	543
query94	796	294	289	289
query95	358	243	259	243
query96	610	272	284	272
query97	2878	2685	2682	2682
query98	234	195	195	195
query99	1757	1305	1344	1305
Total cold run time: 321623 ms
Total hot run time: 197563 ms

@doris-robot

ClickBench: Total hot run time: 32.76 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 6587818b6322347b523bb5b03b3d92798fb0b37c, data reload: false

query	cold (s)	hot run 1 (s)	hot run 2 (s)
query1	0.03	0.04	0.03
query2	0.06	0.04	0.03
query3	0.24	0.08	0.07
query4	1.60	0.10	0.11
query5	0.43	0.39	0.41
query6	1.16	0.67	0.66
query7	0.02	0.01	0.02
query8	0.04	0.04	0.02
query9	0.57	0.53	0.49
query10	0.55	0.56	0.55
query11	0.14	0.10	0.10
query12	0.15	0.11	0.12
query13	0.62	0.63	0.59
query14	2.83	2.73	2.82
query15	0.92	0.83	0.83
query16	0.38	0.38	0.38
query17	1.07	1.06	1.08
query18	0.21	0.21	0.21
query19	2.00	1.87	1.95
query20	0.02	0.01	0.01
query21	15.36	0.60	0.56
query22	2.46	2.11	1.64
query23	16.84	1.05	0.84
query24	3.68	1.11	1.97
query25	0.26	0.11	0.13
query26	0.56	0.13	0.13
query27	0.03	0.04	0.05
query28	9.71	1.12	1.06
query29	12.59	3.33	3.30
query30	0.24	0.06	0.06
query31	2.85	0.38	0.39
query32	3.28	0.46	0.48
query33	2.98	3.00	3.07
query34	17.00	4.53	4.50
query35	4.58	4.54	4.60
query36	0.67	0.49	0.50
query37	0.09	0.06	0.06
query38	0.04	0.03	0.04
query39	0.03	0.02	0.02
query40	0.16	0.13	0.12
query41	0.08	0.03	0.02
query42	0.03	0.03	0.02
query43	0.04	0.03	0.03
Total cold run time: 106.6 s
Total hot run time: 32.76 s

Member

@airborne12 airborne12 left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 25, 2024
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

Contributor

@qidaye qidaye left a comment

LGTM

@airborne12 airborne12 merged commit 8b68d08 into apache:master Nov 25, 2024
32 of 35 checks passed
github-actions bot pushed a commit that referenced this pull request Nov 25, 2024
github-actions bot pushed a commit that referenced this pull request Nov 25, 2024
yiguolei pushed a commit that referenced this pull request Nov 26, 2024
airborne12 pushed a commit that referenced this pull request Nov 27, 2024
airborne12 pushed a commit that referenced this pull request Nov 27, 2024
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.8-merged dev/3.0.4-merged reviewed
4 participants