[Opt](multi-catalog) Disable dict filter in parquet/orc reader if there are non-single-slot conjuncts. #44777

Open: kaka11chen wants to merge 1 commit into master

@kaka11chen (Contributor) commented Nov 29, 2024

What problem does this PR solve?

Related PR: #26386

Problem Summary:

Since #26386, conjuncts are split into single_slot_filter_conjuncts and non_single_slot_filter_conjuncts. The single_slot_filter_conjuncts drive dictionary filtering and late materialization, and the non_single_slot_filter_conjuncts are evaluated afterwards on the rows that survive. However, this leaves fewer conditions available for late materialization, so late materialization becomes less effective.
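To make the split concrete, here is a minimal sketch in Python (not the actual C++ reader code). The `Conjunct` class and `split_conjuncts` function are hypothetical names for illustration, and the rule "exactly one referenced slot means single-slot" is an assumption based on the naming in #26386.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conjunct:
    """Hypothetical stand-in for one filter expression."""
    text: str
    slot_ids: frozenset  # ids of the columns (slots) the expression reads


def split_conjuncts(conjuncts):
    """Partition conjuncts by how many slots they reference (assumed rule)."""
    single_slot, non_single_slot = [], []
    for c in conjuncts:
        if len(c.slot_ids) == 1:
            single_slot.append(c)      # e.g. a = 1
        else:
            non_single_slot.append(c)  # e.g. a < b
    return single_slot, non_single_slot
```

For example, `a = 1` and `b IN ('x')` would land in the single-slot group, while `a < b` reads two columns and would land in the non-single-slot group.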

Release note

The trade-off is between how many conditions are available for late materialization and whether to perform dictionary filtering on multiple columns. Because late materialization matters more, when the filter contains non_single_slot_filter_conjuncts we skip dictionary filtering entirely and use the conjuncts for late materialization.
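The decision described above can be sketched roughly as follows. This is an illustrative Python sketch, not the reader's C++ implementation: `plan_reader_filters` and the dictionary keys are hypothetical names, and the assumption that all conjuncts feed late materialization once dictionary filtering is disabled is my reading of this description.

```python
def plan_reader_filters(single_slot, non_single_slot):
    """Decide which conjuncts go to dict filtering and late materialization.

    Assumed behavior: any non-single-slot conjunct disables dictionary
    filtering, and every conjunct is then used for late materialization.
    """
    if non_single_slot:
        return {
            "dict_filter": [],
            "late_materialization": list(single_slot) + list(non_single_slot),
        }
    # Only single-slot conjuncts: keep dictionary filtering as before.
    return {
        "dict_filter": list(single_slot),
        "late_materialization": list(single_slot),
    }
```

With `["a = 1"]` and `["a < b"]` as inputs, the sketch returns an empty `dict_filter` list and both conjuncts under `late_materialization`.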

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@kaka11chen
Contributor Author

run buildall

@kaka11chen force-pushed the disable_dict_filter_when_not_single_conjuncts branch from 6b0d1b8 to 1f8ccb3 on November 29, 2024 09:29
@kaka11chen
Contributor Author

run buildall

@kaka11chen force-pushed the disable_dict_filter_when_not_single_conjuncts branch from 1f8ccb3 to 3c6b546 on November 29, 2024 09:44
@kaka11chen
Contributor Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.33% (9976/26028)
Line Coverage: 29.42% (83544/284001)
Region Coverage: 28.53% (42965/150574)
Branch Coverage: 25.15% (21823/86760)
Coverage Report: http://coverage.selectdb-in.cc/coverage/3c6b546b8735b1c074b62e2c756ba6b0aa483ae0_3c6b546b8735b1c074b62e2c756ba6b0aa483ae0/report/index.html

@doris-robot

TPC-H: Total hot run time: 40268 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3c6b546b8735b1c074b62e2c756ba6b0aa483ae0, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17777	7581	7322	7322
q2	2050	185	175	175
q3	10639	1057	1186	1057
q4	10356	780	760	760
q5	7600	2720	2736	2720
q6	244	150	145	145
q7	979	634	596	596
q8	9264	1885	1896	1885
q9	6685	6560	6531	6531
q10	6985	2302	2324	2302
q11	480	254	255	254
q12	414	225	225	225
q13	17773	3034	3064	3034
q14	242	210	217	210
q15	579	544	507	507
q16	662	587	583	583
q17	997	580	555	555
q18	7521	6745	6726	6726
q19	1326	1027	963	963
q20	467	180	182	180
q21	4052	3291	3222	3222
q22	385	319	316	316
Total cold run time: 107477 ms
Total hot run time: 40268 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	7309	7307	7290	7290
q2	324	236	231	231
q3	2915	2877	2975	2877
q4	2095	1851	1891	1851
q5	5647	5692	5675	5675
q6	225	141	141	141
q7	2257	1854	1788	1788
q8	3405	3486	3564	3486
q9	8983	9086	9113	9086
q10	3601	3613	3549	3549
q11	597	495	498	495
q12	830	618	605	605
q13	12269	3329	3281	3281
q14	300	267	270	267
q15	601	530	535	530
q16	691	655	649	649
q17	1878	1600	1602	1600
q18	8415	7947	7663	7663
q19	1721	1609	1531	1531
q20	2150	1875	1865	1865
q21	5712	5609	5482	5482
q22	638	588	604	588
Total cold run time: 72563 ms
Total hot run time: 60530 ms

@doris-robot

TPC-DS: Total hot run time: 197585 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3c6b546b8735b1c074b62e2c756ba6b0aa483ae0, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1243	922	926	922
query2	6248	2067	2114	2067
query3	11115	4380	4373	4373
query4	66903	28992	23545	23545
query5	4997	460	466	460
query6	417	189	183	183
query7	5620	300	293	293
query8	324	226	239	226
query9	8814	2722	2733	2722
query10	454	254	245	245
query11	17319	15346	15918	15346
query12	155	108	109	108
query13	1633	433	433	433
query14	11119	7533	7543	7533
query15	214	203	194	194
query16	7359	466	476	466
query17	1045	567	568	567
query18	1850	292	297	292
query19	190	146	178	146
query20	118	119	111	111
query21	198	100	107	100
query22	4774	4550	4502	4502
query23	35405	34374	34828	34374
query24	5594	2542	2506	2506
query25	488	397	387	387
query26	632	154	150	150
query27	1824	284	283	283
query28	4474	2489	2510	2489
query29	688	444	404	404
query30	210	157	162	157
query31	973	834	841	834
query32	64	56	55	55
query33	414	315	290	290
query34	938	516	532	516
query35	895	778	756	756
query36	1111	955	948	948
query37	118	80	71	71
query38	4481	4435	4389	4389
query39	1521	1472	1466	1466
query40	202	101	102	101
query41	45	43	47	43
query42	124	96	100	96
query43	538	516	490	490
query44	1161	826	830	826
query45	188	173	173	173
query46	1144	728	751	728
query47	2016	1945	1948	1945
query48	415	325	327	325
query49	719	403	398	398
query50	839	397	397	397
query51	7445	7143	7101	7101
query52	99	87	85	85
query53	256	183	173	173
query54	505	398	389	389
query55	78	74	77	74
query56	254	230	229	229
query57	1262	1138	1125	1125
query58	214	209	208	208
query59	3380	3210	3063	3063
query60	270	244	243	243
query61	112	105	103	103
query62	784	652	685	652
query63	207	191	186	186
query64	1364	700	643	643
query65	3279	3266	3169	3169
query66	624	306	297	297
query67	15866	15709	15520	15520
query68	4141	546	547	546
query69	409	248	246	246
query70	1159	1129	1060	1060
query71	318	245	241	241
query72	6383	4033	4001	4001
query73	766	435	357	357
query74	10096	9173	9050	9050
query75	3375	2646	2687	2646
query76	1855	1064	1053	1053
query77	456	270	273	270
query78	10334	9406	9398	9398
query79	2284	601	617	601
query80	1426	415	435	415
query81	510	235	224	224
query82	1280	117	111	111
query83	219	139	142	139
query84	279	68	71	68
query85	1005	296	311	296
query86	429	311	312	311
query87	4702	4569	4585	4569
query88	3984	2231	2179	2179
query89	413	287	289	287
query90	1866	183	182	182
query91	147	108	104	104
query92	68	50	50	50
query93	3239	541	530	530
query94	765	294	302	294
query95	341	245	240	240
query96	630	275	280	275
query97	2851	2632	2654	2632
query98	215	191	191	191
query99	1601	1348	1328	1328
Total cold run time: 322875 ms
Total hot run time: 197585 ms

@doris-robot

ClickBench: Total hot run time: 32.7 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 3c6b546b8735b1c074b62e2c756ba6b0aa483ae0, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.03	0.02
query2	0.06	0.03	0.03
query3	0.24	0.07	0.07
query4	1.63	0.10	0.10
query5	0.43	0.43	0.40
query6	1.18	0.66	0.66
query7	0.02	0.02	0.01
query8	0.03	0.03	0.03
query9	0.58	0.51	0.51
query10	0.55	0.55	0.57
query11	0.15	0.10	0.12
query12	0.13	0.12	0.11
query13	0.61	0.60	0.59
query14	2.83	2.74	2.84
query15	0.91	0.82	0.82
query16	0.39	0.39	0.38
query17	1.04	1.06	1.02
query18	0.23	0.22	0.21
query19	1.89	1.85	2.06
query20	0.02	0.01	0.01
query21	15.36	0.60	0.59
query22	2.73	2.25	1.40
query23	16.95	1.06	0.89
query24	2.99	2.09	1.48
query25	0.26	0.22	0.06
query26	0.56	0.14	0.14
query27	0.05	0.04	0.05
query28	9.50	1.10	1.07
query29	12.56	3.20	3.17
query30	0.26	0.06	0.06
query31	2.89	0.37	0.38
query32	3.28	0.47	0.47
query33	2.98	2.97	3.12
query34	17.02	4.47	4.49
query35	4.52	4.58	4.52
query36	0.68	0.48	0.48
query37	0.09	0.06	0.06
query38	0.05	0.04	0.04
query39	0.04	0.03	0.02
query40	0.16	0.12	0.12
query41	0.08	0.02	0.02
query42	0.03	0.03	0.02
query43	0.04	0.03	0.03
Total cold run time: 106.04 s
Total hot run time: 32.7 s

@kaka11chen kaka11chen marked this pull request as ready for review December 19, 2024 08:29
@kaka11chen kaka11chen marked this pull request as draft December 19, 2024 08:32
@kaka11chen kaka11chen marked this pull request as ready for review December 30, 2024 09:08
@kaka11chen
Contributor Author

run cloud_p0
