[enhance](orc) Optimize ORC Predicate Pushdown for OR-connected Predicate #43255
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
clang-tidy made some suggestions
run buildall |
TeamCity be ut coverage result: |
921f87f to 5056b99
run external |
bool OrcReader::_init_search_argument(
        std::unordered_map<std::string, ColumnValueRangeType>* colname_to_value_range) {
    if ((!_enable_filter_by_min_max) || colname_to_value_range->empty()) {
bool OrcReader::_init_search_argument(const VExprContextSPtrs& conjuncts) {
For a given query, the conjuncts are identical across all ORC readers.
So I think we could initialize the search argument once and reuse the result for every ORC reader?
It is possible to cache the searchArgument in a member variable of OrcReader.
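The build-once idea discussed in this thread could look roughly like the sketch below. It is only an illustration under assumptions: `SearchArgumentSketch` and `SharedSearchArgument` are hypothetical stand-ins, not Doris or ORC types, and the real change may cache the result differently.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a built ORC search argument.
struct SearchArgumentSketch {
    std::string description;
};

// Hypothetical holder shared by all readers of one query: the builder
// callback runs at most once, and every caller gets the same instance.
class SharedSearchArgument {
public:
    using Builder = std::function<std::shared_ptr<const SearchArgumentSketch>()>;

    explicit SharedSearchArgument(Builder builder) : _builder(std::move(builder)) {}

    std::shared_ptr<const SearchArgumentSketch> get() {
        std::lock_guard<std::mutex> guard(_lock);
        if (!_built) {
            _cached = _builder();  // may be null if nothing can be pushed down
            _built = true;
        }
        return _cached;
    }

private:
    Builder _builder;
    std::mutex _lock;
    bool _built = false;
    std::shared_ptr<const SearchArgumentSketch> _cached;
};
```

Each reader would call `get()` instead of rebuilding the argument from the conjuncts. The `_built` flag is kept separate from the pointer so that a "nothing to push down" result is also computed only once.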
PR approved by at least one committer and no changes requested. |
PR approved by anyone and no changes requested. |
LGTM
…cate (#43255) ### What problem does this PR solve? Problem Summary: This issue addresses a limitation in Apache Doris where only predicates joined by AND are pushed down to the ORC reader, leaving OR-connected predicates unoptimized. By extending pushdown functionality to handle these OR conditions, the aim is to better leverage ORC’s predicate pushdown capabilities, reducing data reads and improving query performance.
…nected Predicate #43255 (#44438) Cherry-picked from #43255 Co-authored-by: Socrates <[email protected]>
…nected Predicate #43255 (#44436) Cherry-picked from #43255 Co-authored-by: Socrates <[email protected]>
### What problem does this PR solve?

In the old logic, the `check_expr_can_push_down` function did not check whether each `orc::Literal` was constructed successfully; that was only checked during `build_search_argument`. However, if constructing an `orc::Literal` fails after `builder->startNot` has been called, the build fails because the builder cannot close the `startNot`. Therefore, we move the construction of `orc::Literal` forward into `check_expr_can_push_down` and save the result to a map, so that the `build_search_argument` phase can never fail.

Related PR: #43255
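The two-phase approach described here can be sketched as follows. This is a simplified illustration, not the Doris implementation: `PredicateSketch`, `make_literal_sketch`, `check_can_push_down_sketch`, and `build_sketch` are hypothetical names, literals are plain integers, and builder calls are recorded as strings instead of going to a real ORC builder.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical predicate: `column = raw_literal`, optionally wrapped in NOT.
struct PredicateSketch {
    int id;
    std::string column;
    std::string raw_literal;
    bool negated;
};

// Hypothetical literal conversion: fails (nullopt) for non-integer text.
std::optional<long> make_literal_sketch(const std::string& raw) {
    try {
        std::size_t pos = 0;
        long value = std::stol(raw, &pos);
        if (pos != raw.size()) return std::nullopt;
        return value;
    } catch (const std::exception&) {
        return std::nullopt;
    }
}

// Phase 1: construct the literal up front and remember it by predicate id.
// A failure here simply means "do not push this predicate down".
bool check_can_push_down_sketch(const PredicateSketch& pred,
                                std::unordered_map<int, long>& literals) {
    std::optional<long> literal = make_literal_sketch(pred.raw_literal);
    if (!literal) return false;
    literals.emplace(pred.id, *literal);
    return true;
}

// Phase 2: only reads literals that phase 1 already built, so nothing can
// fail between "startNot" and its matching "end".
void build_sketch(const PredicateSketch& pred,
                  const std::unordered_map<int, long>& literals,
                  std::vector<std::string>& calls) {
    if (pred.negated) calls.push_back("startNot");
    calls.push_back("equals(" + pred.column + ", " +
                    std::to_string(literals.at(pred.id)) + ")");
    if (pred.negated) calls.push_back("end");
}
```

A caller would run `build_sketch` only for predicates that passed `check_can_push_down_sketch`, which is what keeps every opened `startNot` balanced.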
…4615) In the old logic, the `check_expr_can_push_down` function did not check whether each `orc::Literal` was constructed successfully; that was only checked during `build_search_argument`. However, if constructing an `orc::Literal` fails after `builder->startNot` has been called, the build fails because the builder cannot close the `startNot`. Therefore, we move the construction of `orc::Literal` forward into `check_expr_can_push_down` and save the result to a map, so that the `build_search_argument` phase can never fail. Related PR: apache#43255
…r OR-connected Predicate apache#43255 (apache#44438)" This reverts commit dceaf97.
…ins (#45104)

### What problem does this PR solve?

Related PR: #43255

Problem Summary: Null values should be ignored when the literals of an in_predicate contain a null value, like `in (1, null)`.

For example, initialize a table in Hive:

```sql
CREATE TABLE sample_orc_table (
    id INT,
    name STRING,
    age INT
) STORED AS ORC;

INSERT INTO TABLE sample_orc_table VALUES
    (1, 'Alice', 25),
    (2, NULL, NULL);
```

The select results in Doris should be:

```sql
mysql> select * from sample_orc_table where age in (null,25);
+------+-------+------+
| id   | name  | age  |
+------+-------+------+
|    1 | Alice |   25 |
+------+-------+------+
1 row in set (0.30 sec)

mysql> select * from sample_orc_table where age in (25);
+------+-------+------+
| id   | name  | age  |
+------+-------+------+
|    1 | Alice |   25 |
+------+-------+------+
1 row in set (0.27 sec)

mysql> select * from sample_orc_table where age in (null);
Empty set (0.01 sec)

mysql> select * from sample_orc_table where age is null;
+------+------+------+
| id   | name | age  |
+------+------+------+
|    2 | NULL | NULL |
+------+------+------+
1 row in set (0.11 sec)
```
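The row-level semantics that the pushed-down filter has to preserve can be sketched as below. This only models the expected query results shown above; it is not the Doris code path, and `Value`, `drop_null_literals`, and `in_predicate_matches` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// std::nullopt stands in for SQL NULL in this sketch.
using Value = std::optional<int>;

// Keep only the non-null literals of an IN list.
std::vector<int> drop_null_literals(const std::vector<Value>& literals) {
    std::vector<int> kept;
    for (const Value& literal : literals) {
        if (literal.has_value()) kept.push_back(*literal);
    }
    return kept;
}

// A row passes `column IN (literals)` only when the column value is
// non-null and equals one of the non-null literals.
bool in_predicate_matches(const Value& column_value,
                          const std::vector<Value>& literals) {
    if (!column_value.has_value()) return false;
    const std::vector<int> kept = drop_null_literals(literals);
    return std::find(kept.begin(), kept.end(), *column_value) != kept.end();
}
```

With the sample table, `age in (null, 25)` keeps only the row with age 25, and `age in (null)` keeps no rows, matching the results listed above.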
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
This PR addresses a limitation in Apache Doris where only predicates joined by AND are pushed down to the ORC reader, leaving OR-connected predicates unoptimized. Extending pushdown to handle OR conditions makes better use of ORC’s predicate pushdown capabilities, reducing data reads and improving query performance.
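To illustrate why pushing down an OR-connected predicate can reduce reads, here is a minimal sketch of min/max pruning over a small expression tree. It assumes a single integer column and hypothetical types (`ColumnStats`, `Expr`); it does not use the ORC library or reproduce the Doris implementation.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical per-stripe statistics for one integer column.
struct ColumnStats {
    long min;
    long max;
};

// Hypothetical predicate tree over that single column.
struct Expr {
    enum class Kind { LessThan, GreaterThan, And, Or };
    Kind kind;
    long literal = 0;
    std::vector<Expr> children;
};

Expr less_than(long v) { return Expr{Expr::Kind::LessThan, v, {}}; }
Expr greater_than(long v) { return Expr{Expr::Kind::GreaterThan, v, {}}; }
Expr any_of(std::vector<Expr> c) { return Expr{Expr::Kind::Or, 0, std::move(c)}; }
Expr all_of(std::vector<Expr> c) { return Expr{Expr::Kind::And, 0, std::move(c)}; }

// Returns false only when the statistics prove that no row in the stripe
// can satisfy the predicate, so the stripe can be skipped.
bool may_match(const Expr& e, const ColumnStats& stats) {
    switch (e.kind) {
    case Expr::Kind::LessThan:
        return stats.min < e.literal;
    case Expr::Kind::GreaterThan:
        return stats.max > e.literal;
    case Expr::Kind::And:
        for (const Expr& child : e.children) {
            if (!may_match(child, stats)) return false;
        }
        return true;
    case Expr::Kind::Or:
        for (const Expr& child : e.children) {
            if (may_match(child, stats)) return true;
        }
        return false;
    }
    return true;  // unknown node kinds are never used to skip data
}
```

For `a < 10 OR a > 100`, a stripe whose values all lie between 20 and 80 fails both branches and can be skipped. Without OR pushdown, that stripe would have to be read and filtered row by row.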
Check List (For Committer)
Test
Behavior changed:
Does this need documentation?
Release note
None
Check List (For Reviewer who merge this PR)