[fix](iceberg)Bring field_id with parquet files And fix map type's key optional #44470
run buildall
clang-tidy made some suggestions
namespace doris {
namespace iceberg {
warning: nested namespaces can be concatenated [modernize-concat-nested-namespaces]
- namespace doris {
- namespace iceberg {
+ namespace doris::iceberg {
be/src/vec/exec/format/table/iceberg/arrow_schema_util.cpp:132:
- } // namespace iceberg
- } // namespace doris
+ } // namespace doris
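For readers unfamiliar with the check, the following is a minimal sketch of the C++17 nested namespace definition that clang-tidy is suggesting. The helper `component_name` is hypothetical and exists only so that the namespace has a member to call; it is not part of the PR.

```cpp
#include <string>

// C++17 nested namespace definition. It declares the same namespace as
//   namespace doris { namespace iceberg { ... } }
// but with a single opening line and a single closing brace.
namespace doris::iceberg {

// Hypothetical helper, used only to illustrate qualified access.
inline std::string component_name() {
    return "iceberg";
}

} // namespace doris::iceberg
```

Callers are unaffected by the change: `doris::iceberg::component_name()` resolves the same way with either spelling of the namespace.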
    return Status::OK();
}

Status ArrowSchemaUtil::ConvertTo(const iceberg::NestedField& field,
warning: function 'ConvertTo' exceeds recommended size/complexity thresholds [readability-function-size]
Status ArrowSchemaUtil::ConvertTo(const iceberg::NestedField& field,
^
Additional context
be/src/vec/exec/format/table/iceberg/arrow_schema_util.cpp:39: 89 lines including whitespace and comments (threshold 80)
        break;

    case iceberg::TypeID::DECIMAL: {
        DecimalType* dt = dynamic_cast<DecimalType*>(field.field_type());
warning: use auto when initializing with a cast to avoid duplicating the type name [modernize-use-auto]
- DecimalType* dt = dynamic_cast<DecimalType*>(field.field_type());
+ auto* dt = dynamic_cast<DecimalType*>(field.field_type());
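A small sketch of the suggested `auto*` form follows. The class hierarchy here (`Type`, `DecimalType`, `IntegerType`) is a simplified stand-in and not the Doris Iceberg type hierarchy, and `decimal_precision_or` is a hypothetical helper. The null check is added for illustration, since `dynamic_cast` on a pointer returns `nullptr` when the runtime type does not match.

```cpp
// Simplified stand-in hierarchy; the real Doris classes differ.
struct Type {
    virtual ~Type() = default;
};

struct DecimalType : Type {
    DecimalType(int p, int s) : precision(p), scale(s) {}
    int precision;
    int scale;
};

struct IntegerType : Type {};

// Hypothetical helper: returns the precision when `type` is a DecimalType,
// otherwise returns `fallback`.
inline int decimal_precision_or(Type* type, int fallback) {
    // The cast already names the target type, so `auto*` avoids repeating it
    // while still making it visible that `dt` is a pointer.
    auto* dt = dynamic_cast<DecimalType*>(type);
    if (dt == nullptr) {
        return fallback;
    }
    return dt->precision;
}
```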
// #include <sys/types.h>
// #include <unistd.h>

#include <arrow/type.h>
warning: 'arrow/type.h' file not found [clang-diagnostic-error]
#include <arrow/type.h>
^
namespace doris {
warning: nested namespaces can be concatenated [modernize-concat-nested-namespaces]
- namespace doris {
+ namespace doris::iceberg {
be/src/vec/exec/format/table/iceberg/arrow_schema_util.h:47:
- } // namespace iceberg
- } // namespace doris
+ } // namespace doris
@@ -265,6 +265,10 @@ class DecimalType : public PrimitiveType {
        ss << "decimal(" << precision << ", " << scale << ")";
        return ss.str();
    }

    int get_precision() { return precision; }
warning: method 'get_precision' can be made const [readability-make-member-function-const]
- int get_precision() { return precision; }
+ int get_precision() const { return precision; }
    int get_precision() { return precision; }

    int get_scale() { return scale; }
warning: method 'get_scale' can be made const [readability-make-member-function-const]
- int get_scale() { return scale; }
+ int get_scale() const { return scale; }
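To show why the two `const` suggestions matter in practice, here is a minimal sketch. This `DecimalType` is a trimmed stand-in (the member names `precision_` and `scale_` and the helper `integer_digits` are assumptions made for illustration, not the Doris class). The point is that a getter must be `const` to be callable through a `const` reference.

```cpp
// Trimmed stand-in for a decimal type; not the Doris iceberg::DecimalType.
class DecimalType {
public:
    DecimalType(int precision, int scale) : precision_(precision), scale_(scale) {}

    // Marked const because they only read member state.
    int get_precision() const { return precision_; }
    int get_scale() const { return scale_; }

private:
    int precision_;
    int scale_;
};

// Hypothetical helper taking a const reference. It compiles only because
// the getters above are const-qualified.
inline int integer_digits(const DecimalType& type) {
    return type.get_precision() - type.get_scale();
}
```

Without the `const` qualifiers, the call inside `integer_digits` would be rejected by the compiler, which is the situation the clang-tidy check is meant to prevent.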
namespace doris {
namespace iceberg {
warning: nested namespaces can be concatenated [modernize-concat-nested-namespaces]
- namespace doris {
- namespace iceberg {
+ namespace doris::iceberg {
be/test/vec/exec/format/table/iceberg/arrow_schema_util_test.cpp:219:
- } // namespace iceberg
- } // namespace doris
+ } // namespace doris
clang-tidy made some suggestions
namespace doris {
namespace iceberg {
warning: nested namespaces can be concatenated [modernize-concat-nested-namespaces]
- namespace doris {
- namespace iceberg {
+ namespace doris::iceberg {
be/src/vec/exec/format/table/iceberg/arrow_schema_util.h:48:
- } // namespace iceberg
- } // namespace doris
+ } // namespace doris
clang-tidy made some suggestions
    return Status::OK();
}

Status ArrowSchemaUtil::convert_to(const iceberg::NestedField& field,
warning: function 'convert_to' exceeds recommended size/complexity thresholds [readability-function-size]
Status ArrowSchemaUtil::convert_to(const iceberg::NestedField& field,
^
Additional context
be/src/vec/exec/format/table/iceberg/arrow_schema_util.cpp:39: 89 lines including whitespace and comments (threshold 80)
clang-tidy made some suggestions
        break;

    case iceberg::TypeID::DECIMAL: {
        auto dt = dynamic_cast<DecimalType*>(field.field_type());
warning: 'auto dt' can be declared as 'auto *dt' [readability-qualified-auto]
- auto dt = dynamic_cast<DecimalType*>(field.field_type());
+ auto *dt = dynamic_cast<DecimalType*>(field.field_type());
namespace doris {
namespace iceberg {
warning: nested namespaces can be concatenated [modernize-concat-nested-namespaces]
- namespace doris {
- namespace iceberg {
+ namespace doris::iceberg {
be/src/vec/exec/format/table/iceberg/arrow_schema_util.h:43:
- } // namespace iceberg
- } // namespace doris
+ } // namespace doris
run p0
run external
5732eb1 to f8085c4
LGTM
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
…y optional (#44470)
### What problem does this PR solve?
1. Column IDs are required to be stored as [field IDs](http://github.com/apache/parquet-format/blob/40699d05bd24181de6b1457babbee2c16dce3803/src/main/thrift/parquet.thrift#L459) on the parquet schema.
   ref: https://iceberg.apache.org/spec/?h=field+id#parquet
   So, we should add field ids.
2. For `MapType`, its key is always required.
…ap type's key optional #44470 (#44828) Cherry-picked from #44470 Co-authored-by: wuwenchi <[email protected]>
…ap type's key optional #44470 (#44827) Cherry-picked from #44470 Co-authored-by: wuwenchi <[email protected]>
What problem does this PR solve?
1. Column IDs are required to be stored as field IDs on the parquet schema.
   ref: https://iceberg.apache.org/spec/?h=field+id#parquet
   So, we should add field ids.
2. For `MapType`, its key is always required.
Release note
None
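The two points in the description can be sketched as follows. This is an illustrative model only, not the code from the PR: `Field` is a simplified stand-in for an Arrow field rather than `arrow::Field`, and `make_field` and `make_map_field` are hypothetical helpers. The metadata key `PARQUET:field_id` is, to my understanding, the key that Arrow's C++ Parquet writer reads field IDs from; treat that as an assumption to verify against the Arrow version in use.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for an Arrow-like schema field (not arrow::Field).
struct Field {
    std::string name;
    bool nullable = true;
    std::map<std::string, std::string> metadata;
    std::vector<Field> children;
};

// Assumed metadata key under which the Parquet field ID is carried.
constexpr const char* kFieldIdKey = "PARQUET:field_id";

// Hypothetical helper: builds a field that carries its Iceberg column ID
// as key/value metadata, so the ID can be written into the Parquet schema.
inline Field make_field(std::string name, int field_id, bool optional) {
    Field f;
    f.name = std::move(name);
    f.nullable = optional;
    f.metadata[kFieldIdKey] = std::to_string(field_id);
    return f;
}

// Hypothetical helper: builds a map field. The key child is always
// required (non-nullable); only the value child may be optional.
inline Field make_map_field(std::string name, int field_id, bool optional,
                            int key_id, int value_id, bool value_optional) {
    Field m = make_field(std::move(name), field_id, optional);
    m.children.push_back(make_field("key", key_id, /*optional=*/false));
    m.children.push_back(make_field("value", value_id, value_optional));
    return m;
}
```

In this model every field, including the nested key and value, carries its own ID, and the map key is never nullable regardless of how the map itself is declared.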