[Feature](iceberg-writer) Implements iceberg partition transform. #37692

Merged 3 commits into apache:branch-2.1 from cherry-pick-36289-36889_2.1 on Jul 13, 2024

kaka11chen
Contributor

Proposed changes

Cherry-pick iceberg partition transform functionality. #36289 #36889

…apache#36289)

apache#31442

Adds an Iceberg operator so that Doris can write data directly into the
lake:
1. Support INSERT INTO an Iceberg table by appending HDFS files.
2. Implement Iceberg partition routing through partitionTransform.
2.1) Serialize the spec and schema to JSON on the FE side, then
deserialize them on the BE side to recover the schema and partition
information of the Iceberg table.
2.2) Implement Iceberg's Identity, Bucket, Year/Month/Day, and other
partition strategies through partitionTransform and a template class.
3. Manage transactions through IcebergTransaction.
3.1) After the BE finishes writing files, it reports CommitData to the FE
at partition granularity.
3.2) After receiving the CommitData, the FE commits the metadata to
Iceberg in IcebergTransaction.
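
As a rough illustration of step 2.2, the bucket strategy can be sketched as below. This is not the code from this PR. It follows the bucket definition in the Iceberg table spec (32-bit Murmur3 hash with seed 0, masked to a non-negative value, then taken modulo the bucket count), and the names `murmur3_32`, `iceberg_hash_long`, `iceberg_hash_string`, and `iceberg_bucket` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Rotate a 32-bit value left by r bits (0 < r < 32).
inline uint32_t rotl32(uint32_t x, int r) {
    return (x << r) | (x >> (32 - r));
}

// MurmurHash3 x86 32-bit. Blocks are assembled byte by byte so the result
// does not depend on host endianness.
inline uint32_t murmur3_32(const uint8_t* data, size_t len, uint32_t seed = 0) {
    const uint32_t c1 = 0xcc9e2d51U;
    const uint32_t c2 = 0x1b873593U;
    uint32_t h = seed;
    const size_t nblocks = len / 4;
    for (size_t i = 0; i < nblocks; ++i) {
        const uint8_t* p = data + 4 * i;
        uint32_t k = static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
                     (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
        k *= c1;
        k = rotl32(k, 15);
        k *= c2;
        h ^= k;
        h = rotl32(h, 13);
        h = h * 5 + 0xe6546b64U;
    }
    // Mix in the 1 to 3 trailing bytes, if any.
    const uint8_t* tail = data + 4 * nblocks;
    uint32_t k = 0;
    switch (len & 3) {
    case 3: k ^= static_cast<uint32_t>(tail[2]) << 16; [[fallthrough]];
    case 2: k ^= static_cast<uint32_t>(tail[1]) << 8; [[fallthrough]];
    case 1:
        k ^= static_cast<uint32_t>(tail[0]);
        k *= c1;
        k = rotl32(k, 15);
        k *= c2;
        h ^= k;
        break;
    default: break;
    }
    // Finalization mix.
    h ^= static_cast<uint32_t>(len);
    h ^= h >> 16;
    h *= 0x85ebca6bU;
    h ^= h >> 13;
    h *= 0xc2b2ae35U;
    h ^= h >> 16;
    return h;
}

// The Iceberg spec hashes int and long values as 8-byte little-endian longs.
inline int32_t iceberg_hash_long(int64_t value) {
    uint8_t bytes[8];
    const auto u = static_cast<uint64_t>(value);
    for (int i = 0; i < 8; ++i) {
        bytes[i] = static_cast<uint8_t>(u >> (8 * i));
    }
    return static_cast<int32_t>(murmur3_32(bytes, sizeof(bytes)));
}

// Strings are hashed over their UTF-8 bytes.
inline int32_t iceberg_hash_string(std::string_view s) {
    return static_cast<int32_t>(
            murmur3_32(reinterpret_cast<const uint8_t*>(s.data()), s.size()));
}

// bucket_N(x) = (hash(x) & INT32_MAX) % N
inline int32_t iceberg_bucket(int32_t hash, int32_t num_buckets) {
    assert(num_buckets > 0);
    return (hash & INT32_MAX) % num_buckets;
}
```

For example, `iceberg_bucket(iceberg_hash_long(34), 16)` routes the value 34 to one of 16 buckets. The Iceberg spec lists 2017239379 as the expected hash for the long 34, which makes a convenient check for any implementation.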

### Future work
- Add unit tests for the partition transform functions.
- Implement the partition transform function with the exchange sink turned on.
- Handle the bigint type, which the partition transform function currently omits.
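
The PR does not say which transform lacks bigint handling. As one possible shape for that future work, here is a minimal sketch of Iceberg's truncate transform on a 64-bit integer, following the formula in the Iceberg table spec, `v - (((v % W) + W) % W)`. The name `truncate_bigint` is hypothetical and is not taken from the Doris code base.

```cpp
#include <cassert>
#include <cstdint>

// Round `value` down to the nearest multiple of `width`.
// The double modulo keeps the remainder non-negative, so negative inputs
// are rounded toward negative infinity rather than toward zero.
inline int64_t truncate_bigint(int64_t value, int64_t width) {
    assert(width > 0);
    const int64_t remainder = ((value % width) + width) % width;
    return value - remainder;
}
```

With a width of 10, the value 1 maps to 0 and the value -1 maps to -10, so every partition covers exactly `width` consecutive values.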

---------

Co-authored-by: lik40 <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@kaka11chen
Contributor Author

run buildall

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

ColumnPtr null_map_column_ptr;
bool is_nullable = false;
if (column_ptr->is_nullable()) {
const ColumnNullable* nullable_column =

warning: use auto when initializing with a cast to avoid duplicating the type name [modernize-use-auto]

Suggested change
const ColumnNullable* nullable_column =
const auto* nullable_column =
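
The clang-tidy rule above can be illustrated in isolation. A minimal sketch, assuming stand-in types `BaseColumn` and `NullableColumn` that only mimic the shape of a nullable column class and a hypothetical helper `count_nulls`; none of these names come from the Doris code base:

```cpp
#include <cassert>

// Stand-in base type with a virtual nullability query.
struct BaseColumn {
    virtual ~BaseColumn() = default;
    virtual bool is_nullable() const { return false; }
};

// Stand-in nullable column that carries a null counter.
struct NullableColumn : BaseColumn {
    bool is_nullable() const override { return true; }
    int null_count = 0;
};

inline int count_nulls(const BaseColumn* column) {
    assert(column != nullptr);
    if (!column->is_nullable()) {
        return 0;
    }
    // The cast already names the target type, so `const auto*` avoids
    // repeating it while keeping constness and pointer-ness visible.
    const auto* nullable_column = static_cast<const NullableColumn*>(column);
    return nullable_column->null_count;
}
```

Writing `const auto*` rather than plain `auto` keeps the declaration readable: the reader still sees that the variable is a pointer to a const object.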

Int32* __restrict p_out = out_data.data();

while (p_in < end_in) {
Int64 long_value = static_cast<Int64>(*p_in);

warning: use auto when initializing with a cast to avoid duplicating the type name [modernize-use-auto]

Suggested change
Int64 long_value = static_cast<Int64>(*p_in);
auto long_value = static_cast<Int64>(*p_in);

ColumnPtr col_ptr = partition_column.column->convert_to_full_column_if_const();
CHECK(col_ptr != nullptr);
if (col_ptr->is_nullable()) {
const ColumnNullable* nullable_column =

warning: use auto when initializing with a cast to avoid duplicating the type name [modernize-use-auto]

Suggested change
const ColumnNullable* nullable_column =
const auto* nullable_column =

@kaka11chen kaka11chen changed the title from "[Feature] (iceberg-writer) Implements iceberg partition transform." to "[Feature](iceberg-writer) Implements iceberg partition transform." on Jul 12, 2024
… fix some issues. (apache#36889)

- Add iceberg partition transform unit tests.
- Change `ColumnWithTypeAndName apply(Block& block, int column_pos)` to
`ColumnWithTypeAndName apply(const Block& block, int column_pos)`.
- Fix and change the string truncate partition transform.
- Fix a calculation error in the bucket partition transform.
- Fix a calculation error in the year/month partition transform caused by
leap-year handling.
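
The leap-year fix itself is not shown on this page. The sketch below illustrates one way to compute Iceberg's year and month transforms from a date stored as days since 1970-01-01 so that leap years are handled correctly: it converts the day count to a civil date with the well-known days-to-civil algorithm for the proleptic Gregorian calendar, then counts whole years or months since the epoch. `CivilDate`, `civil_from_days`, `year_transform`, and `month_transform` are hypothetical names, not the functions changed in apache#36889.

```cpp
#include <cassert>
#include <cstdint>

struct CivilDate {
    int64_t year;
    int month; // 1..12
    int day;   // 1..31
};

// Convert days since 1970-01-01 to a proleptic Gregorian date.
// Works for negative day counts (dates before the epoch) as well.
inline CivilDate civil_from_days(int64_t z) {
    z += 719468; // shift the epoch to 0000-03-01
    const int64_t era = (z >= 0 ? z : z - 146096) / 146097; // 400-year cycles
    const int64_t doe = z - era * 146097;                   // day of era
    const int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    const int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // from March 1
    const int64_t mp = (5 * doy + 2) / 153;                      // March = 0
    const int day = static_cast<int>(doy - (153 * mp + 2) / 5 + 1);
    const int month = static_cast<int>(mp < 10 ? mp + 3 : mp - 9);
    const int64_t year = yoe + era * 400 + (month <= 2 ? 1 : 0);
    return {year, month, day};
}

// Years since 1970; negative before the epoch.
inline int32_t year_transform(int32_t days_since_epoch) {
    return static_cast<int32_t>(civil_from_days(days_since_epoch).year - 1970);
}

// Months since 1970-01; negative before the epoch.
inline int32_t month_transform(int32_t days_since_epoch) {
    const CivilDate d = civil_from_days(days_since_epoch);
    assert(d.month >= 1 && d.month <= 12);
    return static_cast<int32_t>((d.year - 1970) * 12 + (d.month - 1));
}
```

Day 19782 is 2024-02-29 and day 19783 is 2024-03-01, so `month_transform` returns 649 and 650 respectively. Boundaries like this one are where a fixed-length approximation of years or months tends to go wrong.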
@kaka11chen kaka11chen force-pushed the cherry-pick-36289-36889_2.1 branch from d7d31d3 to 9a2c922 Compare July 12, 2024 09:23
@kaka11chen
Contributor Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.64% (9271/25303)
Line Coverage: 28.14% (75859/269556)
Region Coverage: 26.94% (38973/144656)
Branch Coverage: 23.64% (19786/83688)
Coverage Report: http://coverage.selectdb-in.cc/coverage/9a2c9227ed2005552cd5b9f677eb80d1f1e12be0_9a2c9227ed2005552cd5b9f677eb80d1f1e12be0/report/index.html

@morningman
Contributor

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.64% (9272/25303)
Line Coverage: 28.14% (75861/269549)
Region Coverage: 26.94% (38974/144653)
Branch Coverage: 23.65% (19790/83688)
Coverage Report: http://coverage.selectdb-in.cc/coverage/d8b77b3ca2775faddf32fe3af62e3416e7bae755_d8b77b3ca2775faddf32fe3af62e3416e7bae755/report/index.html

@morningman morningman merged commit 8930df3 into apache:branch-2.1 Jul 13, 2024
19 of 21 checks passed

4 participants