Add UPDATE support for Iceberg #24281

Draft: wants to merge 7 commits into base: master

ZacBlanco
Contributor

@ZacBlanco ZacBlanco commented Dec 18, 2024

Description

This PR adds support in the Iceberg connector for UPDATE operations.

Motivation and Context

Row-level table UPDATE support

Impact

  • Users can now set the update_mode table property on Iceberg tables.
  • Inserts and deletes now appear as "overwrite" operations in the snapshot entries because of the new update implementation.
  • UPDATE <x> SET ... WHERE ... queries can now run successfully.

Test Plan

Comprehensive set of unit tests for different tables and column types.

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

Iceberg Changes
* Iceberg connector support for `UPDATE` SQL statements :pr:`24281`

ZacBlanco and others added 2 commits December 18, 2024 13:43
Without this change, the UpdateOperator would throw an exception
stating that there was no valid page source. This occurs because the
driver, which is responsible for setting the UpdatablePageSource,
never calls the proper method when it never receives any inputs.

This change handles the case where the page source is never set by
returning an EmptySplitPageSource.
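
The fallback described in this commit message can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `PageSource`, `EmptyPageSource`, and `UpdatablePageSourceHolder` are hypothetical stand-ins invented for illustration, and they only model the idea of substituting an empty source when the driver never set one.

```java
import java.util.Optional;

// Hypothetical stand-in for a connector page source (not Presto's real interface).
interface PageSource {
    boolean isFinished();

    long getCompletedPositions();
}

// Hypothetical empty source: already finished and reports zero processed rows.
final class EmptyPageSource implements PageSource {
    @Override
    public boolean isFinished() {
        return true;
    }

    @Override
    public long getCompletedPositions() {
        return 0;
    }
}

// Holds the source that the driver may or may not set before the operator finishes.
final class UpdatablePageSourceHolder {
    private Optional<PageSource> source = Optional.empty();

    // Called by the driver only when it actually receives input.
    void set(PageSource pageSource) {
        this.source = Optional.of(pageSource);
    }

    // Falls back to an empty source instead of failing when nothing was ever set.
    PageSource getOrEmpty() {
        return source.orElseGet(EmptyPageSource::new);
    }
}
```

With this shape, an operator that finishes without ever receiving input sees a finished source with zero rows rather than an exception.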

This commit allows users to perform row-level updates when using
the Iceberg connector with Java executors.

This is achieved by extending IcebergUpdatablePageSource to
implement the updateRows method. The implementation passes
a generated row ID column as a field in the page required by
updateRows. Then, during updateRows, it generates a positionDelete
file entry for the row ID and also writes the row's updated value to a
new page sink for the newly updated data.

These new files are then committed in a rowDelta transaction within
the Iceberg connector metadata after processing is complete.

Co-Authored-By: Nidhin Varghese <[email protected]>
Co-Authored-By: Anoop V S <[email protected]>
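
As a rough illustration of the update flow in the commit message above, the sketch below models an update as a position delete for the old row plus a write of the new value. Everything in it is an assumption for illustration: `PositionDeleteEntry` and `RowUpdateSketch` are hypothetical names, the row ID is simplified to a row position within a single data file, and rows are plain strings rather than pages. It does not use the Iceberg or Presto APIs and stops before the commit step.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical entry in a position-delete file: a data file path and a row position.
final class PositionDeleteEntry {
    final String dataFilePath;
    final long rowPosition;

    PositionDeleteEntry(String dataFilePath, long rowPosition) {
        this.dataFilePath = dataFilePath;
        this.rowPosition = rowPosition;
    }
}

// Hypothetical in-memory model of an updatable page source for one data file.
final class RowUpdateSketch {
    private final String dataFilePath;
    private final List<PositionDeleteEntry> positionDeletes = new ArrayList<>();
    private final List<String> rewrittenRows = new ArrayList<>();

    RowUpdateSketch(String dataFilePath) {
        this.dataFilePath = dataFilePath;
    }

    // For each updated row, record a delete for its old position and
    // append the updated value to the sink for new data.
    void updateRows(long[] rowPositions, String[] updatedValues) {
        if (rowPositions.length != updatedValues.length) {
            throw new IllegalArgumentException("row positions and values must align");
        }
        for (int i = 0; i < rowPositions.length; i++) {
            positionDeletes.add(new PositionDeleteEntry(dataFilePath, rowPositions[i]));
            rewrittenRows.add(updatedValues[i]);
        }
    }

    // Defensive copies, so callers cannot mutate the internal state.
    List<PositionDeleteEntry> positionDeletes() {
        return new ArrayList<>(positionDeletes);
    }

    List<String> rewrittenRows() {
        return new ArrayList<>(rewrittenRows);
    }
}
```

In the flow the commit message describes, the delete entries and the rewritten rows would each end up in new files that are committed together; the sketch only collects them in memory.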
@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Dec 18, 2024
@prestodb-ci prestodb-ci requested review from a team, anandamideShakyan and ShahimSharafudeen and removed request for a team December 18, 2024 22:05
- failing equality delete test
- typo in assertion on test update with predicate
@ZacBlanco ZacBlanco force-pushed the upstream-iceberg-update branch from e56fc7c to f0a29a0 Compare December 20, 2024 18:28