-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SqlCatalog _commit_table support #265
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @syun64 for the great work! Overall LGTM. Just have two minor comments
pyiceberg/catalog/sql.py
Outdated
try: | ||
tbl = ( | ||
session.query(IcebergTables) | ||
.with_for_update(of=IcebergTables, nowait=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick question: Why do we use nowait
here? It looks like we're not catching errors from concurrent SELECT FOR UPDATE NOWAIT
attempts.
I think skip_locked=True
may be better here, as it lets the existing NoResultFound
catch handle the case when the row is locked by another session. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great question @HonahX . The reason I wanted to avoid using skip_locked is because I think that would obfuscate the exception message we receive when NoResultFound is raised, which currently means that the table that we wanted to find doesn't exist in the DB.
I felt that using nowait would be a good practice to avoid processes hanging and waiting for locks, which would behave a lot less like optimistic locking. With nowait, if we caught the right exception, we could raise a CommitFailedException with a message describing that a conflicting concurrent commit was made to the record, which is an expected type of exception with optimistic locking.
Would it work to catch sqlalchemy.exc.OperationalError that is raised when the lock isn't available?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation! Reflecting further, I think that using neither NOWAIT nor SKIP_LOCKED might be better for maintaining consistency. Currently, when the engine supports accurate rowcounts, we employ UPDATE TABLE or DELETE TABLE, which inherently wait for locks. If the engine lacks rowcount support, sticking with SELECT FOR UPDATE ensures consistent behavior with UPDATE TABLE operations, as it waits for locks. This consistency eliminates the need to handle additional exceptions, as any concurrent transaction will naturally lead to a NoResultFound
scenario once the table is dropped, renamed, or updated.
Shifting focus to potential alternatives, If we later decide that avoiding lock waits is preferable in fallback situations, we could consider the following modifications:
- For
drop_table
andrename_table
, I think you're right that we can catchsqlalchemy.exc.OperationalError
and re-raise it as a new exception indicating that another process is set to delete the table. UsingCommitFailedException
doesn't seem appropriate here, as it's primarily meant for failed commits on iceberg tables. - For
_commit_table
,skip_locked
might still be useful. At the point when the SQL command is executed, we're assured of the table's existence. Therefore, encounteringNoResultFound
would directly imply a concurrent commit attempt.
Do you think the concern about maintaining consistency between cases that do and do not support rowcount is valid? If not, what are your thoughts on adopting NOWAIT for drop_table
and rename_table
, and SKIP_LOCKED for _commit_table
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @HonahX . Thank you for explaining so patiently.
If the engine lacks rowcount support, sticking with SELECT FOR UPDATE ensures consistent behavior with UPDATE TABLE operations, as it waits for locks. This consistency eliminates the need to handle additional exceptions, as any concurrent transaction will naturally lead to a NoResultFound scenario once the table is dropped, renamed, or updated.
I agree with your analysis here, and I think that dropping nowait and skip_locked will be best to mimic the other behavior with UPDATE TABLE operation as closely as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for the great work @syun64 ! For this important PR, let's give some time for others to provide their input, especially looking forward to thoughts from @Fokko and @cccs-eric, who are the original authors of the SqlCatalog.
Thanks @syun64 for working on this 👍 |
#262
This PR takes much code-spiration from @HonahX 's recent PR for a similar feature enhancement to the Glue Catalog.
The only difference here is that we use the metadata_location as the optimistic locking parameter to ensure that the version of the table we used to validate the requirements on is still the same when the SQL update is being made.
EDIT:
for methods that require rowcount on UPDATE and DELETE for optimistic locking, we now fallback to "SELECT ... FOR UPDATE" sqlalchemy expression with_for_update to acquire a row level lock if the sqlalchemy dialect does not support rowcount for UPDATE / DELETE
https://docs.sqlalchemy.org/en/14/core/internals.html - supports_sane_rowcount