HIVE-28578: Fix concurrency issue in ObjectStore#updateTableColumnStatistics #5567
...e-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Please check my comments!
9df299d
to
2a6ce35
Compare
catch (NucleusDataStoreException e) {
  retries--;

  if (retries == 0) {
How about logging the error here? Something like "Maximum number of retries (3) reached."
I'm not sure it is necessary. The retries are already logged, and I don't see the value of a log entry right before throwing an exception. The exception will be logged, and it contains the stack trace.
It would be something like that in the log:
[INFO] I will throw an exception
[ERROR] There is an exception...
int retries = 3;
boolean success = false;
while (!success && retries > 0) {
  // TODO## ideally the col stats stats should be in colstats, not in the table!
"col stats stats": I think "stats" was repeated by mistake, so the duplicate can be removed.
I'm not sure the author wrote it by mistake. Column statistics are a kind of statistics, so saying that the statistics about column statistics shouldn't be in the table makes sense to me.
I am not sure I get this. There is already a table-level lock for atomic stats updates, and we are adding an extra retry loop inside it?
The existing table-level lock is an in-memory lock. That means that if you have multiple HMS instances and two different processes receive requests at the same time, the lock has no effect, because each lock exists only inside a single JVM.
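To illustrate the point about in-memory locks, here is a minimal sketch. It is not Hive code: `FakeMetastoreInstance` and `InMemoryLockSketch` are hypothetical names, and two objects in one JVM stand in for two separate HMS processes. It only shows that a lock owned by one instance does nothing to stop a second instance from entering the same critical section.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical model of one metastore process: each process owns its own lock object. */
class FakeMetastoreInstance {
    private final ReentrantLock tableLock = new ReentrantLock();

    /** Tries to take this instance's private table-level lock without blocking. */
    boolean tryEnterCriticalSection() {
        return tableLock.tryLock();
    }

    void leaveCriticalSection() {
        tableLock.unlock();
    }
}

class InMemoryLockSketch {
    /** Returns true if both instances could be inside their critical sections at once. */
    static boolean bothCanHoldLock() {
        FakeMetastoreInstance a = new FakeMetastoreInstance();
        FakeMetastoreInstance b = new FakeMetastoreInstance();

        boolean aIn = a.tryEnterCriticalSection();
        // b's lock is a different object, so a holding its lock does not block b.
        boolean bIn = b.tryEnterCriticalSection();

        if (aIn) {
            a.leaveCriticalSection();
        }
        if (bIn) {
            b.leaveCriticalSection();
        }
        return aIn && bIn;
    }

    public static void main(String[] args) {
        System.out.println("both inside: " + bothCanHoldLock());
    }
}
```

In a real deployment the two locks live in different processes, so the only shared point of coordination is the backing database, which is where the duplicate key error surfaces.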
OK, the in-memory lock serves a different purpose: thread safety, see HIVE-25904. The description says "updateTableColumnStatistics can throw SQLIntegrityConstraintViolationException during replication if HA is on and two different HMS instance gets the same call but with different engine." Some of the RawStore interface methods are not retriable by design, so I would expect the others to be retriable without the need for extra hacks, simply use
It is trickier than this. Most of the statistics are stored in
I assume that TODO never happened.
Let me check what exactly that does and whether it is possible to apply it at the customer.
Hi @deniskuzZ, TBH, I'm not sure that using the retrying handler for all the metastore client calls is an approach worth taking just because of this one rare edge case. I also consider it a problem of using an ORM tool that cannot properly handle such a trivial case. It could be handled with a simple merge statement in plain SQL.
What changes were proposed in this pull request?
A concurrency issue can cause a duplicate key error when persisting table parameters. It happens when replication stores table column statistics and the table has statistics for multiple engines. With this change, if a parallel process has already saved those parameters, we retry the operation so that DataNucleus performs an update instead of an insert.
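The retry idea can be sketched as follows. This is a simplified, self-contained illustration under stated assumptions, not the actual ObjectStore code: `FakeParamStore`, `DuplicateKeyException`, and `saveWithRetry` are hypothetical stand-ins, and the `staleView` flag simulates a first attempt that chose INSERT before a concurrent writer committed the same key. Only the loop shape (`retries = 3`, decrement on failure, rethrow when exhausted) follows the snippets shown in the review.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the duplicate-key failure caught in the PR. */
class DuplicateKeyException extends RuntimeException {
    DuplicateKeyException(String message) {
        super(message);
    }
}

/** Hypothetical in-memory table with a unique key constraint. */
class FakeParamStore {
    private final Map<String, String> rows = new HashMap<>();

    boolean exists(String key) {
        return rows.containsKey(key);
    }

    void insert(String key, String value) {
        if (rows.containsKey(key)) {
            throw new DuplicateKeyException("duplicate key: " + key);
        }
        rows.put(key, value);
    }

    void update(String key, String value) {
        rows.put(key, value);
    }

    String get(String key) {
        return rows.get(key);
    }
}

class RetrySketch {
    /**
     * Saves a parameter with up to maxRetries attempts and returns the attempts used.
     * When staleView is true, the first attempt behaves as if the row were absent.
     */
    static int saveWithRetry(FakeParamStore store, String key, String value,
                             boolean staleView, int maxRetries) {
        int retries = maxRetries;
        int attempts = 0;
        boolean success = false;
        while (!success && retries > 0) {
            attempts++;
            try {
                boolean firstAttempt = attempts == 1;
                boolean seesRow = store.exists(key) && !(staleView && firstAttempt);
                if (seesRow) {
                    store.update(key, value);
                } else {
                    store.insert(key, value);
                }
                success = true;
            } catch (DuplicateKeyException e) {
                retries--;
                if (retries == 0) {
                    throw e; // retries exhausted: surface the original failure
                }
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        FakeParamStore store = new FakeParamStore();
        store.insert("example.param", "from-other-instance"); // concurrent writer won
        int attempts = saveWithRetry(store, "example.param", "from-this-instance", true, 3);
        System.out.println(attempts + " attempts, value=" + store.get("example.param"));
    }
}
```

In this sketch the second attempt sees the committed row and takes the update path, which mirrors the behavior the description expects from DataNucleus on retry.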
Why are the changes needed?
To avoid duplicate key errors.
Does this PR introduce any user-facing change?
No
Is the change a dependency upgrade?
No
How was this patch tested?
Repro and test process is described in the ticket: https://issues.apache.org/jira/browse/HIVE-28578?focusedCommentId=17901761&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17901761
Tested manually, on a cluster.