[BUG] Collisions on subworkflow launchplan execution IDs #2778
Comments
As an immediate fix, we will widen the hash to 64 bits (or 128 bits?) to lengthen the execution ID. This should significantly reduce the probability of a collision. However, because a collision remains possible, however unlikely, we should also add logic that checks for an existing execution before starting the launchplan.
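To put rough numbers on how much a wider hash helps, here is a minimal sketch using the standard birthday-bound approximation. It assumes the hash output is uniformly distributed; `collision_probability` and the sample count of 100,000 executions are illustrative choices, not anything taken from FlytePropeller.

```python
import math


def collision_probability(num_ids: int, hash_bits: int) -> float:
    """Approximate chance that at least two of num_ids uniformly random
    hash values collide, via the birthday bound:
    p ~= 1 - exp(-n * (n - 1) / (2 * 2**bits)).
    """
    space = 2.0 ** hash_bits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-num_ids * (num_ids - 1) / (2.0 * space))


if __name__ == "__main__":
    # Hypothetical workload: 100,000 launchplan executions.
    for bits in (32, 64, 128):
        print(f"{bits:>3}-bit hash: p = {collision_probability(100_000, bits):.3e}")
```

Under this model a collision is more likely than not at 32 bits for that workload, and becomes negligible at 64 bits. The probability never reaches exactly zero, which is why the existence check is still worth having.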
I've considered a few different options after a brief chat with Dan. There are three problems to solve here:
Detecting collisions [Punt]
Admin today detects collisions and returns "Already exists" iff the exec name matches an existing one. We can modify that behavior to check for collisions in
Why should we do this?
Reduce collisions
Backward compatibility
@EngHabu my only comment is about maintaining versions in several different places moving forward. For example, if we choose to add a CRD version we will have (1) individual component versions (i.e. eventing, as you mentioned, plus the current others), (2) our newly defined CRD version, which is incremented for any change moving forward, and (3) the k8s CRD resource version (currently v1alpha1). I know we discussed moving this if we update the CRD to store strict binary for space savings (what is the relationship between a v1beta1 release and our flyte workflow version?). I don't feel strongly about any of these options, but I think it is important to give some thought to the implications of adding another version. Whichever we choose, we can probably do some refactoring to clean up versions that do not conform to the new "standard".
Describe the bug
When a launchplan is executed as part of a Flyte workflow, FlytePropeller is responsible for generating an execution ID. This algorithm must be deterministic so that subsequent rounds of workflow processing check the correct launchplan execution ID to retrieve its status. Currently it uses a 32-bit hash of the subworkflow node ID. However, this produces only an 8-character execution ID (actually 7 characters plus 1 of padding). With a large number of launchplan executions, collisions may occur.
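The "7 and 1 padded" shape is what you get when a 4-byte digest is base32-encoded. The sketch below illustrates that under stated assumptions: FNV-1a as the 32-bit hash and standard base32 as the encoding are guesses for illustration, not confirmed details of FlytePropeller, and `execution_id` is a hypothetical helper name.

```python
import base64

# Standard FNV-1a 32-bit parameters.
FNV_OFFSET_32 = 0x811C9DC5
FNV_PRIME_32 = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """Compute the 32-bit FNV-1a hash of data."""
    h = FNV_OFFSET_32
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF  # keep the state at 32 bits
    return h


def execution_id(node_id: str) -> str:
    """Derive a deterministic ID from a node ID (hypothetical helper).

    Assumption: the 4-byte digest is base32-encoded, which yields
    7 significant characters followed by one '=' padding character.
    """
    digest = fnv1a_32(node_id.encode("utf-8")).to_bytes(4, "big")
    return base64.b32encode(digest).decode("ascii").lower()


if __name__ == "__main__":
    # The same node ID always maps to the same 8-character ID.
    print(execution_id("n0-subworkflow-n1"))
```

Because the ID carries only 32 bits of information, any two node IDs whose hashes happen to match resolve to the same execution, whatever the encoding.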
Expected behavior
Launchplan execution IDs should never collide.
Additional context to reproduce
No response
Screenshots
No response