-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[scd] properly cleanup implicit sub on oir update #1057
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Shastick for taking care of it. This looks good to me since it only cleans up orphan subscription without changing the current response to USSP.
Let's open another issue to discuss if creating a new implicit subscription is the expected behaviour.
You will find minor comments inline.
return stacktrace.Propagate(err, "Could not determine if previous Subscription can be removed") | ||
} | ||
|
||
// We _could_ reuse the previous subscription ID if it was an implicit subscription, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a reference to the ticket to keep a link with the design discussion/decision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment does not entirely reflect my intention, I will reword it.
in short: when an update is done and the parameters for the creation of an implicit subscription are passed, my understanding is that we actually want to create a new implicit subscription.
Clients are allowed to change the subscription attached to an OIR, and from my understanding, I think it is valid to give the option to replace an existing subscription with an implicit one.
Clients that wish to reuse an existing subscription (whether or not it is an implicit one) should specify the existing subscription ID in the call parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be more precise: there was no design discussion about this – this is my interpretation of how the API should work.
If you think this needs discussing/clarifying we can either open an issue, discuss here or have a chat at the next weekly meeting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when an update is done and the parameters for the creation of an implicit subscription are passed, my understanding is that we actually want to create a new implicit subscription.
Yes, when new_subscription
is specified, a new implicit subscription should be created.
I think it is valid to give the option to replace an existing subscription with an implicit one.
I don't know of an actual use case for this, but I agree this is what should happen if someone specifies new_subscription
on an op intent with an existing subscription. And, if that existing subscription was implicit and no longer has any dependent operational intents, then I would expect the existing subscription to be automatically deleted by the DSS since it was implicit (managed by DSS).
Clients that wish to reuse an existing subscription (whether or not it is an implicit one) should specify the existing subscription ID in the call parameters.
Agreed.
If you think this needs discussing/clarifying we can either open an issue, discuss here or have a chat at the next weekly meeting.
I don't think there is any ambiguity here so I don't think there's need for further discussion. However, if someone thinks otherwise then we can certainly have that discussion.
if len(dependentOps) == 0 { | ||
return false, stacktrace.NewError("An implicit Subscription had no dependent OperationalIntents") | ||
} else if len(dependentOps) == 1 { | ||
return true, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're assuming the one dependent operational intent reference is the one the caller is interested in, but that assumption is neither documented nor enforced. If we're just checking a particular subscription, then the answer shouldn't be "yes" or "no", it should be "only when removing this particular operational intent reference" (returning an optionally-nil pointer to an operational intent ID). Or, if we're checking whether a particular subscription can be removed after removing the dependency of an operational intent reference on that subscription, then the ID of the operational intent should be passed into this function and validated that the single dependent operational intent is the operational intent passed.
The operational intent ID was not needed previously because the context in which the code used to be located had just identified this subscription as the one the operational intent was using.
@@ -16,6 +16,42 @@ import ( | |||
"github.com/interuss/stacktrace" | |||
) | |||
|
|||
// subscriptionCanBeRemoved will check if: | |||
// - a previous subscription was attached, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Attached to what? Neither this documentation nor the function signature mentions an operational intent. I think we should add an op intent ID to the function signature (see other comment) and then this will make more sense (a previous subscription was attached to the specified operational intent).
ddf25c7
to
67d531c
Compare
@@ -490,6 +511,14 @@ func (a *Server) upsertOperationalIntentReference(ctx context.Context, authorize | |||
} | |||
} | |||
|
|||
removePreviousImplicitSubscription, err = subscriptionIsOnlyAttachedToOIR(ctx, r, &id, previousSubscriptionID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why not deleting the subscription in this block rather than setting a flag? That would simplify the control flow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point in the transaction the subscription cannot be deleted, because it is referenced by the operational intent.
The previous implicit subscription can only be removed once the OIR has been updated to point to another subscription.
If you delete it before the r.UpsertOperationalIntent(ctx, op)
statement at line 719, you get an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: will need to confirm the above again, as I might have been confused by another error.
My current understanding is that foreign keys are enforced immediately in cockroach DB, and not at the time when the transaction is committed, which is why the subscription cleanup must happen after the OIR was updated. The doc seems to concur, though I have not checked if we set a non-default foreign key action on the relevant field.
The relevant extract:
If there are any existing references to the key being deleted, the transaction will fail at the end of the statement
if err != nil { | ||
return false, stacktrace.Propagate(err, "Unable to get OperationalIntent's Subscription from repo") | ||
} | ||
if sub == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We retrieve the subscription ID from the operational intent out of the store. IIUC it is then not realistic for the subscription to not exist because the store will guarantee that to not happen.
IMO this also highlights that invoking GetSubscription
here is not appropriatea: this should be done when previousSubscriptionID
is retrieved, and then pass the subscription around and not just its ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We retrieve the subscription ID from the operational intent out of the store. IIUC it is then not realistic for the subscription to not exist because the store will guarantee that to not happen.
Operational intents are not guaranteed to have a subscription attached: OIR's in the 'ACCEPTED' state may well be created without a subscription. Sorry, had to put myself back into the context of the PR.
At this point the sub
should not be nil, but I think we should still check for this nil and report a clean error if it happens versus running into a nil pointer, in case we have a bug?
This check is taken verbatim from the old deletion logic that this method replaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO this also highlights that invoking GetSubscription here is not appropriatea: this should be done when previousSubscriptionID is retrieved, and then pass the subscription around and not just its ID.
I disagree: there are circumstances where we won't need to fetch the subscription. For example, when updating an OIR, if the request specifies the same subscription ID as before, there is no need to get it from the store.
return false, stacktrace.Propagate(err, "Could not find dependent OperationalIntents") | ||
} | ||
if len(dependentOps) == 0 { | ||
return false, stacktrace.NewError("An implicit Subscription had no dependent OperationalIntents") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if that's OK: when this new logic will be introduced, there will exist in the store some dangling implicit subscriptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is correct: although, I essentially kept the logic as it existed before being factored into this method.
I'd rather fix this in a separate PR, as there might be some discussion about whether or not we should opportunistically fix such inconsistencies when we encounter them vs. noisily failing so we become aware of them.
return false, stacktrace.NewError("OperationalIntent's Subscription missing from repo") | ||
} | ||
|
||
if sub.ImplicitSubscription { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better use an early return if the subscription is explicit.
67d531c
to
cfc4895
Compare
I tried to build an overview of what should happen wrt subscription in the upsert. Is the subscription ID provided in the request parameters the same as the subscription ID attached to the existing operational intent? (in both cases the subscription ID may be none)
At the moment we have a bit of a complex control flow for handling subscriptions updates, based first on whether the subscription ID is empty, but if the above is correct, the control flow should be as simple as (and also extensible to the deletion of operational intents):
If I'm correct a small refactoring could be appropriate to factor away this logic IMO. WDYT? |
Taking to draft as I need to handle another corner case. |
ee0fdf1
to
a917152
Compare
Yes, it's actually possible to move the check out of the if block. The result does not yet feel entirely clean, but I'm not sure we can do better without doing a deeper refactoring |
dbf7c86
to
f4948b8
Compare
cbc2b0f
to
c917f3a
Compare
229c4a2
to
7bf4b55
Compare
@mickmis where do we stand on the need to refactor the Once I'm done with handling the OVN check PRs and corresponding scenarios I can take a look at the refactor, but I'm wondering if it is aligned with our current priorities? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where do we stand on the need to refactor the UpdateOperationalIntentReference method? If needed, the refactor should certainly happen in another PR: should we do it before or after merging this one? How urgent is this with respect to the current open issues that are caused by the dangling OIRs?
Once I'm done with handling the OVN check PRs and corresponding scenarios I can take a look at the refactor, but I'm wondering if it is aligned with our current priorities?
After looking at it one more time, I do believe the change in this PR does what it is supposed to do. But I think it can and should be simplified more.
If not bound by any other constraint, IMO this function should be refactored before making such a change. Its control flow is already more complex than it should, and this change make that situation even worse. This makes it hard to identify tricky corner-cases, both at implementation and review. Also, the logic in this function is both one of the most complex and most central in the DSS.
Now, I'm not sure where we stand in terms of constraints and priorities. Do you have an opinion @BenjaminPelletier ?
// A subscription has been specified: | ||
// If it is different from the previous subscription, we need to fetch it from the store | ||
// in order to ensure it correctly covers the OIR. | ||
if effectiveSubscription == nil || previousSubIsBeingReplaced { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
effectiveSubscription == nil
will always be true here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually now that I look at it again, I'm also not completely sure why this change here is needed? Was there previously an issue that is fixed by this change? Or is it just an optimization to avoid fetching twice the same subscription?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
effectiveSubscription == nil will always be true here.
Thanks for catching. A previous revision was pre-setting the value and that ended up being even more confusing than it is now.
Actually now that I look at it again, I'm also not completely sure why this change here is needed? Was there previously an issue that is fixed by this change? Or is it just an optimization to avoid fetching twice the same subscription?
This is to avoid re-fetching the same subscription already fetched at line 496 in the if old.SubscriptionID != nil {
block if that subscription is not being replaced. In cases where it is being replaced, we need to fetch the subscription that will eventually be attached to the OIR to verify that it properly covers the OIR.
It indeed ensures that we don't do unnecessary roundtrips.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching. A previous revision was pre-setting the value and that ended up being even more confusing than it is now.
(Well, as a good illustration of how convoluted the control logic got, my 'simplification' was too much and needs a rework)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to avoid re-fetching the same subscription already fetched at line 496 in the if old.SubscriptionID != nil { block if that subscription is not being replaced. In cases where it is being replaced, we need to fetch the subscription that will eventually be attached to the OIR to verify that it properly covers the OIR.
It indeed ensures that we don't do unnecessary roundtrips.
Could you add a comment mentioning that? That clarifies that this is not actually a separate case, but an optimization. Ideally that could be hidden behind a function but not sure if that's doable or makes sense here.
Do you mean that this PR can and should be simplified, or that the method should be simplified?
I personally see this as two separate issues:
As such, I would see no issue in proceeding with adding something as long as it is correct (when we apply our usual standard for correctness) even if it worsens maintainability a little bit, if focusing on maintainability first means delaying the fix by an unknown amount of time. I see three reasons for not waiting on a refactor to proceed with the fix:
I do however agree that the logic herein is pretty central to what the DSS does, and that it would deserve to be made as maintainable as possible as that will benefit us in the long term. |
7bf4b55
to
5f0dab6
Compare
6584b7a
to
1c4e1ae
Compare
1c4e1ae
to
9ca5e54
Compare
I do think fixing the correctness problem from #1056 (which I think this PR does) is important; it has impact in a number of areas (e.g., this table from #1074 which probably means #1074 wouldn't have been filed if this issue had been fixed earlier). I have not reviewed the full content of this PR in particular because Committer review time is limited and theoretically we should all have fairly similar overall thoughts (when we don't, that's a good candidate for additional documentation in the TSC repo). I (and/or @barroco) can take a more holistic look to weigh in on the inherently-gray tradeoffs between correctness, maintainability, and timeliness, though I think in an ideal world (which indeed we don't live in, unfortunately) a good result could be worked out with the reviewer(s) already most familiar with the content. I don't know if this general thought is applicable to this specific case, but often complexity can be managed by factoring more sections of a large function into their own functions with their own documentation, signatures, and encapsulated behavior (as has already been done once in this PR). This lets the core of the complex function look more like brief pseudocode, and therefore be more understandable at a glance (like @mickmis's pseudocode). We don't need this fix tomorrow, so we can still iterate on the most maintainable version, but I agree we don't want that to take forever. If anyone thinks we could do a better job at picking tradeoffs, that would be a great topic for the Tuesday meeting, and probably worth giving a heads up on Slack 1-2 days beforehand that we'll want to discuss that so anyone who wants to can pre-read the background info (e.g., this PR) to be better prepared to discuss real time. |
Given that
I'm happy to start with encapsulating things into submethods and see what kind of results I can come up with today. edit: created #1088 so we can discuss the refactor-specific things there. |
You make great points here, thanks for the response, I'd just add that for correctness we need understanding in addition to testing (which is what I struggled to reach). All in all I think that what you ended up doing @Shastick (i.e. timeboxing some enhancements) is a good compromise. (thanks for the feedback @BenjaminPelletier) |
Closing this PR to open a new clean one, since so much of original base branch has changed. See #1104 for the continuation |
This is a rework of interuss#1057 after most changes for interuss#1088 have landed. The DSS built from this PR successfully passed the new qualifier's implicit subscription handling scenario, although that scenario still needs to be extended to cover all corner cases. The PR is open for review, as it addresses interuss#1056
This is a rework of #1057 after most changes for #1088 have landed. The DSS built from this PR successfully passed the new qualifier's implicit subscription handling scenario, although that scenario still needs to be extended to cover all corner cases. The PR is open for review, as it addresses #1056
This is a first stab at #1056, and covers two corner cases that currently result in dangling implicit subscriptions:
The OIR upsert logic now check the following:
We now delete subscriptions that meet all the conditions above.
Notes
This PR chooses the approach to maintain the re-creation of an implicit subscription when, in a request to update an OIR, parameters for an implicit subscription are provided.
Resolves #1056
Test plan: with this PR, the DSS is passing the test scenario check through which I discovered the problem, but which has not yet been added to the qualifier. (It will be available once interuss/monitoring#720 is merged)