-
Notifications
You must be signed in to change notification settings - Fork 594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Design and implement Kong router flavor change plan #5018
Comments
@mflendrich Do you intend to work on this? FYI: I've submitted #5040 as initial proposal how to tackle this. |
@pmalek the assignment was meant for the issue definition phase only. Please feel free to go ahead with your attempt at implementation. |
Some work has been started in #5040 and put on hold. Until I get to it, anyone feeling like working on it feel free to pick this up. |
tl;dr
Do we expect that this is possible in practice? Are there uncontrived configurations where different router configurations are expected? Was there some report elsewhere where we observed this? In general, we should expect that Kong replicas will all use the same router configuration under normal operation. All replicas in a Deployment will (eventually) have the same environment. There are two similar exceptions to this rule:
This state deadlocks immediately because the new replicas never become ready. Roughly:
Edit the proxy-kong Deployment to use This will spawn one new replica (AFAIK disruption budget limits the new replicas until some start working) that never becomes ready. Proxy logs will show repeated
The controller will show
because the route configuration is invalid for its configuration. You can scale the proxy replicas to 0 and then back up to 3 so that all replicas use the same router flavor, but that state being consistent won't actually fix things. I didn't check the code to confirm, but that behavior suggests the controller decides the flavor it'll use at startup and doesn't change it. This just results in a set of all unready replicas even though the router flavor is consistent.
You need to further rollout restart the controller so that it decides a new router flavor after. Warning in logs is probably not that effective here since it's kind of a die roll whether the system will return to a functional state: if the controller ever starts while the proxy replicas are in a mixed state and it (randomly) chooses a replica with the old config value, it'll get stuck on that and the system will remain in deadlock. IMO we're better off handling this in upgrade docs: for most users, this transition will probably only happen once, when/if we change the chart default flavor (Kong/charts#937 or similar--IDK if there's a relevant open issue at this point since we closed #4719). I wouldn't expect users to switch between the flavors on their own, though it's not out of the question: expressions is new enough that we could encounter bugs and recommend a user switch back to traditional until a bug affecting them is fixed. I haven't encountered such issues myself yet though (we should double-check with the gateway team, they may be aware of stuff we're not), so conservatively we could approach this with upgrade docs saying "do an outage window and scale everything to 0 before modifying this, then modify and scale back up to your usual replica count). If we do expect that users will need to switch router flavors often going forward, we'd maybe need to consider making the controller able to handle either flavor by generating both Finally, the above mostly applies to DB-less. DB-backed is, AFAIK, a whole different ballgame because routes live in the DB and aren't magically converted to the other flavor if you switch and restart. Initial exploration suggests that the active router flavor can only "see" routes with its type of config, so switching effectively just makes all routes created on the previous flavor disappear. The controller's diff and update logic may work around that (it should see that the route in the DB doesn't match the expected config for the current flavor and update it), but I didn't actually try to confirm, so take that guess with a generously-sized grain of salt. |
That's the typical upgrade scenario where I'd expect users to want to be covered and have zero down time ideally. @mheap mentioned that we do not expect users to change the flavor back and forth so I'd consider this as a typical use case for flavor change, one-of basically. I'd like to propose the following course of action in that case:
Are we OK with this approach or are there any other suggestions to this? @mflendrich @mheap |
Updated the title and description and put this tentatively in 3.2 milestone given that there's some pending work on the Gateway side. |
As I got from the gateway team, they have the changes on expression router for accepting fields in the traditional router (but cannot specify both traditional fields like
Following these steps, I think we can implement the upgrade process with no changes to KIC translator. Testing and documentation is required. Maybe we need to add a router flavor upgrade test for this. |
This is a helm chart change only. Let's wait until the Gateway LTS supports "mixed mode expression routing" and we can turn it on for all supported versions at once. |
From @randmonkey:
I'm of the opinion of waiting until Mar 2025 to change the default |
Even then, what’s the downside to keeping the current default? The behaviour is the same |
Is there an existing issue for this?
Problem Statement
If KIC is started with multiple downstream gateways having incompatible router flavors, such configuration has no chance to work and will lead to hard to understand errors.
As per discussion with @mheap, there exists some work on Kong Gateway side which aims to support some (limited?) expressions support in
traditional_compatible
router flavor. @mheap suggested to punt work in this issue here after 3.1 and wait until the Gateway work has more shape.Proposed Solution
traditional_compatible
Additional information
post-facto enhancement of #702
Acceptance Criteria
The text was updated successfully, but these errors were encountered: