kafka-broker-receiver crashes on startup starting with 1.14.0 in EKS #3901

Open
wSedlacek opened this issue May 15, 2024 · 30 comments · Fixed by #3997
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@wSedlacek

Describe the bug
The kafka-broker-receiver container terminates with exit code 143 on startup.
The only logs are:

Picked up JAVA_TOOL_OPTIONS: -XX:+CrashOnOutOfMemoryError
{"@timestamp":"2024-05-15T19:22:46.914Z","@version":"1","message":"Registering tracing configurations backend=UNKNOWN sampleRate=0.0 loggingDebugEnabled=false headersFormat=W3C","logger_name":"dev.knative.eventing.kafka.broker.core.tracing.TracingConfig","thread_name":"main","level":"INFO","level_value":20000,"backend":"UNKNOWN","sampleRate":0.0,"loggingDebugEnabled":false,"headersFormat":"W3C"}
{"@timestamp":"2024-05-15T19:22:46.957Z","@version":"1","message":"Starting Receiver env=ReceiverEnv{ingressPort=8080, livenessProbePath='/healthz', readinessProbePath='/readyz', httpServerConfigFilePath='/etc/config/config-kafka-broker-httpserver.properties'} BaseEnv{producerConfigFilePath='/etc/config/config-kafka-broker-producer.properties', dataPlaneConfigFilePath='/etc/brokers-triggers/data', metricsPort=9090, metricsPath='/metrics', metricsPublishQuantiles=false}","logger_name":"dev.knative.eventing.kafka.broker.receiver.main.Main","thread_name":"main","level":"INFO","level_value":20000,"env":{"producerConfigFilePath":"/etc/config/config-kafka-broker-producer.properties","dataPlaneConfigFilePath":"/etc/brokers-triggers/data","metricsPort":9090,"metricsPath":"/metrics","metricsJvmEnabled":false,"metricsHTTPClientEnabled":false,"metricsHTTPServerEnabled":false,"configTracingPath":"/etc/tracing","configFeaturesPath":"/etc/features","waitStartupSeconds":8,"ingressPort":8080,"ingressTLSPort":8443,"livenessProbePath":"/healthz","readinessProbePath":"/readyz","httpServerConfigFilePath":"/etc/config/config-kafka-broker-httpserver.properties","publishQuantilesEnabled":false}}
{"@timestamp":"2024-05-15T19:22:46.987Z","@version":"1","message":"Metrics cert paths weren't provided, server will start without TLS","logger_name":"dev.knative.eventing.kafka.broker.core.metrics.Metrics","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2024-05-15T19:22:46.987Z","@version":"1","message":"Metrics server host wasn't provided, using default value 0.0.0.0","logger_name":"dev.knative.eventing.kafka.broker.core.metrics.Metrics","thread_name":"main","level":"INFO","level_value":20000}
May 15, 2024 7:22:47 PM org.jboss.logmanager.JBossLoggerFinder getLogger
ERROR: The LogManager accessed before the "java.util.logging.manager" system property was set to "org.jboss.logmanager.LogManager". Results may be unexpected.

Enabling verbose logging shows a few more DNS requests, but the important part of those verbose logs seems to be:

{"@timestamp":"2024-05-15T19:25:13.059Z","@version":"1","message":"Got raw OIDC discovery info: {\"issuer\":\"https://oidc.eks.us-east-1.amazonaws.com/id/provider_id\",\"jwks_uri\":\"https://node-id:443/openid/v1/jwks\",\"response_types_supported\":[\"id_token\"],\"subject_types_supported\":[\"public\"],\"id_token_signing_alg_values_supported\":[\"RS256\"]}","logger_name":"dev.knative.eventing.kafka.broker.core.oidc.TokenVerifier","thread_name":"vert.x-eventloop-thread-1","level":"DEBUG","level_value":10000}

I suspect the OIDC features are causing the issue as I don't see this behavior locally in k3d.

Expected behavior
When OIDC is disabled (the default), OIDC should not be used (I think it is what causes the crash). More importantly, the receiver should not crash on startup.

To Reproduce
On EKS, install with:

kubectl apply -f https://github.com/knative-extensions/eventing-kafka-broker/releases/download/knative-v1.14.0/eventing-kafka-controller.yaml
kubectl apply -f https://github.com/knative-extensions/eventing-kafka-broker/releases/download/knative-v1.14.0/eventing-kafka-broker.yaml

Knative release version
Knative Eventing: 1.14.0

Additional context
Downgrading to 1.13.9 does not produce the issue.
Upgrading to 1.14.1 or 1.14.2 still has the issue.

@wSedlacek wSedlacek added the kind/bug Categorizes issue or PR as related to a bug. label May 15, 2024
@Cali0707
Member

cc @creydr

@creydr
Contributor

creydr commented May 31, 2024

Hello @wSedlacek,
thanks for reporting this.
The OIDC config is loaded at startup whether or not authentication-oidc is enabled, but an invalid config should only cause a crash when the flag is enabled. I also don't see that error message here. Can you share more logs, so I can better understand what is going on?

@wSedlacek
Author

Sadly, there isn't much else to go on in the logs. With the default logging it simply tries to start, then terminates.
With verbose logging I see the OIDC logs I shared briefly before it terminates, with nothing else notable.
It runs fine locally in k3d, but in the EKS environment it just falls over on startup.

My only real lead is that it works with 1.13.9 but stops working with 1.14.0, so it has to be something that changed between those two versions.

@creydr
Contributor

creydr commented May 31, 2024

Do you have any termination logs, i.e. logs explaining why the container was terminated (or something in the pod's .status.containerStatuses)?

OIDC support was added in 1.14.

@wSedlacek
Author

Ah, yes. This might be useful.

  containerStatuses:
    - name: kafka-broker-receiver
      state:
        running:
          startedAt: '2024-05-31T14:58:24Z'
      lastState:
        terminated:
          exitCode: 143
          reason: Error
          message: >
            alue":20000,"backend":"UNKNOWN","sampleRate":0.0,"loggingDebugEnabled":false,"headersFormat":"W3C"}

            {"@timestamp":"2024-05-31T14:58:07.449Z","@version":"1","message":"Starting
            Receiver env=ReceiverEnv{ingressPort=8080,
            livenessProbePath='/healthz', readinessProbePath='/readyz',
            httpServerConfigFilePath='/etc/config/config-kafka-broker-httpserver.properties'}
            BaseEnv{producerConfigFilePath='/etc/config/config-kafka-broker-producer.properties',
            dataPlaneConfigFilePath='/etc/brokers-triggers/data',
            metricsPort=9090, metricsPath='/metrics',
            metricsPublishQuantiles=false}","logger_name":"dev.knative.eventing.kafka.broker.receiver.main.Main","thread_name":"main","level":"INFO","level_value":20000,"env":{"producerConfigFilePath":"/etc/config/config-kafka-broker-producer.properties","dataPlaneConfigFilePath":"/etc/brokers-triggers/data","metricsPort":9090,"metricsPath":"/metrics","metricsJvmEnabled":false,"metricsHTTPClientEnabled":false,"metricsHTTPServerEnabled":false,"configTracingPath":"/etc/tracing","configFeaturesPath":"/etc/features","waitStartupSeconds":8,"ingressPort":8080,"ingressTLSPort":8443,"livenessProbePath":"/healthz","readinessProbePath":"/readyz","httpServerConfigFilePath":"/etc/config/config-kafka-broker-httpserver.properties","publishQuantilesEnabled":false}}

            {"@timestamp":"2024-05-31T14:58:07.474Z","@version":"1","message":"Metrics
            cert paths weren't provided, server will start without
            TLS","logger_name":"dev.knative.eventing.kafka.broker.core.metrics.Metrics","thread_name":"main","level":"INFO","level_value":20000}

            {"@timestamp":"2024-05-31T14:58:07.474Z","@version":"1","message":"Metrics
            server host wasn't provided, using default value
            0.0.0.0","logger_name":"dev.knative.eventing.kafka.broker.core.metrics.Metrics","thread_name":"main","level":"INFO","level_value":20000}

            May 31, 2024 2:58:07 PM org.jboss.logmanager.JBossLoggerFinder
            getLogger

            ERROR: The LogManager accessed before the
            "java.util.logging.manager" system property was set to
            "org.jboss.logmanager.LogManager". Results may be unexpected.
          startedAt: '2024-05-31T14:58:06Z'
          finishedAt: '2024-05-31T14:58:24Z'
          containerID: >-
            containerd://ad23317fb9960c9b6c48e987ef16a90e6dba9b549849bbc0f79c93624938f763
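A note for anyone decoding the status above: by the usual shell and container convention, an exit code above 128 means the process was killed by signal (code - 128), so 143 is 128 + 15, i.e. SIGTERM. That most likely means the container was stopped from outside (for example by the kubelet after failed liveness probes) rather than the JVM exiting on its own error; startedAt and finishedAt above are also only 18 seconds apart. A minimal POSIX sh helper, covering only the two signals most commonly seen in Kubernetes:

```shell
#!/bin/sh
# Translate a container exit code into a human-readable cause.
# Convention: codes above 128 mean "terminated by signal (code - 128)".
decode_exit_code() {
  code=$1
  if [ "$code" -gt 128 ]; then
    sig=$((code - 128))
    case "$sig" in
      9)  name="SIGKILL" ;;   # e.g. OOM kill, or grace period expired
      15) name="SIGTERM" ;;   # e.g. kubelet restart after failed liveness probes
      *)  name="signal $sig" ;;
    esac
    echo "killed by $name"
  else
    echo "exited with status $code"
  fi
}

decode_exit_code 143   # prints: killed by SIGTERM
```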

@treyhyde

FWIW, I was forced to revert from 1.14 to 1.13.x; I couldn't spot anything interesting in the logs to guess why it wasn't starting up.

I believe my containerStatuses was similar to above.

@pierDipi
Member

I'm trying to remove that dependency in #3997, since we don't need it, but I will need some way to test it on EKS. We can also release it regardless, as it's a cleanup that shouldn't affect functionality (tests are passing).

@treyhyde

treyhyde commented Jul 29, 2024

I don't believe this is fixed. Even without JBoss, startup stops after these logs:

{"@timestamp":"2024-07-29T17:38:01.65Z","@version":"1","message":"Registering tracing configurations backend=ZIPKIN sampleRate=1.0 loggingDebugEnabled=false headersFormat=W3C","logger_name":"dev.knative.eventing.kafka.broker.core.tracing.TracingConfig","thread_name":"main","level":"INFO","level_value":20000,"backend":"ZIPKIN","sampleRate":1.0,"loggingDebugEnabled":false,"headersFormat":"W3C"}
{"@timestamp":"2024-07-29T17:38:02.021Z","@version":"1","message":"Starting Receiver env=ReceiverEnv{ingressPort=8080, livenessProbePath='/healthz', readinessProbePath='/readyz', httpServerConfigFilePath='/etc/config/config-kafka-broker-httpserver.properties'} BaseEnv{producerConfigFilePath='/etc/config/config-kafka-broker-producer.properties', dataPlaneConfigFilePath='/etc/brokers-triggers/data', metricsPort=9090, metricsPath='/metrics', metricsPublishQuantiles=false}","logger_name":"dev.knative.eventing.kafka.broker.receiver.main.Main","thread_name":"main","level":"INFO","level_value":20000,"env":{"producerConfigFilePath":"/etc/config/config-kafka-broker-producer.properties","dataPlaneConfigFilePath":"/etc/brokers-triggers/data","metricsPort":9090,"metricsPath":"/metrics","metricsJvmEnabled":false,"metricsHTTPClientEnabled":false,"metricsHTTPServerEnabled":false,"configTracingPath":"/etc/tracing","configFeaturesPath":"/etc/features","waitStartupSeconds":8,"ingressPort":8080,"ingressTLSPort":8443,"livenessProbePath":"/healthz","readinessProbePath":"/readyz","httpServerConfigFilePath":"/etc/config/config-kafka-broker-httpserver.properties","publishQuantilesEnabled":false}}
{"@timestamp":"2024-07-29T17:38:02.065Z","@version":"1","message":"Metrics cert paths weren't provided, server will start without TLS","logger_name":"dev.knative.eventing.kafka.broker.core.metrics.Metrics","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2024-07-29T17:38:02.065Z","@version":"1","message":"Metrics server host wasn't provided, using default value 0.0.0.0","logger_name":"dev.knative.eventing.kafka.broker.core.metrics.Metrics","thread_name":"main","level":"INFO","level_value":20000}

It then fails its liveness probes, as the server never appears to actually start listening.

Liveness probe failed: Get "http://10.207.25.223:8080/healthz": dial tcp 10.207.25.223:8080: connect: connection refused
Warning Unhealthy 6m22s (x8 over 6m46s) kubelet Readiness probe failed: Get "http://10.207.25.223:8080/readyz": dial tcp 10.207.25.223:8080: connect: connection refused

@Cali0707
Member

@treyhyde which patch release of 1.14 are you running? The most recent one I can see is 1.14.7, which does not have the jboss fix in it

@treyhyde

This was 1.15.0. I'm assuming the patch is in that build, since I no longer see the JBoss messages.

1.13.x is the last version that has actually successfully started.

@Cali0707
Member

Okay, reopening this for now then, thanks for trying 1.15 out @treyhyde ! We'll keep trying to debug and fix this

@Cali0707 Cali0707 reopened this Jul 29, 2024
@Cali0707
Member

@treyhyde since I don't have access to an EKS cluster, if I were to share a receiver image with extra logging would you be open to testing that out on EKS and sharing the logs with us?

@treyhyde

@Cali0707 absolutely

@Cali0707
Member

Cali0707 commented Jul 30, 2024

@treyhyde the image with extra startup logs is quay.io/cali0707/knative/knative-kafka-broker-receiver-loom:extra-startup-logs

This was built off of https://github.com/Cali0707/eventing-kafka-broker/tree/extra-receiver-startup-logs

Thanks for helping to debug this!

@treyhyde

Picked up JAVA_TOOL_OPTIONS: -XX:+CrashOnOutOfMemoryError
{"@timestamp":"2024-07-30T15:09:51.122Z","@version":"1","message":"Registering tracing configurations backend=ZIPKIN sampleRate=1.0 loggingDebugEnabled=false headersFormat=W3C","logger_name":"dev.knative.eventing.kafka.broker.core.tracing.TracingConfig","thread_name":"main","level":"INFO","level_value":20000,"backend":"ZIPKIN","sampleRate":1.0,"loggingDebugEnabled":false,"headersFormat":"W3C"}
{"@timestamp":"2024-07-30T15:09:51.469Z","@version":"1","message":"about to load properties from file","logger_name":"dev.knative.eventing.kafka.broker.core.utils.Configurations","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2024-07-30T15:09:51.47Z","@version":"1","message":"loaded properties from file","logger_name":"dev.knative.eventing.kafka.broker.core.utils.Configurations","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2024-07-30T15:09:51.472Z","@version":"1","message":"Starting Receiver env=ReceiverEnv{ingressPort=8080, livenessProbePath='/healthz', readinessProbePath='/readyz', httpServerConfigFilePath='/etc/config/config-kafka-broker-httpserver.properties'} BaseEnv{producerConfigFilePath='/etc/config/config-kafka-broker-producer.properties', dataPlaneConfigFilePath='/etc/brokers-triggers/data', metricsPort=9090, metricsPath='/metrics', metricsPublishQuantiles=false}","logger_name":"dev.knative.eventing.kafka.broker.receiver.main.Main","thread_name":"main","level":"INFO","level_value":20000,"env":{"producerConfigFilePath":"/etc/config/config-kafka-broker-producer.properties","dataPlaneConfigFilePath":"/etc/brokers-triggers/data","metricsPort":9090,"metricsPath":"/metrics","metricsJvmEnabled":false,"metricsHTTPClientEnabled":false,"metricsHTTPServerEnabled":false,"configTracingPath":"/etc/tracing","configFeaturesPath":"/etc/features","waitStartupSeconds":8,"ingressPort":8080,"ingressTLSPort":8443,"livenessProbePath":"/healthz","readinessProbePath":"/readyz","httpServerConfigFilePath":"/etc/config/config-kafka-broker-httpserver.properties","publishQuantilesEnabled":false}}
{"@timestamp":"2024-07-30T15:09:51.517Z","@version":"1","message":"Metrics cert paths weren't provided, server will start without TLS","logger_name":"dev.knative.eventing.kafka.broker.core.metrics.Metrics","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2024-07-30T15:09:51.517Z","@version":"1","message":"Metrics server host wasn't provided, using default value 0.0.0.0","logger_name":"dev.knative.eventing.kafka.broker.core.metrics.Metrics","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2024-07-30T15:09:51.812Z","@version":"1","message":"Created vertx","logger_name":"dev.knative.eventing.kafka.broker.receiver.main.Main","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2024-07-30T15:09:51.817Z","@version":"1","message":"Registered message codec","logger_name":"dev.knative.eventing.kafka.broker.receiver.main.Main","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2024-07-30T15:09:51.817Z","@version":"1","message":"about to read server properties from file /etc/config/config-kafka-broker-httpserver.properties","logger_name":"dev.knative.eventing.kafka.broker.receiver.main.Main","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2024-07-30T15:09:51.817Z","@version":"1","message":"reading properties file","logger_name":"dev.knative.eventing.kafka.broker.core.utils.Configurations","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2024-07-30T15:09:51.817Z","@version":"1","message":"about to load properties from file","logger_name":"dev.knative.eventing.kafka.broker.core.utils.Configurations","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2024-07-30T15:09:51.82Z","@version":"1","message":"loaded properties from file","logger_name":"dev.knative.eventing.kafka.broker.core.utils.Configurations","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2024-07-30T15:09:51.82Z","@version":"1","message":"converting properties to JsonObject","logger_name":"dev.knative.eventing.kafka.broker.core.utils.Configurations","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2024-07-30T15:09:51.823Z","@version":"1","message":"Read http server properties from file /etc/config/config-kafka-broker-httpserver.properties","logger_name":"dev.knative.eventing.kafka.broker.receiver.main.Main","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2024-07-30T15:09:51.823Z","@version":"1","message":"created http server options","logger_name":"dev.knative.eventing.kafka.broker.receiver.main.Main","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2024-07-30T15:09:51.823Z","@version":"1","message":"reading properties file","logger_name":"dev.knative.eventing.kafka.broker.core.utils.Configurations","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2024-07-30T15:09:51.823Z","@version":"1","message":"about to load properties from file","logger_name":"dev.knative.eventing.kafka.broker.core.utils.Configurations","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2024-07-30T15:09:51.823Z","@version":"1","message":"loaded properties from file","logger_name":"dev.knative.eventing.kafka.broker.core.utils.Configurations","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2024-07-30T15:09:51.823Z","@version":"1","message":"converting properties to JsonObject","logger_name":"dev.knative.eventing.kafka.broker.core.utils.Configurations","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2024-07-30T15:09:51.823Z","@version":"1","message":"Read https server properties from file /etc/config/config-kafka-broker-httpserver.properties","logger_name":"dev.knative.eventing.kafka.broker.receiver.main.Main","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2024-07-30T15:09:51.823Z","@version":"1","message":"created https server options","logger_name":"dev.knative.eventing.kafka.broker.receiver.main.Main","thread_name":"main","level":"INFO","level_value":20000}
{"@timestamp":"2024-07-30T15:09:51.823Z","@version":"1","message":"building OIDC config","logger_name":"dev.knative.eventing.kafka.broker.receiver.main.Main","thread_name":"main","level":"INFO","level_value":20000}

And then it stays there until the livenessProbes terminate it.

@treyhyde

Well, I take that back. I extended the liveness probes, and your version seems to behave a bit differently, unless I messed up the liveness probe hacks before...

If I wait long enough, it fails... (as expected, sort of)

{"@timestamp":"2024-07-30T15:16:37.053Z","@version":"1","message":"Could not load OIDC configuration. This will lead to problems, when the authentication-oidc flag will be enabled later","logger_name":"dev.knative.eventing.kafka.broker.receiver.main.Main","thread_name":"main","level":"WARN","level_value":30000}

and then goes on to actually start up and go ready. I didn't see this happen on 1.15.0, but again, I could have messed up the probes and not properly extended the timing enough.

@Cali0707
Member

Cali0707 commented Jul 30, 2024

Thanks @treyhyde !

I added one more log that we were missing (specifically, the exception from failing to load the OIDC configuration). Would you mind grabbing the logs one more time? The updated image should be on the same tag as before.

In the meantime, I'll open a PR so that we don't even try to load the OIDC config if it is not enabled. But, this is not a root cause fix for EKS so I would still appreciate it if you could share the new logs and we can figure out what's causing it to fail there!
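The idea of skipping OIDC discovery entirely when the feature is off can be illustrated with a small shell analogue. This is a sketch of the idea only, not the Java change in the actual PR. It assumes the features ConfigMap is mounted at /etc/features (the configFeaturesPath in the receiver's startup logs) with one file per ConfigMap key, that the flag lives under the key authentication-oidc, and that "enabled" is the value that turns it on; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical gate: only attempt OIDC discovery when the feature flag is on.
# Assumes a ConfigMap mounted as a directory (one file per key) and the
# key/value pair authentication-oidc=enabled; both are assumptions.
oidc_discovery_decision() {
  features_dir=${1:-/etc/features}
  flag_file="$features_dir/authentication-oidc"
  if [ -f "$flag_file" ] && [ "$(cat "$flag_file")" = "enabled" ]; then
    echo "load-oidc-config"
  else
    # Flag missing or not "enabled": skip the network call entirely, so an
    # unreachable jwks_uri cannot stall startup.
    echo "skip-oidc-config"
  fi
}

oidc_discovery_decision   # checks /etc/features by default
```

The point of the design is that a disabled feature costs nothing at startup: no discovery request is made, so a slow or unreachable endpoint cannot hold up the HTTP server.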

@treyhyde

{"@timestamp":"2024-07-30T17:53:34.856Z","@version":"1","message":"Could not load OIDC configuration. This will lead to problems, when the authentication-oidc flag will be enabled later","logger_name":"dev.knative.eventing.kafka.broker.receiver.main.Main","thread_name":"main","level":"WARN","level_value":30000,"stack_trace":"java.util.concurrent.ExecutionException: io.netty.channel.ConnectTimeoutException: connection timed out after 60000 ms: ip-172-16-187-207.us-west-2.compute.internal/172.16.187.207:443\n\tat java.base/java.util.concurrent.CompletableFuture.reportGet(Unknown Source)\n\tat java.base/java.util.concurrent.CompletableFuture.get(Unknown Source)\n\tat dev.knative.eventing.kafka.broker.receiver.main.Main.start(Main.java:135)\n\tat dev.knative.eventing.kafka.broker.receiverloom.Main.main(Main.java:23)\nCaused by: io.netty.channel.ConnectTimeoutException: connection timed out after 60000 ms: ip-172-16-187-207.us-west-2.compute.internal/172.16.187.207:443\n\tat io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:263)\n\tat io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)\n\tat io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:153)\n\tat io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173)\n\tat io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)\n\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)\n\tat io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)\n\tat java.base/java.lang.Thread.run(Unknown Source)\n"}
{"@timestamp":"2024-07-30T17:53:39.727Z","@version":"1","message":"built kubeclient","logger_name":"dev.knative.eventing.kafka.broker.receiver.main.Main","thread_name":"main","level":"INFO","level_value":20000}

I'm honestly not sure where that endpoint is; it's definitely not the associated OIDC endpoint. It's not any kube svc, and it's not in the pod CIDR or the VPC CIDR.

@treyhyde

BTW, thanks @Cali0707 for the attention here. I am interested in OIDC rollout but I agree, the best first fix is to only load that config conditionally. IMO, that unblocks us to upgrade off of 1.13.x. We'd be very excited to get a 1.14.x and/or 1.15.x point release that includes your new PR.

@Cali0707
Member

Cali0707 commented Jul 31, 2024

I'm honestly not sure where that endpoint is, it's not the associated OIDC endpoint for sure. It's not any kube svc. It's not in the pod CIDR or the VPC cidr.

@treyhyde can you try to curl https://kubernetes.default.svc/.well-known/openid-configuration in your cluster? You may need a Bearer Token that is valid within your cluster...

My guess is that this endpoint is the jwks_uri value in that response.

@Cali0707
Member

IMO, that unblocks us to upgrade off of 1.13.x. We'd be very excited to get a 1.14.x and/or 1.15.x point release that includes your new PR.

Just a note here, we normally only support upgrades like 1.13.x -> 1.14.y -> 1.15.z, but that is especially the case for these releases as there are various data plane migrations that need to occur correctly :)

@treyhyde

curl --cacert ca.crt --header "Authorization: Bearer ${TOKEN}" -X GET https://kubernetes.default.svc/.well-known/openid-configuration
{"issuer":"https://oidc.eks.us-west-2.amazonaws.com/id/***REDACTED***","jwks_uri":"https://ip-172-16-99-191.us-west-2.compute.internal:443/openid/v1/jwks","response_types_supported":["id_token"],"subject_types_supported":["public"],"id_token_signing_alg_values_supported":["RS256"]}

OK, there you go, it's the JWKS url
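The same check can be scripted. The discovery document is served by the API server, and kubectl can fetch it with `kubectl get --raw /.well-known/openid-configuration`, so no hand-made bearer token is needed. Below is a minimal sketch that extracts jwks_uri with sed and flags EC2 private DNS names (the *.compute.internal pattern seen in this thread). The function names, and the heuristic that such hosts may be unreachable from pods, are assumptions drawn from this thread, not anything Knative does.

```shell
#!/bin/sh
# Print the jwks_uri value from an OIDC discovery document read on stdin.
# A sed-based sketch for single-line JSON; prefer jq where it is available.
jwks_uri_from_discovery() {
  sed -n 's/.*"jwks_uri" *: *"\([^"]*\)".*/\1/p'
}

# Heuristic (assumption): EC2 private DNS names (*.compute.internal) may not
# be reachable from pods, as observed on EKS in this thread.
classify_jwks_host() {
  case "$1" in
    https://*.compute.internal*) echo "private-ec2-host" ;;
    *)                           echo "other" ;;
  esac
}

# Sample document copied from the curl output above (issuer redacted).
discovery='{"issuer":"https://oidc.eks.us-west-2.amazonaws.com/id/***REDACTED***","jwks_uri":"https://ip-172-16-99-191.us-west-2.compute.internal:443/openid/v1/jwks","response_types_supported":["id_token"]}'

uri=$(printf '%s\n' "$discovery" | jwks_uri_from_discovery)
echo "$uri"
classify_jwks_host "$uri"
```

Against a live cluster, `kubectl get --raw /.well-known/openid-configuration | jwks_uri_from_discovery` prints the URI the receiver presumably tries to reach (judging by the stack trace). Running something like `curl -sS --connect-timeout 5 -o /dev/null <uri>` from inside a pod then shows within seconds whether it is reachable, instead of waiting out a 60-second timeout.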

@treyhyde

aws/containers-roadmap#2234 seems relevant

@Cali0707
Member

@treyhyde would you mind checking, from your cluster, whether curling https://oidc.eks.eu-west-1.amazonaws.com/id/<cluster-id>/.well-known/openid-configuration returns a reachable URI in the jwks_uri field? If so, I think we can move forward with making this discovery URI configurable in Knative.

@treyhyde

treyhyde commented Jul 31, 2024

curl -v https://oidc.eks.us-west-2.amazonaws.com/id/**REDACTED***/.well-known/openid-configuration

gives me

{"issuer":"https://oidc.eks.us-west-2.amazonaws.com/id/**REDACTED***","jwks_uri":"https://oidc.eks.us-west-2.amazonaws.com/id/***REDACTED***/keys","authorization_endpoint":"urn:kubernetes:programmatic_authorization","response_types_supported":["id_token"],"subject_types_supported":["public"],"claims_supported":["sub","iss"],"id_token_signing_alg_values_supported":["RS256"]}

curl -v https://oidc.eks.us-west-2.amazonaws.com/id/***REDACTEDCLUSTERID***/keys

does indeed appear to be a jwks endpoint

{"keys":[{"kty":"RSA","kid":"**REDACTED***","use":"sig","alg":"RS256","n":"***REDACTED***" ...

@treyhyde

treyhyde commented Jul 31, 2024

BTW, if I (generously) extend the liveness probes, I can confirm that 1.14.8 also eventually becomes ready. It just needs to get past that 60-second timeout on the JWKS fetch.
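Here is some rough arithmetic for how generous the probes need to be. A liveness probe that never succeeds gets the container restarted after about initialDelaySeconds + (failureThreshold - 1) * periodSeconds. This ignores timeoutSeconds, since a refused connection fails immediately. With the Kubernetes defaults (initialDelaySeconds 0, periodSeconds 10, failureThreshold 3), that comes to about 20 s, far short of the 60 s connect timeout in the stack trace above. The second probe setting below is hypothetical, not taken from the Knative manifests. Leave some margin on top of 60 s, because the fetch starts a little after the container does. For a one-off slow start like this, a startupProbe is the Kubernetes-native alternative to loosening the liveness probe itself.

```shell
#!/bin/sh
# Approximate seconds after container start at which the kubelet restarts a
# container whose liveness probe never succeeds (timeoutSeconds ignored):
#   initialDelaySeconds + (failureThreshold - 1) * periodSeconds
liveness_budget() {
  initial_delay=$1 period=$2 failure_threshold=$3
  echo $((initial_delay + (failure_threshold - 1) * period))
}

# Succeeds if the probe budget outlasts a blocking startup step.
# Usage: survives_startup BLOCKING_SECONDS INITIAL_DELAY PERIOD FAILURE_THRESHOLD
survives_startup() {
  blocking_seconds=$1
  shift
  [ "$(liveness_budget "$@")" -gt "$blocking_seconds" ]
}

# 60 s: the JWKS connect timeout reported in the stack trace above.
jwks_connect_timeout=60

# "initialDelaySeconds periodSeconds failureThreshold": defaults, then a hypothetical setting.
for probe in "0 10 3" "30 10 6"; do
  set -- $probe   # split the triple into $1 $2 $3
  if survives_startup "$jwks_connect_timeout" "$1" "$2" "$3"; then
    verdict="outlasts the JWKS timeout"
  else
    verdict="restarted before the JWKS fetch gives up"
  fi
  echo "probe ($probe): budget $(liveness_budget "$1" "$2" "$3")s, $verdict"
done
```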

@Cali0707
Member

Cali0707 commented Jul 31, 2024

Thanks for the help debugging @treyhyde !

I've opened knative/eventing#8121 to track fixing the root cause of this, and hopefully we can merge #4021 soon which should give us 1.14.9 and 1.15.1 next Tuesday

@treyhyde

@Cali0707 glad I could help, thanks for the quick action

@Cali0707
Member

Following up here, I think all that's left is making the OIDC discovery url configurable in this repo as well. WDYT @creydr ?

@creydr
Contributor

creydr commented Oct 28, 2024

Following up here, I think all that's left is making the OIDC discovery url configurable in this repo as well. WDYT @creydr ?

Yes. So, doing knative/eventing#8121 for eventing-kafka-broker as well.
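A configurable discovery URL could look roughly like the shell sketch below. It accepts an optional issuer override (on EKS, the public issuer shown earlier in the thread) and falls back to the in-cluster API server. The URL construction follows OpenID Connect Discovery: strip any trailing "/" from the issuer, then append /.well-known/openid-configuration. The function name, the argument-based override and the CLUSTER_ID placeholder are hypothetical. This is not necessarily how knative/eventing#8121 implements it.

```shell
#!/bin/sh
# Hypothetical helper: build the OIDC discovery URL from an optional issuer
# override, falling back to the in-cluster API server.
discovery_endpoint() {
  base=${1:-https://kubernetes.default.svc}
  base=${base%/}   # OIDC Discovery: drop a terminating "/" before appending
  echo "$base/.well-known/openid-configuration"
}

discovery_endpoint
discovery_endpoint "https://oidc.eks.us-west-2.amazonaws.com/id/CLUSTER_ID/"
```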

5 participants