Enabling Internal Encryption breaks DomainMappings when using Contour #13659

Closed
KauzClay opened this issue Jan 30, 2023 · 2 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@KauzClay
Contributor

KauzClay commented Jan 30, 2023

What version of Knative?

Working off main branch

Using Contour as ingress.

Discovered while trying to add internal encryption e2e tests for net-contour here: #13536

Expected Behavior

When I create a DomainMapping for my Knative Service when Internal Encryption is enabled, I am able to reach the KService successfully.

Actual Behavior

DomainMappings fail to become ready and get stuck in "EndpointsNotReady".

net-contour controller says:

{"severity":"ERROR","timestamp":"2023-01-26T21:30:40.304820765Z","logger":"net-contour-controller","caller":"status/status.go:404","message":"Probing of http://hello.gen-14.hello.clay.tanzu.biz.default.net-contour.invalid failed, IP: 10.24.2.35:8080, ready: false, error: unexpected status code: want 200, got 503 (depth: 0)","commit":"e458d29-dirty","knative.dev/controller":"knative.dev.net-contour.pkg.reconciler.contour.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Ingress","knative.dev/traceid":"724fb06d-90db-49cf-917e-a664dd798cb8","knative.dev/key":"default/hello.clay.tanzu.biz--ep","stacktrace":"knative.dev/networking/pkg/status.(*Prober).processWorkItem\n\tknative.dev/[email protected]/pkg/status/status.go:404\nknative.dev/networking/pkg/status.(*Prober).Start.func1\n\tknative.dev/[email protected]/pkg/status/status.go:289"}

I also see this in envoy logs:

[2023-01-26 21:34:40.776][19][debug][router] [source/common/router/router.cc:1212] [C49017][S13478037937466377759] upstream reset: reset reason: connection failure, transport failure reason: TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
[2023-01-26 21:34:40.776][19][debug][http] [source/common/http/filter_manager.cc:905] [C49017][S13478037937466377759] Sending local reply with details upstream_reset_before_response_started{connection_failure,TLS_error:_268435703:SSL_routines:OPENSSL_internal:WRONG_VERSION_NUMBER}
[2023-01-26 21:34:40.776][19][debug][http] [source/common/http/conn_manager_impl.cc:1551] [C49017][S13478037937466377759] encoding headers via codec (end_stream=false):
':status', '503'
'content-type', 'text/plain'
'content-encoding', 'gzip'
'vary', 'Accept-Encoding'
'date', 'Thu, 26 Jan 2023 21:34:40 GMT'
'server', 'envoy'
[2023-01-26 21:34:40.735][21][debug][conn_handler] [source/server/active_tcp_listener.cc:147] [C49205] new connection from 10.24.2.43:43224
[2023-01-26 21:34:40.735][21][debug][http] [source/common/http/conn_manager_impl.cc:306] [C49205] new stream
[2023-01-26 21:34:40.735][21][debug][http] [source/common/http/conn_manager_impl.cc:930] [C49205][S163296292697216300] request headers complete (end_stream=true):
':authority', 'hello.gen-3.hello.claysreallyverylongtestineee50218ef4390e47e8e913ebbbebaf8.default.net-contour.invalid'
':path', '/healthz'
':method', 'GET'
'user-agent', 'Knative-Ingress-Probe'
'k-network-hash', 'override'
'k-network-probe', 'probe'
'accept-encoding', 'gzip'
...

[2023-01-26 21:34:40.735][21][debug][http] [source/common/http/conn_manager_impl.cc:913] [C49205][S163296292697216300] request end stream
[2023-01-26 21:34:40.735][21][debug][connection] [./source/common/network/connection_impl.h:92] [C49205] current connecting state: false
[2023-01-26 21:34:40.735][21][debug][router] [source/common/router/router.cc:470] [C49205][S163296292697216300] cluster 'default/hello/80/a67dfba3e6' match for URL '/healthz'
[2023-01-26 21:34:40.735][21][debug][router] [source/common/router/router.cc:678] [C49205][S163296292697216300] router decoding headers:
':authority', 'hello.default.svc.cluster.local'
':path', '/healthz'
':method', 'GET'
':scheme', 'http'
'user-agent', 'Knative-Ingress-Probe'
'k-network-probe', 'probe'
'accept-encoding', 'gzip'
'x-forwarded-for', '10.24.2.43'
'x-forwarded-proto', 'http'
'x-envoy-internal', 'true'
'x-request-id', '9d510376-f3c0-4a55-962d-a1f5a9f0ebe4'
'k-network-hash', 'dc12e833d98a355da2775ad80b3ae02658ed076ec3da0d5670b05f377f36f39e'
'x-request-start', 't=1674768880.735'
...
[2023-01-26 21:34:40.736][21][debug][router] [source/common/router/router.cc:1212] [C49205][S163296292697216300] upstream reset: reset reason: connection failure, transport failure reason: TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
[2023-01-26 21:34:40.736][21][debug][pool] [source/common/conn_pool/conn_pool_base.cc:453] invoking idle callbacks - is_draining_for_deletion_=false
[2023-01-26 21:34:40.758][21][debug][router] [source/common/router/router.cc:1796] [C49205][S163296292697216300] performing retry
...
[2023-01-26 21:34:40.759][21][debug][router] [source/common/router/router.cc:1212] [C49205][S163296292697216300] upstream reset: reset reason: connection failure, transport failure reason: TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
...
[2023-01-26 21:34:40.796][21][debug][router] [source/common/router/router.cc:1796] [C49205][S163296292697216300] performing retry
...
[2023-01-26 21:34:40.798][21][debug][router] [source/common/router/router.cc:1212] [C49205][S163296292697216300] upstream reset: reset reason: connection failure, transport failure reason: TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
...
[2023-01-26 21:34:40.798][21][debug][http] [source/common/http/filter_manager.cc:905] [C49205][S163296292697216300] Sending local reply with details upstream_reset_before_response_started{connection_failure,TLS_error:_268435703:SSL_routines:OPENSSL_internal:WRONG_VERSION_NUMBER}
[2023-01-26 21:34:40.798][21][debug][http] [source/common/http/conn_manager_impl.cc:1551] [C49205][S163296292697216300] encoding headers via codec (end_stream=false):
':status', '503'
'content-type', 'text/plain'
'content-encoding', 'gzip'
'vary', 'Accept-Encoding'
'date', 'Thu, 26 Jan 2023 21:34:40 GMT'
'server', 'envoy'

When I try this with AutoTLS enabled, the DomainMappings become ready, but I still get this error when I hit the endpoint:

upstream connect error or disconnect/reset before headers. retried and the latest reset reason: connection failure, transport failure reason: TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER

Steps to Reproduce the Problem

  1. Enable internal encryption in config-network.
  2. Deploy a simple hello world Knative Service.
  3. Set up a ClusterDomainClaim for your new domain.
  4. Create a DomainMapping.

Analysis

I think the problem is due in part to the fact that DomainMappings point you back at Envoy.
If you look at the DAG, you can see all the routes point to a service on port 443.
However, the one for hello goes to 80:
[Image: contour-dag-encryption]

That service spec looks like this:

apiVersion: v1
kind: Service
metadata:
  ...
  name: hello
  namespace: default
spec:
  clusterIP: None
  clusterIPs:
  - None
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  - IPv6
  ipFamilyPolicy: RequireDualStack
  ports:
  - name: http2
    port: 80
    protocol: TCP
    targetPort: 80
  sessionAffinity: None
  type: ClusterIP

Internal encryption was implemented so that ports named http2 are h2c when internal encryption is disabled, h2 when enabled.
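
The naming rule above can be sketched as a small function. This is only an illustration in Python, not the actual net-contour code (which is written in Go); the name `upstream_protocol` and the fallback for ports not named http2 are assumptions made for the example.

```python
def upstream_protocol(port_name: str, internal_encryption: bool) -> str:
    """Hypothetical helper: pick the upstream protocol for a Service port."""
    if port_name == "http2":
        # Rule described in this issue: an http2 port means h2 (HTTP/2 over TLS)
        # when internal encryption is on, and h2c (cleartext HTTP/2) when it is off.
        return "h2" if internal_encryption else "h2c"
    # Assumption for this sketch only: any other port name is plain HTTP/1.1.
    return "http/1.1"


if __name__ == "__main__":
    # The hello Service exposes port 80 under the name http2.
    print(upstream_protocol("http2", internal_encryption=True))
    print(upstream_protocol("http2", internal_encryption=False))
```

Under this rule the `hello` Service's port 80 is treated as a TLS upstream as soon as internal encryption is switched on, even though nothing on that port speaks TLS.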

This means that the HTTPProxy tells Envoy to reach the hello service on port 80 using the h2 protocol.

So when Envoy tries to make the call, it uses https (for h2), but hits the http listener on Envoy.
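
The mismatch can be reproduced outside the cluster. The sketch below is a stand-in for the situation, not Envoy or Contour code: a TLS client connects to a listener that only answers in cleartext, and the handshake fails. The helper names, the loopback listener, and the canned 400 response are all assumptions made for the demo.

```python
import socket
import ssl
import threading
from typing import Optional


def _serve_plaintext_once(listener: socket.socket) -> None:
    """Accept one connection and answer in cleartext, like an HTTP-only listener."""
    try:
        conn, _ = listener.accept()
        with conn:
            conn.recv(4096)  # the client's TLS ClientHello, which is not valid HTTP
            conn.sendall(b"HTTP/1.1 400 Bad Request\r\nContent-Length: 0\r\n\r\n")
            while conn.recv(4096):  # drain until the client hangs up
                pass
    except OSError:
        pass  # the client may reset the connection; irrelevant for this demo


def tls_to_plaintext_error() -> Optional[Exception]:
    """Attempt a TLS handshake against a plaintext listener; return the error raised."""
    listener = socket.create_server(("127.0.0.1", 0))
    server = threading.Thread(target=_serve_plaintext_once, args=(listener,), daemon=True)
    server.start()

    context = ssl.create_default_context()
    context.check_hostname = False  # must be disabled before setting CERT_NONE
    context.verify_mode = ssl.CERT_NONE

    error: Optional[Exception] = None
    try:
        with socket.create_connection(listener.getsockname(), timeout=5) as raw:
            with context.wrap_socket(raw):
                pass  # only reached if the handshake succeeds, which it should not
    except OSError as exc:  # ssl.SSLError is a subclass of OSError
        error = exc
    finally:
        server.join(timeout=5)
        listener.close()
    return error


if __name__ == "__main__":
    print(repr(tls_to_plaintext_error()))
```

The client receives cleartext bytes where it expects a TLS record, so the handshake is rejected. That is the same class of failure Envoy reports when it speaks TLS to its own HTTP listener.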

With AutoTLS enabled, there is at least a listener on 443, but it doesn't have the route data to handle the request (since svc.cluster.local domains don't get TLS).

Suggestion

I think one way around this is to use the internal encryption secrets for the ClusterLocal visibility domains when internal encryption is enabled. That way you get a listener on 443 for those domains. Then you'd need to change the Service to use 443 as well.

I suppose another option is to make the calls from the envoy back to itself not use encryption. But to me, that seems like leaving a hole in the internal encryption path.

@KauzClay KauzClay added the kind/bug Categorizes issue or PR as related to a bug. label Jan 30, 2023
@KauzClay
Contributor Author

The idea of DomainMappings looping back on Envoy and having problems when encryption is involved reminds me of this issue: #13558

Perhaps the resolutions to that issue and this one are related.

@KauzClay
Contributor Author

KauzClay commented Feb 1, 2023

Closing in favor of knative-extensions/net-contour#862, since this is an issue with contour/net-contour and not serving.

@KauzClay KauzClay closed this as completed Feb 1, 2023