Support routing to Kubernetes clusters by request path #50567

Open · wants to merge 10 commits into master

Conversation

timothyb89 (Contributor) commented Dec 24, 2024

This implements path-based routing for Kubernetes clusters as described by RFD0185 (#47436). A new prefixed path handler is added that accepts base64-encoded Teleport and Kubernetes cluster names. The request is routed to the destination Teleport cluster using these parameters instead of those embedded in the session TLS identity, and then the preexisting handlers check authorization and complete the request as usual.

This removes the need for certificates to be issued per Kubernetes cluster: so long as the incoming identity is granted access to the cluster via its roles, access can succeed, and no KubernetesCluster attribute or cert usage restrictions are needed.

Fixes #40405
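
As a quick illustration of the routing scheme described above (a minimal sketch, not the PR's actual handler; the helper name and the exact base64 variant are assumptions), a client would encode both cluster names into the path prefix like this:

// Sketch only: builds the path prefix described above, assuming the cluster
// names are base64url-encoded without padding (the exact variant used by the
// handler may differ).
package main

import (
	"encoding/base64"
	"fmt"
)

func routePath(teleportCluster, kubeCluster string) string {
	enc := base64.RawURLEncoding
	return fmt.Sprintf("/v1/teleport/%s/%s",
		enc.EncodeToString([]byte(teleportCluster)),
		enc.EncodeToString([]byte(kubeCluster)))
}

func main() {
	// Example names only.
	fmt.Println(routePath("example.teleport.sh", "default"))
}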


Testing steps

  1. Ensure a Teleport proxy is running this branch with one or more k8s clusters attached. The Kubernetes node does not need to be running this branch, only the proxy.

  2. Build tbot against #50898 (Machine ID: Support path-based Kubernetes routing).

  3. Create a new bot with appropriate permissions to access your kube node(s), e.g. tctl bots add foo --roles=kube-access

  4. Start tbot with the new kubernetes/v2 service:

    $ tbot start kubernetes/v2 --destination=./tbot-user --storage=./tbot-data --token=$token --join-method=token --kubernetes-cluster-name default --proxy-server=example.teleport.sh:443
    

    The --kubernetes-cluster-name flag can be repeated. Alternatively (or additionally), try a label matcher:

    $ tbot start kubernetes/v2 --destination=./tbot-user --storage=./tbot-data --token=$token --join-method=token --kubernetes-cluster-labels foo=bar --proxy-server=example.teleport.sh:443
    
  5. Examine ./tbot-user/kubeconfig.yaml. You should see one cluster: and one context: entry per matched cluster. The server: field should include a new URL suffix, /v1/teleport/<teleportCluster>/<kubeCluster>

  6. Try KUBECONFIG=./tbot-user/kubeconfig.yaml kubectl get pods. Also consider testing kubectl exec, port forwarding, etc. A programmatic variant of this check is sketched below.
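
As a programmatic variant of step 6 (a hedged sketch; it assumes the tbot-generated kubeconfig from step 5 and standard client-go behavior), the generated kubeconfig can be loaded directly and used to list pods through the path-routed proxy endpoint:

// Sketch: load the tbot-generated kubeconfig and list pods through the
// path-routed proxy endpoint. Paths and namespaces are the examples used in
// the steps above.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Equivalent to KUBECONFIG=./tbot-user/kubeconfig.yaml.
	cfg, err := clientcmd.BuildConfigFromFlags("", "./tbot-user/kubeconfig.yaml")
	if err != nil {
		panic(err)
	}
	// cfg.Host should end with the /v1/teleport/<teleportCluster>/<kubeCluster>
	// suffix described in step 5.
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	pods, err := client.CoreV1().Pods(metav1.NamespaceDefault).List(context.Background(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("found %d pods in the default namespace\n", len(pods.Items))
}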

timothyb89 (Contributor Author) commented Dec 24, 2024

Outstanding TODOs:

  • Add manual testing instructions to PR description
  • Unit tests
  • More rigorously validate security impact. Particularly want to ensure per-session MFA guarantees remain intact.
  • PR for tbot changes to test this more easily. Should be a low-impact change, and my current hacked tsh build is not really an acceptable validation environment. PR is available here: #50898 (Machine ID: Support path-based Kubernetes routing)

lib/kube/proxy/forwarder.go (two outdated review threads, resolved)
@@ -520,7 +525,7 @@ type handlerWithAuthFuncStd func(ctx *authContext, w http.ResponseWriter, r *htt
const accessDeniedMsg = "[00] access denied"

// authenticate function authenticates request
func (f *Forwarder) authenticate(req *http.Request) (*authContext, error) {
func (f *Forwarder) authenticate(req *http.Request, params httprouter.Params, routeSourcer routeSourcer) (*authContext, error) {
Contributor:

I don't fully understand why you need to pass the params/route sourcer here.

Given that singleCertHandler already sets:

o.Identity.RouteToCluster = teleportCluster
o.Identity.KubernetesCluster = kubeCluster

Do we need a special case for the route sourcer? We can always retrieve it from the identity.
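
For illustration, a minimal sketch of what this suggestion amounts to (types and names here are simplified stand-ins, not the forwarder's real ones): once the path handler has written the route into the identity, downstream code can read it back from the identity alone, with no extra params/routeSourcer argument.

// Simplified sketch: with the route copied into the identity by the path
// handler, authenticate-style code needs nothing beyond the identity itself.
type identity struct {
	RouteToCluster    string // destination Teleport cluster
	KubernetesCluster string // destination Kubernetes cluster
}

func routeFromIdentity(id identity) (teleportCluster, kubeCluster string) {
	return id.RouteToCluster, id.KubernetesCluster
}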

Contributor Author:

That's a good call; it was left over from before I had it modifying (and relying on) the identity directly. I've removed the routeSource mechanism outright and reverted most of my changes to the forwarder itself. I ended up diving deeper into how the identity is passed and trusted between clusters and wrote some notes in a comment.

return "", "", trace.Wrap(err)
}

// TODO: do we care to otherwise validate these results before casting
Contributor:

Yeah, I think it would be better anyway.

Contributor Author:

I'm not totally sure what other validation steps we might want here; somehow I'd never looked into it before, but we seem to impose very few restrictions on resource names. I've added a check to make sure the parameters decode to valid UTF-8, at least. Downstream it looks to be a map lookup from cached cluster details, so I don't expect significant risk.
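
A rough sketch of that validation (illustrative only; the PR's actual decode path, helper names, and base64 variant may differ):

// Sketch: decode a path parameter and reject anything that is not valid UTF-8
// before it is used for the cluster lookup.
package main

import (
	"encoding/base64"
	"fmt"
	"unicode/utf8"
)

func decodeClusterParam(param string) (string, error) {
	raw, err := base64.RawURLEncoding.DecodeString(param)
	if err != nil {
		return "", fmt.Errorf("decoding cluster name: %w", err)
	}
	if !utf8.Valid(raw) {
		return "", fmt.Errorf("cluster name is not valid UTF-8")
	}
	return string(raw), nil
}

func main() {
	param := base64.RawURLEncoding.EncodeToString([]byte("default"))
	fmt.Println(decodeClusterParam(param))
}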

assert: func(t *testing.T, restConfig *rest.Config) {
client := pathRoutedKubeClient(t, restConfig, clusterName, "a")
_, err = client.CoreV1().Pods(metav1.NamespaceDefault).List(context.Background(), metav1.ListOptions{})
require.ErrorContains(t, err, "cannot list resource")
timothyb89 (Contributor Author) commented Jan 15, 2025:

I've noticed a mild behavior change here, I think because this just wasn't possible before. Attempting to, e.g., list resources on a cluster denied by roles now yields (kubectl get pods -v8):

I0114 20:55:58.643198   22226 request.go:1212] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"kubernetes cluster \"default\" not found","reason":"Forbidden","code":403}
I0114 20:55:58.643592   22226 helpers.go:246] server response object: [{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "kubernetes cluster \"default\" not found",
  "reason": "Forbidden",
  "code": 403
}]
Error from server (Forbidden): kubernetes cluster "default" not found

...however, attempting to list a nonexistent cluster returns:

I0114 20:51:57.008342   10127 request.go:1212] Response Body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Kubernetes cluster \"asdf\" not found","reason":"NotFound","code":404}
E0114 20:51:57.008352   10127 memcache.go:265] couldn't get current server API group list: the server could not find the requested resource
I0114 20:51:57.008357   10127 cached_discovery.go:120] skipped caching discovery info due to the server could not find the requested resource
I0114 20:51:57.008477   10127 helpers.go:246] server response object: [{
  "metadata": {},
  "status": "Failure",
  "message": "the server could not find the requested resource",
  "reason": "NotFound",
  "details": {
    "causes": [
      {
        "reason": "UnexpectedServerResponse",
        "message": "unknown"
      }
    ]
  },
  "code": 404
}]
Error from server (NotFound): the server could not find the requested resource

(and Kubernetes cluster "$name" not found is logged on the server)

Previously, you could only ever get certs for a cluster if the name was valid and the user was authorized to see it; otherwise, a consistent error was returned before the cert was ever issued. Would we consider this a resource name oracle?

It seems like the two responses probably ought to be identical, but changing them might be a mild breaking change to existing behavior (403 vs 404).

Also, don't allow path parameters to override the identity if the identity already contains nonempty routing params.
Comment on lines +88 to +99
name: "successful path routing to multiple clusters",
roleSpec: defaultRoleSpec,
assert: func(t *testing.T, restConfig *rest.Config) {
clientB := pathRoutedKubeClient(t, restConfig, clusterName, "b")
_, err = clientB.CoreV1().Pods(metav1.NamespaceDefault).List(context.Background(), metav1.ListOptions{})
require.NoError(t, err)

clientA := pathRoutedKubeClient(t, restConfig, clusterName, "a")
_, err := clientA.CoreV1().Pods(metav1.NamespaceDefault).List(context.Background(), metav1.ListOptions{})
require.NoError(t, err)

},
Contributor Author:

This test triggers ensureRouteNotOverwritten() errors in a way that implies some state (Identity.KubernetesCluster, at least) is shared between requests. I haven't been able to reproduce this behavior on an actual cluster, so either:

  1. Something about the test environment/clients shares the identity object in a way not representative of reality
  2. The identity is sometimes cached between requests (just not in a way I can trigger), in which case I think this method for passing path parameters around by overwriting identity fields may not be viable.

Will need to spend more time debugging this. 😕

tigrato (Contributor) commented Jan 16, 2025:

I found two issues with the tests:

  • GenTestKubeClientTLSCert already fills in all of the Kubernetes cluster and route-to-cluster information.
  • restConfig was edited by appending the route to the existing endpoint, so the second request (to cluster A) ended up with /v1/teleport/<teleportCluster>/<kubeClusterB>//v1/teleport/<teleportCluster>/<kubeClusterA>/ as the host. That's why it fails.

This means we call the route handler multiple times with different clusters, causing the identity to mismatch.

Check the fixes here: af3a02c
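
For reference, a hedged sketch of the kind of fix described above (this is not the code in af3a02c; pathRoutedKubeClient is the helper named in this thread, but the body, package name, and the base64 encoding of the path segments are assumptions). The key point is copying the config per cluster so route prefixes never stack:

// Sketch: build a per-cluster client from a copy of the base rest.Config so
// that appending the route suffix never mutates the caller's config.
package proxy_test

import (
	"encoding/base64"
	"fmt"
	"testing"

	"github.com/stretchr/testify/require"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func pathRoutedKubeClient(t *testing.T, base *rest.Config, teleportCluster, kubeCluster string) *kubernetes.Clientset {
	t.Helper()
	cfg := rest.CopyConfig(base) // never modify the shared config in place
	// Encoding assumed to match the base64url scheme from the PR description.
	cfg.Host = fmt.Sprintf("%s/v1/teleport/%s/%s", base.Host,
		base64.RawURLEncoding.EncodeToString([]byte(teleportCluster)),
		base64.RawURLEncoding.EncodeToString([]byte(kubeCluster)))
	client, err := kubernetes.NewForConfig(cfg)
	require.NoError(t, err)
	return client
}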

timothyb89 (Contributor Author) commented Jan 16, 2025:

Oh, that was definitely it, thanks for finding the issue! I was losing my mind a bit over that and it was just the client config getting reused 🤦

GenTestKubeClientTLSCert already accepts "" for the KubernetesCluster value, and I believe most identities have RouteToCluster configured, so technically I think that was working, if a bit ugly. In any case, I'll add some test cases with an explicitly empty RouteToCluster.

Successfully merging this pull request may close these issues.

Support connecting to multiple Kubernetes clusters using a single X509 cert