Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate trusted_cluster name #49169

Closed
wants to merge 2 commits into from
Closed

Conversation

bernardjkim
Copy link
Contributor

Closes #48309

Teleport now prevents the creation of a trusted_cluster resource if the name does not match the cluster name. Users will be shown the following error message if they attempt to create a teleport_cluster with an invalid cluster name:

trusted cluster resource name must be the same as the remote cluster name. got: "root", expected: "root.example.com"

changelog: The trusted_cluster resource name must now match the cluster name on creation

Copy link

This pull request is automatically being deployed by Amplify Hosting (learn more).

Access this pull request here: https://pr-49169.d3pp5qlev8mo18.amplifyapp.com

@hugoShaka
Copy link
Contributor

hugoShaka commented Nov 19, 2024

This is a strongly breaking change as we had no such limitation in the past. Most of our users TC resources are created with names not matching the trusted cluster name. I don't think we can pull this one without bumping the trusted cluster resource version. This would cause a lot of toil on both users and customer support.

Have you considered other alternatives that will not break everyone's IaC? Else we can just create a new TC version with the sanity check you added.

Comment on lines -325 to -327
// TODO(klizhentas) in 2.5.0 prohibit adding trusted cluster resource name
// different from cluster name (we had no way of checking this before x509,
// because SSH CA was a public key, not a cert with metadata)
Copy link
Contributor

@hugoShaka hugoShaka Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, congrats for addressing a TODO from Sasha in v2 😅

@hugoShaka
Copy link
Contributor

paging @espadolini for the trusted cluster expertise
historically accurate picture of Edoardo when I'm asking for help on Slack

@bernardjkim
Copy link
Contributor Author

Validating the name seemed like the simplest approach to addressing the issue, but breaking existing automation is definitely the big concern. I'll explore some alternate solutions.

@hugoShaka
Copy link
Contributor

but breaking existing automation is definitely the big concern. I'll explore some alternate solutions.

If we introduce a new resource version and run the validation only if tc.version == types.V3 it should not break everyone's automation and allow us to build sane integrations.

I explored storing both the resource name and the remote cluster name in different fields, but this might also pose challenges:

  • inconsistent behaviours when trying to apply resources we already renamed before doing this change
  • might face indexing issues as we try to lookup resources with their name or trusted_cluster_name depending on the context

@espadolini
Copy link
Contributor

Is there any automation that doesn't break horribly when a created resource gets renamed under its nose?

@hugoShaka
Copy link
Contributor

hugoShaka commented Nov 19, 2024

Is there any automation that doesn't break horribly when a created resource gets renamed under its nose?

Sadly, terraform can import back state. Also if you just tctl create -f your whole directory, you might not notice the renaming madness. But yes, any automation not doing "fire and forget" would explode when losing its resource unique identifier. That's why we must fix this before adding operator support.

@espadolini
Copy link
Contributor

espadolini commented Nov 19, 2024

I wouldn't be against just breaking compatibility for this, even without a (meaningless) resource version bump - but we'd only get to do that on a major version change, so it'd have to be v18-only - are we ok with that?

@Tener
Copy link
Contributor

Tener commented Nov 20, 2024

What is the rationale behind the renaming anyway? Maybe we can/should drop that behaviour instead?

@hugoShaka
Copy link
Contributor

hugoShaka commented Nov 20, 2024

even without a (meaningless) resource version bump

I disagree on the version bump being meaningless: resource versions are not only here to tell us how the manifest should be parsed. We also use them to protect users against breaking changes and ensure deterministic behaviour when applying manifests. We've been doing this semi-successfully for roles to avoid breaking existing manifests.

Adding a new resource version costs us close to nothing and makes the breaking change visible to the user. This allows us to backport the new behaviour to any supported Teleport version, and we can still deprecate and remove the old renaming version in v18 or v19 if we want.

What is the rationale behind the renaming anyway? Maybe we can/should drop that behaviour instead?

Naming TC resources after the TC cluster name allows us to lookup the TC resource by:

  • looking up at who signed the certificate
  • doing a single backend lookup to get the TC resource
  • checking if the cert was signed by the trusted cluster's CA.

Not naming the TC resource like the cluster name would require us to iterate over every trusted cluster each time to know if a cert is trusted (or build an in-memory map) and keep track of the cluster name anyway to know which CA might haver signed this.

Also, if we were to remove the renaming logic, users who wrote TC manifests whose name don't match the cluster name will create new TCs when applying again their manifest, instead of editing the current one.

@bernardjkim
Copy link
Contributor Author

Thanks for the additional insight! It sounds like introducing a v3 resource version with name validation is the more reasonable approach. I'll update the PR with the additional changes.

@bernardjkim bernardjkim marked this pull request as draft November 21, 2024 21:47
@bernardjkim
Copy link
Contributor Author

Superseded by #49789

@bernardjkim bernardjkim closed this Dec 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

trusted_cluster resource renames itself, breaking subsequent updates
4 participants