-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AUTH-541: OIDC structured auth config #713
base: master
Are you sure you want to change the base?
Conversation
@liouk: This pull request references AUTH-541 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
35b2d3d
to
7e8ad90
Compare
7e8ad90
to
31e7cc5
Compare
f066dae
to
c4f822c
Compare
8ca7a87
to
36db406
Compare
@liouk: This pull request references AUTH-541 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
81af1fc
to
46663cd
Compare
@liouk: This pull request references AUTH-541 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
3c2dad5
to
9e8b55d
Compare
defer resp.Body.Close() | ||
|
||
body, err := io.ReadAll(resp.Body) | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically we only need to read the body, if the resp.StatusCode
is NOT http.StatusOK
, so I would only do it in the err case.
I never used it, but we might be even fine with http.MethodHEAD? Not sure, we could just use get and read the Body on err.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically we only need to read the body, if the
resp.StatusCode
is NOThttp.StatusOK
, so I would only do it in the err case.
I wrote it like this because depending on the status code, sometimes the body can contain useful information about the respective code.
I never used it, but we might be even fine with http.MethodHEAD? Not sure, we could just use get and read the Body on err.
I chose GET because:
- this is the one defined in the discovery endpoint spec: https://openid.net/specs/openid-connect-discovery-1_0.html#ProviderConfigurationRequest
- this is the one that KAS uses to do the same validation, so we're preserving the behavior
HEAD is risky in the sense that it might be blocked on some servers as it's not defined in the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, probably. We should keep GET.
But I don't understand why we should read from resp.Body into a slice of bytes, while not consuming it in most of the cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, fixed; reading body only if we need to put it in an error message.
36820a6
to
bca8702
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: ibihim, liouk The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest |
bca8702
to
5457a34
Compare
New changes are detected. LGTM label has been removed. |
5457a34
to
a11dfa2
Compare
} | ||
|
||
if !featureGates.Enabled(features.FeatureGateExternalOIDC) { | ||
return nil, nil, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to set up a child context with cancellation so that the feature gate accessor that is launched above can terminate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we want to keep the accessor running so that it can os.Exit if the feature gates change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, I saw that nothing else was using it and did not make the connection that the comment at the top was saying that this all depends on the exit-on-feature-change behavior.
if err != nil { | ||
return fmt.Errorf("could not marshal auth config into JSON: %v", err) | ||
} | ||
authConfigJSON := strings.TrimSpace(string(encoded)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know the JSON serializer appends a trailing newline, but what problem was that causing?
|
||
var ( | ||
cfgScheme = runtime.NewScheme() | ||
codecs = serializer.NewCodecFactory(cfgScheme, serializer.EnableStrict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: The EnableStrict option isn't really doing anything for you since you are only building an encoder from this.
var ( | ||
cfgScheme = runtime.NewScheme() | ||
codecs = serializer.NewCodecFactory(cfgScheme, serializer.EnableStrict) | ||
serializerInfo, _ = runtime.SerializerInfoForMediaType(codecs.SupportedMediaTypes(), runtime.ContentTypeJSON) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Best to check the second return value and panic on false
in case this should somehow break. That would be a clearer failure than a panic later inside of runtime.Encode
.
return err | ||
} | ||
|
||
encoded, err := runtime.Encode(codecs.EncoderForVersion(serializerInfo.Serializer, apiserverv1beta1.ConfigSchemeGroupVersion), authConfig) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is fine, but I'm also unsure what benefit setting up and using a CodecFactory is providing. We're not converting or defaulting anything and we always want it to use JSON. May as well use https://pkg.go.dev/k8s.io/apimachinery/pkg/util/json#Marshal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this might be my fault. I mentioned that I am not sure if using the api machinery is better / more idiomatic than json.Marshal.
@liouk's original solution was based on json.Marshal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main benefit was to serialize the data in a more structured way, e.g. without having to define the type meta manually. But it gets more complicated than what I thought, so I will revert back to using json.Marshal.
} | ||
|
||
cm := corev1ac.ConfigMap(targetAuthConfigCMName, managedNamespace).WithData(map[string]string{authConfigDataKey: authConfigJSON}) | ||
if _, err := c.configMaps.ConfigMaps(managedNamespace).Apply(ctx, cm, metav1.ApplyOptions{FieldManager: c.name}); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is another field manager going to be writing to the same configmap? If not, it is probably reasonable to set Force: true
to allow stomping conflicts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the CAO must be the only one -- great point.
return fmt.Errorf("auth config validation failed: %v", errList) | ||
} | ||
|
||
cm := corev1ac.ConfigMap(targetAuthConfigCMName, managedNamespace).WithData(map[string]string{authConfigDataKey: authConfigJSON}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've been trying to reduce no-op Apply requests by extracting the current apply configuration from the local informer's cache with https://pkg.go.dev/k8s.io/client-go/applyconfigurations/core/v1#ExtractConfigMap and doing a apiequality.Semantic.DeepEqual
between them first. I haven't heard of any issues arising from that approach yet, does it make sense to do it here to avoid making a write on every resync?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the intention of this check: https://github.com/openshift/cluster-authentication-operator/pull/713/files#diff-3c99f304cc2949488aa2fa2b8aea2d7e8ddb0c8baa42e66f128eacd7cdbda11aR135-R137
if existingCM != nil && existingCM.Data[authConfigDataKey] == authConfigJSON {
return nil
}
Do you think the DeepEqual()
is preferable?
jwt.Issuer.CertificateAuthority = caData | ||
} | ||
|
||
switch provider.ClaimMappings.Username.PrefixPolicy { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if a new valid option is added in the future? Can there be a period of time during an upgrade from N to N+1 where there is a cluster-authentication-operator at N and a CRD at N+1? I would include a default case to avoid all doubt.
// TODO currently validations from k8s.io/apiserver/pkg/apis/apiserver/validation cannot be used here | ||
// since they aren't defined for the beta type; once the feature goes out of beta, we should replace | ||
// this func with the upstream validations (but keep CA cert validation) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're defined for the internal type, would it be easier to convert to internal and use those? This seems to be how it is done for unserved APIs like the apiserver configuration files, I don't see any external-versioned validations there.
Also, what happens if there is a bug in the validation used by cluster-authentication-operator that causes the validation to be overly strict? Even if this changes to use the validations from k8s.io/apiserver, this can be out of sync with whatever a particular kube-apiserver was compiled against. This is a form of client-side validation, which we have been trying to move away from. Revisioned kube-apiserver rollouts should mitigate the risk of writing an invalid config here, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I see the point in keeping validations at the server-side, especially with revisioned rollouts being in place.
However, the KAS pods do not really validate the CA cert, if specified. If the CA cert is not the correct one, the KAS pods will log an error but will not crash, so the rollout will be completed correctly. Therefore I'm considering keeping the CA cert validation at the CAO side, but dropping the rest. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed a fixup 20a1f72 that demonstrates what I described above. Placing a hold until this gets squashed or dropped.
/hold
a11dfa2
to
20a1f72
Compare
20a1f72
to
08155bf
Compare
From test result perspective, based on good pre-merge test results in https://issues.redhat.com/browse/OCPBUGS-44592?focusedId=26134688&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-26134688 , adding below label: |
@liouk: This pull request references AUTH-541 which is a valid jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/retest |
@liouk: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
This PR adds a controller behind the
ExternalOIDC
feature gate that tracks the auth CR, and when auth type is configured to be OIDC, it:openshift-config-managed
, where it will be picked up by the KAS-o and synced into a static file and passed on to the KAS podsKAS-o functionality PR: openshift/cluster-kube-apiserver-operator#1760
Enhancement: openshift/enhancements#1632