Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kubectl.apply fails while GenericKubernetesApi works for ClusterRole #3518

Open
agustinventura opened this issue Jun 26, 2024 · 16 comments
Open

Comments

@agustinventura
Copy link

Describe the bug
When creating a ClusterRole Kubectl.apply returns error while GenericKubernetesApi creates it.

Client Version
18.0.0 - 21.0.0-legacy

Kubernetes Version
1.27 (OpenShift 4.14.16)

Java Version
Java 21

To Reproduce
Given a ClusterRole in clusterrole.yaml file:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: test-controller-admin
rules:
- apiGroups:
  - '*'
  resources:
  - '*'
  verbs:
  - '*'
- nonResourceURLs:
  - '*'
  verbs:
  - '*'

And a kubeconfig for an OpenShift 4.14.16 in kubeconfig file, load both files and try to apply ClusterRole:

public void createObjectsInCluster() throws URISyntaxException, IOException, KubectlException {  
  String kubeconfig = new String(Files.readAllBytes(Paths.get(ClassLoader.getSystemResource("kubeconfig").toURI())));  
  String ensManifest = new String(Files.readAllBytes(Paths.get(ClassLoader.getSystemResource("clusterrole.yaml").toURI())));  
  createObjects(kubeconfig, ensManifest);  
}  
  
public void createObjects(String kubeconfig, String manifest) throws IOException, KubectlException {  
  KubeConfig kc = KubeConfig.loadKubeConfig(new StringReader(kubeconfig));  
  ApiClient apiClient = Config.fromConfig(kc);  
  apiClient.setVerifyingSsl(false);  
  
  List<Object> objects = Yaml.loadAll(manifest);  
  for (Object object : objects) {  
    final KubernetesObject kubernetesObject = (KubernetesObject) object;  
    Class<KubernetesObject> objectClass = (Class<KubernetesObject>) kubernetesObject.getClass();  
    Kubectl.apply(objectClass).apiClient(apiClient).resource(kubernetesObject).execute();  
  }  
}

This will return the following error:

io.kubernetes.client.extended.kubectl.exception.KubectlException: io.kubernetes.client.openapi.ApiException: class V1Status {
    apiVersion: v1
    code: 404
    details: class V1StatusDetails {
        causes: null
        group: authorization.openshift.io
        kind: clusterroles
        name: test-controller-admin
        retryAfterSeconds: null
        uid: null
    }
    kind: Status
    message: clusterroles.authorization.openshift.io "test-controller-admin" not found
    metadata: class V1ListMeta {
        _continue: null
        remainingItemCount: null
        resourceVersion: null
        selfLink: null
    }
    reason: NotFound
    status: Failure
}

However, if ClusterRole is created using GenericKubernetesApi it succeeds:

public void createObjectsInCluster() throws URISyntaxException, IOException, KubectlException {  
  String kubeconfig = new String(Files.readAllBytes(Paths.get(ClassLoader.getSystemResource("kubeconfig").toURI())));  
  String ensManifest = new String(Files.readAllBytes(Paths.get(ClassLoader.getSystemResource("clusterrole.yaml").toURI())));  
  createObjects(kubeconfig, ensManifest);  
}  
  
public void createObjects(String kubeconfig, String manifest) throws IOException {  
  KubeConfig kc = KubeConfig.loadKubeConfig(new StringReader(kubeconfig));  
  ApiClient apiClient = Config.fromConfig(kc);  
  apiClient.setVerifyingSsl(false);  
  
  List<Object> objects = Yaml.loadAll(manifest);  
  for (Object object : objects) {  
    final V1ClusterRole clusterRole = (V1ClusterRole) object;  
    GenericKubernetesApi<V1ClusterRole, V1ClusterRoleList> clusterRoleClient =  
        new GenericKubernetesApi<>(V1ClusterRole.class, V1ClusterRoleList.class, "rbac.authorization.k8s.io", "v1", "clusterroles",  
            apiClient);  
    KubernetesApiResponse<V1ClusterRole> createClusterRoleResponse = clusterRoleClient.create(clusterRole);  
    if (createClusterRoleResponse.getStatus() != null && !createClusterRoleResponse.isSuccess()) {  
      if (createClusterRoleResponse.getHttpStatusCode() != 409) {  
        log.error("Error creating k8s object {}: {}", clusterRole, createClusterRoleResponse.getStatus());  
      } else {  
        log.info("k8s object {} already exists", clusterRole);  
      }  
    }  
  }  
}

If applying clusterrole.yaml with kubectl cli it suceeds too:

 kubectl --kubeconfig kubeconfig apply -f clusterrole.yaml

Expected behavior
Kubectl.apply should create ClusterRole.

Server (please complete the following information):

  • OS: Linux
  • Cloud: OpenShift 4.14.16 on Azure

Additional context
This looks like some weird interaction with OpenShift, as we found it upgrading from 4.12.25 to 4.14.16 and didn't have this kind of issues with other platforms such as EKS. The error itself returns group authorization.openshift.io instead of rbac.authorization.k8s.io pointing to some specific OpenShift logic but however it is strange that kubectl cli and the GenericKubernetesApi works and only fails when using kubectl Java client.
We also tried to add the group in the error to ModelMapper:

ModelMapper.addModelMap("authorization.openshift.io", "v1", "ClusterRole", "clusterroles", false, V1ClusterRole.class);

But returns the same error.
It fails too for RoleBinding objects but not for Role or ClusterRoleBinding ones.

@brendandburns
Copy link
Contributor

brendandburns commented Jun 26, 2024

You're getting a 404 on the apply. Does this object exist currently in the cluster? Or are you trying to create it for the first time?

My first guess is that Kubectl.Apply doesn't handle object creation correctly, there's probably special purpose code in kubectl cli to handle that case.

@agustinventura
Copy link
Author

Hi Brendan, it's the first I create it.

@brendandburns
Copy link
Contributor

Also, I see clusterroles.authorization.openshift.io in the 404 error message, but it looks from your YAML like you are trying to create a rbac.authorization.k8s.io/v1 is it possible there is a typo in your YAML somewhere?

@brendandburns
Copy link
Contributor

But returns the same error.
It fails too for RoleBinding objects but not for Role or ClusterRoleBinding ones.

When you say that it works for Role, does it successfully create new Role resources that didn't previously exist?

@brendandburns
Copy link
Contributor

The Apply code is just calling ServerSide apply, so we may need to special case that code if it returns a 404.

@brendandburns
Copy link
Contributor

(sorry for lots of little questions :)

Can you try kubectl apply --server-side ... and see if it works?

@agustinventura
Copy link
Author

No problem Brendan, I've already tried lots of them.
There's no typo in the yaml, I've tried to create a bunch of objects together in one yaml, independent objects in different yamls and even creating the objects in Java instead of reading from a yaml, result has always been the 404 error with this change of rbac.authorization.k8s.io to clusterroles.authorization.openshift.io.
I can create previously non existing Role resources or ClusterRoleBinding ones. I always test with a newly provisioned cluster as it is our current use case.
I noticed last week that Apply is using --server-side, so I tested it and it works, creates the ClusterRole. I didn't mention it because I didn't consider it interesting, sorry for any inconvenience.

@brendandburns
Copy link
Contributor

Ok, thanks for the details, this is odd. But I think I may have figured it out. We only supply the type, not the list type when we get the generic API in KubectlApply and then we guess at the list type via the class loader:

https://github.com/kubernetes-client/java/blob/master/extended/src/main/java/io/kubernetes/client/extended/kubectl/Kubectl.java#L247

My guess is that the class for the openshift ClusterRole is being discovered before the class for the standard ClusterRole and that is confusing things.

If you are willing it would be super useful if you could recompile the library with some logging around that code location to verify that's what's going on, that's great.

If not I can try to add some defensive code around that location that would fix the problem.

@agustinventura
Copy link
Author

Hi Brendan, I'm happy to help anyway I can.
I've added the log but discovered that execution path is not getting into the getGenericApi method at line 246 but in the one at line 269 with following parameters:

apiTypeClass = class io.kubernetes.client.openapi.models.V1ClusterRole
apiListTypeClass = interface io.kubernetes.client.common.KubernetesListObject

The resolved groupVersionResource seems to be the problem, containing:

resource = clusterroles
group = authorization.openshift.io
version = v1

As I can see in ModelMapper's classesByGVR kvMap there's two entries for ClusterRole, one with key rbac.authorization.k8s.io and one with authorization.openshift.io. When vkMap gets build there's only left the authorization.openshift.io one. This happens too with RoleBinding.
Anecdotally, the same happens with ClusterRoleBinding and Role but we are lucky enough that rbac.authorization.k8s.io is the one that gets loaded in vkMap.

Any thoughts on how can I address this? Thanks a lot for your help.

@brendandburns
Copy link
Contributor

Ok, there must be some issue in the map key comparison that is causing a problem. I can try to reproduce this when I get some time, but if you have time to explore and send a PR that'd be great too.

@agustinventura
Copy link
Author

I'd love to send a PR, I'm adding a test to reproduce the bug and will try to fix it.

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 2, 2024
@agustinventura
Copy link
Author

Hi, sorry for the delay in getting back to this, but summer has been a bit crazy at work.
I've got the test to reproduce the bug:

@Test
  void addingTwoModelsForSameObjClass() {
    Class objClass = io.kubernetes.client.openapi.models.V1ClusterRole.class;
    Class objListClass = io.kubernetes.client.openapi.models.V1ClusterRoleList.class;
    String kubernetesGroup = "rbac.authorization.k8s.io";
    String openshiftGroup = "authorization.openshift.io";
    String version = "v1";
    String kind = "ClusterRoles";
    String resourceNamePlural = "clusterroles";

    ModelMapper.addModelMap(kubernetesGroup, version, kind, resourceNamePlural,  objClass, objListClass);
    ModelMapper.addModelMap(openshiftGroup, version, kind, resourceNamePlural, objClass, objListClass);
    Set<GroupVersionResource> resolvedGVRs = ModelMapper.getGroupVersionResourceByClass(objClass);

    assertThat(resolvedGVRs.stream().filter(resolvedGVR -> resolvedGVR.getGroup().equals(kubernetesGroup)).count()).isEqualTo(1);
    assertThat(resolvedGVRs.stream().filter(resolvedGVR -> resolvedGVR.getGroup().equals(openshiftGroup)).count()).isEqualTo(1);
  }

The fix itself is quite easy, just modifying the BiDirectionalMap in ModelMapper to take into account we can have many GroupVersionResource for a given object class instead of only one:


    private Map<K, V> kvMap = new ConcurrentHashMap<>();

    private Map<V, Set<K>> vkMap = new ConcurrentHashMap<>();

    void add(K k, V v) {
      kvMap.put(k, v);
      vkMap.computeIfAbsent(v, key -> ConcurrentHashMap.newKeySet()).add(k);
    }

    V getByK(K k) {
      return kvMap.get(k);
    }

    Set<K> getByV(V v) {
      return vkMap.getOrDefault(v, Collections.emptySet());
    }
  }

However, this is a deep change that will impact too getGenericApi in Kubectl line 264 as now we can have many GroupVersionResource for an object.
I think I can go ahead with the changes but as I'm not very familiar with the code base I can't measure the implications of this change.
What do you think?

@agustinventura
Copy link
Author

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 4, 2024
@brendandburns
Copy link
Contributor

/lifecycle-frozen

@agustinventura apologies for not following up on this sooner. I think that the change you propose is ok. We'll need to validate group/version alongside kind.

@agustinventura
Copy link
Author

Hi Brendan, sorry for the delay in getting back to this.
Yes, I tried that change, but as I suspected it propagated and I found a few problems.
I modified the BiDirectionalMap attributes from this:

private Map<K, V> kvMap = new ConcurrentHashMap<>();

private Map<V, K> vkMap = new ConcurrentHashMap<>();

To this:

private Map<K, V> kvMap = new ConcurrentHashMap<>();

private Map<V, Set<K>> vkMap = new ConcurrentHashMap<>();

This works great and now we have both maps synchronized, being able to get many values for the same key, but also many keys for the same value.
However, this affects the public API, and these methods:

public static GroupVersionKind getGroupVersionKindByClass(Class<?> clazz) {
    return classesByGVK.getByV(clazz);
  }

  public static GroupVersionResource getGroupVersionResourceByClass(Class<?> clazz) {
    return classesByGVR.getByV(clazz);
  }

Now must return a set of GroupVersionKind or GroupVersionResource:

public static Set<GroupVersionKind> getGroupVersionKindByClass(Class<?> clazz) {
    return classesByGVK.getByV(clazz);
  }

  public static Set<GroupVersionResource> getGroupVersionResourceByClass(Class<?> clazz) {
    return classesByGVR.getByV(clazz);
  }

And this, in turn affects the user facing API, for example, in Kubectl line 264 we got getGenericAPI which needs to get the GroupVersionResource to determine which API to access:

GroupVersionResource groupVersionResource =
          ModelMapper.getGroupVersionResourceByClass(apiTypeClass);
      if (groupVersionResource == null) {
        throw new KubectlException("Unexpected unknown resource type: " + apiTypeClass);
      }

This would become:

Set<GroupVersionResource> groupVersionResources =
          ModelMapper.getGroupVersionResourceByClass(apiTypeClass);
      if (groupVersionResources.isEmpty()) {
        throw new KubectlException("Unexpected unknown resource type: " + apiTypeClass);
      }

So, once I get the GroupVersionResource set, how can I determine which one is the sought after? I thought about defaulting this method to Kubernetes' namespace and creating a new one which would allow the user to pass the exact GroupVersionResource, but don't know if this makes much sense.

What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants