
Unexpectedly high resource usage #22

Closed
hsharrison opened this issue May 27, 2022 · 13 comments
Labels: bug (Something isn't working), stale

Comments

@hsharrison

What happened?

I noticed the status of the provider became unhealthy.
Autoscaling kicked in and added a node, which solved the issue temporarily.
Checking the GKE console, I saw that CPU and memory usage were increasing.

[screenshot: GKE console graph of CPU and memory usage increasing]

Nothing in the logs.

How can we reproduce it?

Could be related to using ArgoCD v2.4.0-rc2+cd5; I wanted to try the argocd-k8s-auth feature with GCP (which works fine, and is a nice solution to #13).

Provider

apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-argocd
spec:
  package: crossplane/provider-argocd:v0.1.0

ProviderConfig

apiVersion: argocd.crossplane.io/v1alpha1
kind: ProviderConfig
metadata:
  name: argocd-provider
spec:
  serverAddr: REDACTED
  insecure: false
  plainText: false
  credentials:
    source: Secret
    secretRef:
      namespace: argocd
      name: argocd-credentials
      key: authToken

Cluster (part of a Composition)

  - name: argocd-cluster
    base:
      apiVersion: cluster.argocd.crossplane.io/v1alpha1
      kind: Cluster
      spec:
        providerConfigRef:
          name: argocd-provider
        forProvider:
          config:
            tlsClientConfig:
              insecure: false
              caDataSecretRef:
                key: clusterCA
            execProviderConfig:
              apiVersion: client.authentication.k8s.io/v1beta1
              command: argocd-k8s-auth
              args:
                - gcp

    patches:
    - fromFieldPath: spec.id
      toFieldPath: metadata.name
    - fromFieldPath: spec.id
      toFieldPath: spec.forProvider.name
    - fromFieldPath: spec.deletionPolicy
      toFieldPath: spec.deletionPolicy
    - fromFieldPath: status.endpoint
      toFieldPath: spec.forProvider.server
      policy:
        fromFieldPath: Required
      transforms:
      - type: string
        string:
          fmt: "https://%s"
    - fromFieldPath: metadata.uid
      toFieldPath: spec.forProvider.config.tlsClientConfig.caDataSecretRef.name
      transforms:
      - type: string
        string:
          fmt: "%s-gkecluster"
    - fromFieldPath: spec.claimRef.namespace
      toFieldPath: spec.forProvider.config.tlsClientConfig.caDataSecretRef.namespace

    readinessChecks:
      - type: None

There are only two ProviderConfigUsages, for two clusters.
One is for the in-cluster cluster, so it is not actually used.
The other is working fine.

What environment did it happen in?

  • Crossplane version: 1.7.0
  • Crossplane Provider argocd version: 0.1.0
  • Kubernetes version (use kubectl version)
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.3", GitCommit:"816c97ab8cff8a1c72eccca1026f7820e93e0d25", GitTreeState:"clean", BuildDate:"2022-01-25T21:25:17Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5-gke.2400", GitCommit:"edad26ea7e78d44536b547193f30209b03e954c9", GitTreeState:"clean", BuildDate:"2022-04-15T09:31:56Z", GoVersion:"go1.17.8b7", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes distribution (e.g. Tectonic, GKE, OpenShift): GKE
@hsharrison added the bug label May 27, 2022
@hsharrison
Author

Already rising on the new pod
[screenshot: pod memory usage graph]

@MisterMX
Collaborator

Is the problem still occurring, and if so, can you describe exactly which resource gets unhealthy: the Provider or the Pod? If there are any events, can you share them as well? Note that the provider logs are empty unless you run it with --debug.
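
For reference, debug logging can be turned on by attaching a ControllerConfig that passes --debug to the provider pod. A minimal sketch, assuming the name argocd-debug-config (any name works, as long as the Provider references it via spec.controllerConfigRef):

apiVersion: pkg.crossplane.io/v1alpha1
kind: ControllerConfig
metadata:
  name: argocd-debug-config  # example name
spec:
  args:
    - --debug  # enables verbose provider logs
---
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-argocd
spec:
  package: crossplane/provider-argocd:v0.1.0
  controllerConfigRef:
    name: argocd-debug-config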

@hsharrison
Author

Still happening. It's the Pod, although I think before I configured it to restart at its memory limit, the pkg also became unhealthy.

[screenshot: provider pod memory usage]
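
The memory-limit restart mentioned above can be set up the same way, via a ControllerConfig that puts resource limits on the provider pod (referenced from the Provider through spec.controllerConfigRef, as in the earlier sketch). The name and values below are only examples:

apiVersion: pkg.crossplane.io/v1alpha1
kind: ControllerConfig
metadata:
  name: argocd-resource-limits  # example name
spec:
  resources:
    requests:
      cpu: 100m      # example values, tune for your cluster
      memory: 128Mi
    limits:
      memory: 512Mi  # pod is OOM-killed and restarted once it exceeds this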

Here are the events; I don't see anything out of the ordinary.

$ kubectl get event | grep argocd                                                                                                                       (playground)
32m         Normal   ApplyRoles                namespace/argocd                                                        Applied RBAC Roles
108s        Normal   BindClusterRole           providerrevision/provider-argocd-23faa80795dc                           Bound system ClusterRole to provider ServiceAccount(s)
25m         Normal   ApplyClusterRoles         providerrevision/provider-argocd-23faa80795dc                           Applied RBAC ClusterRoles
12m         Normal   SyncPackage               providerrevision/provider-argocd-23faa80795dc                           Successfully configured package revision
20m         Normal   InstallPackageRevision    provider/provider-argocd                                                Successfully installed package revision

Maybe one atypical thing we're doing is creating cluster.cluster.argocd.crossplane.io resources as part of a Composition. Could there be a memory leak triggered by selecting the composition?
As with any composition, that happens often.

Events:
  Type    Reason             Age                      From                                                             Message
  ----    ------             ----                     ----                                                             -------
  Normal  SelectComposition  4m49s (x5869 over 4d1h)  defined/compositeresourcedefinition.apiextensions.crossplane.io  Successfully selected composition

The cluster.cluster.argocd.crossplane.io resources also update their external counterparts often, but I assume that is typical.

Events:
  Type    Reason                   Age                  From             Message
  ----    ------                   ----                 ----             -------
  Normal  UpdatedExternalResource  70s (x247 over 76m)  managed/cluster  Successfully requested update of external resource

@hsharrison
Author

If it helps, RAM usage increases at a consistent one-minute interval.

@MisterMX
Collaborator

That sounds like a memory leak, maybe because the client connection isn't closed after reconcile.

@github-actions

Crossplane does not currently have enough maintainers to address every issue and pull request. This issue has been automatically marked as stale because it has had no activity in the last 90 days. It will be closed in 7 days if no further activity occurs. Leaving a comment starting with /fresh will mark this issue as not stale.

@github-actions github-actions bot added the stale label Dec 19, 2022
@MisterMX
Collaborator

/fresh

@github-actions github-actions bot removed the stale label Dec 19, 2022
@github-actions

Crossplane does not currently have enough maintainers to address every issue and pull request. This issue has been automatically marked as stale because it has had no activity in the last 90 days. It will be closed in 7 days if no further activity occurs. Leaving a comment starting with /fresh will mark this issue as not stale.

@hsharrison
Author

hsharrison commented Mar 28, 2023

/fresh too late?

@github-actions github-actions bot removed the stale label Mar 28, 2023
@MisterMX MisterMX reopened this Mar 29, 2023
@github-actions

Crossplane does not currently have enough maintainers to address every issue and pull request. This issue has been automatically marked as stale because it has had no activity in the last 90 days. It will be closed in 7 days if no further activity occurs. Leaving a comment starting with /fresh will mark this issue as not stale.

@github-actions github-actions bot added the stale label Jun 28, 2023
@hsharrison
Author

/fresh

@github-actions github-actions bot removed the stale label Jun 28, 2023
@MisterMX
Collaborator

> Still happening. It's the Pod, although I think before I configured it to restart at its memory limit, the pkg also became unhealthy.
>
> Maybe one atypical thing we're doing is creating cluster.cluster.argocd.crossplane.io resources as part of a Composition. Could there be a memory leak triggered by selecting the composition? As with any composition, that happens often.
>
> Events:
>   Type    Reason             Age                      From                                                             Message
>   ----    ------             ----                     ----                                                             -------
>   Normal  SelectComposition  4m49s (x5869 over 4d1h)  defined/compositeresourcedefinition.apiextensions.crossplane.io  Successfully selected composition

Composition selection happens in Crossplane itself, not in the provider, so that cannot be the cause of a memory leak in the provider.

> The cluster.cluster.argocd.crossplane.io resources also update their external counterparts often, but I assume that is typical.
>
> Events:
>   Type    Reason                   Age                  From             Message
>   ----    ------                   ----                 ----             -------
>   Normal  UpdatedExternalResource  70s (x247 over 76m)  managed/cluster  Successfully requested update of external resource

That's actually not typical. It seems like the controller is constantly detecting a diff between the spec and the external resource. This looks like a bug.

@github-actions

Crossplane does not currently have enough maintainers to address every issue and pull request. This issue has been automatically marked as stale because it has had no activity in the last 90 days. It will be closed in 7 days if no further activity occurs. Leaving a comment starting with /fresh will mark this issue as not stale.
