Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "Use config-reload container as an init container" #930

Merged
merged 4 commits into from
Oct 11, 2023

Conversation

timja
Copy link
Member

@timja timja commented Oct 11, 2023

Reverts #926

charts/jenkins/Chart.yaml Outdated Show resolved Hide resolved
@dduportal
Copy link
Contributor

dduportal commented Oct 11, 2023

For info: jenkins-infra/kubernetes-management#4528 (comment)

(edit)

  • The init container was stuck trying to emit requests to the JCasc endpoint, but the jenkins container is not up yet (as we are still at the init phase). I'm not sure how this works? Did I miss something obvious?
{"time": "2023-10-11T14:24:01.331635+00:00", "msg": "Retrying (Retry(total=4, connect=9, read=5, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f7e6393d910>: Failed to establish a new connection: [Errno 111] Connection refused')': /reload-configuration-as-code/?casc-reload-token=<redacted>", "level": "WARNING"}

The pod events were helpful to detect we were stuck in the config-reload-init init container:

Events:
  Type     Reason                  Age    From                     Message
  ----     ------                  ----   ----                     -------
  Normal   Scheduled               6m5s   default-scheduler        Successfully assigned jenkins-infra/jenkins-infra-0 to <redacted>
  Normal   SuccessfulAttachVolume  5m53s  attachdetach-controller  AttachVolume.Attach succeeded for volume "<redacted>"
  Warning  FailedMount             4m3s   kubelet                  Unable to attach or mount volumes: unmounted volumes=[jenkins-home], unattached volumes=[jenkins-secrets jenkins-cache tmp-volume sc-config-volume jenkins-home kube-api-access-n7w6q jenkins-config]: timed out waiting for the condition
  Normal   Pulled                  2m23s  kubelet                  Container image "kiwigrid/k8s-sidecar:1.24.4" already present on machine
  Normal   Created                 2m23s  kubelet                  Created container config-reload-init
  Normal   Started                 2m23s  kubelet                  Started container config-reload-init

Warning: spec.template.spec.initContainers[0].env[8].name: duplicate name "METHOD"

@timja timja marked this pull request as ready for review October 11, 2023 14:42
@timja timja requested a review from a team as a code owner October 11, 2023 14:42
@timja timja merged commit 1662ed1 into main Oct 11, 2023
9 checks passed
@timja timja deleted the revert-926-config-reload-init branch October 11, 2023 14:42
@dominykas
Copy link
Contributor

The init container was stuck trying to emit requests to the JCasc endpoint

Can you confirm some details here? It's not entirely clear to me from the logs here. Do you have a custom config for number of retries, maybe?

I will fix the calls to the endpoint, of course, because that's a bug for sure, but I'm trying to understand what was actually happening here. We're using defaults for retries, etc, so it just exited after the default 5 retries, but it was successful and booted up OK.

The METHOD env var duplication is also something I can fix, but it's also not something that should prevent the thing from starting?

Thanks!

dominykas added a commit to dominykas/helm-jenkins that referenced this pull request Oct 12, 2023
dominykas added a commit to dominykas/helm-jenkins that referenced this pull request Oct 12, 2023
Jenkins has not started at this point so there is no URL to call - we only want to `LIST` things.

See jenkinsci#930 (comment)
@dduportal
Copy link
Contributor

The init container was stuck trying to emit requests to the JCasc endpoint

Can you confirm some details here? It's not entirely clear to me from the logs here. Do you have a custom config for number of retries, maybe?

Hi @dominykas , you can find the values we are using in this helmfile value file: https://github.com/jenkins-infra/kubernetes-management/blob/main/config/jenkins_infra.ci.jenkins.io.yaml . We also have secrets values merged with this but it is only the controller.additionalSecrets key.

There was no retry setup as far as I can tell (but I might forget something).

What happened is that we deployed the 4.7.0 version of the jenkins chart in https://github.com/jenkins-infra/kubernetes-management/pull/4516/files and the controller never went back once applied.

As explained in my comment above, the pod was stuck in the init phase with the container "config-init-reload" running. Checking its logs, we saw a bunch of what I put above: errors when trying to connect to the JCasc endpoint at localhost:8080.

After 35m, we gave up and had to rollback

=> does it clarify the issue we had?

I will fix the calls to the endpoint, of course, because that's a bug for sure, but I'm trying to understand what was actually happening here. We're using defaults for retries, etc, so it just exited after the default 5 retries, but it was successful and booted up OK.

Something which is not clear to me is : why does the config-init-reload init container tries to connect to the JCasc endpoint? It makes sense to have the init container preparing the whole configuration set by placing files on the proper location, etc.
But since it is an init container, the JCasc endpoint provided in the jenkins container cannot exist during the whole init container phase.
What did I miss?

@dominykas
Copy link
Contributor

dominykas commented Oct 12, 2023

I think I understand this now. This must have had to do with your METHOD=SLEEP, as you correctly pointed out - there's a warning about the duplicate value. I guess the last value takes precedence, so instead of exiting, the init container went to sleep.

I'm about to submit the fix, although not sure if there's an easy way we can test it before merging?

Something which is not clear to me is : why does the config-init-reload init container tries to connect to the JCasc endpoint?

My oversight. In my local testing I had that disabled, but when I submitted the final PR, I missed that. #931 will fix that.

@dduportal
Copy link
Contributor

I think I understand this now. This must have had to do with your METHOD=SLEEP, as you correctly pointed out - there's a warning about the duplicate value. I guess the last value takes precedence, so instead of exiting, the init container went to sleep.

I'm about to submit the fix, although not sure if there's an easy way we can test it before merging?

Something which is not clear to me is : why does the config-init-reload init container tries to connect to the JCasc endpoint?

My oversight. In my local testing I had that disabled, but when I submitted the final PR, I missed that. #931 will fix that.

oh good, it make sense and maps to what we saw!

Not sure how to easily reproduce it on a local k3s but we don't mind testing the chart with your change before the release though on our production if it helps ? (It is a private controller of the official Jenkins infrastructure). We can do it on a call

@dominykas
Copy link
Contributor

dominykas commented Oct 12, 2023

It would be great, if you have the option, to try out #931. I still want to make some improvements to the code/output, but it won't change the semantics.

Edit: this is now done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants