Converge log_cache_syslog_tls certificate #961

@acrmp acrmp commented Mar 5, 2022

WHAT is this change about?

Ensuring that operators who had previously used Log Cache syslog ingress continue to see logs following an upgrade to cf-deployment v18.0.0.

What customer problem is being addressed?

  • In #949 (split log-cache from doppler, use syslog ingress), Log Cache was split out from the doppler instance group into its own log-cache instance group
  • Log Cache was also configured to use syslog ingress by default, rather than the previous behaviour which was to use the Reverse Log Proxy
  • Operators who had previously used the experimental ops-file to opt into syslog ingress (operations/experimental/use-logcache-syslog-ingress.yml) would already have had the log_cache_syslog_tls credential in their CredHub
  • When these operators attempted to upgrade to v18.0.0, the certificate was not re-generated by default, leading to a mismatch between the new service name (log-cache.service.cf.internal) and the names in the existing certificate
  • Specify update_mode: converge so that the certificate is re-generated and the syslog agent will be able to send logs to the log cache syslog server

Fixes:

failed to write to log-cache.service.cf.internal:6067, retrying in 8.192s, err: x509: certificate is valid for q-s3.doppler.default.cf.bosh, doppler.service.cf.internal, not log-cache.service.cf.internal
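
To spot this failure in syslog agent logs, a small filter along these lines may help. It is only a sketch: `cert_mismatch` is a hypothetical helper name, and it assumes the log lines carry the x509 error text in the same form as the message above.

```shell
# cert_mismatch (hypothetical helper): read log lines on stdin and, for each
# x509 name-mismatch error, print the name the client wanted and the names
# the certificate actually covers. Lines without the error are dropped.
cert_mismatch() {
  sed -n 's/.*x509: certificate is valid for \(.*\), not \(.*\)$/wanted=\2 have=\1/p'
}

# Example: feed it the error line quoted above.
printf '%s\n' 'failed to write to log-cache.service.cf.internal:6067, retrying in 8.192s, err: x509: certificate is valid for q-s3.doppler.default.cf.bosh, doppler.service.cf.internal, not log-cache.service.cf.internal' \
  | cert_mismatch
```

If the `wanted=` name is log-cache.service.cf.internal and the `have=` list only contains doppler names, the deployment is most likely still using the certificate generated before the upgrade.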

Please provide any contextual information.

Has a cf-deployment including this change passed cf-acceptance-tests?

  • YES
  • NO

Does this PR introduce a breaking change?

  • YES - please choose the category from below. Feel free to provide additional details.
  • NO

How should this change be described in cf-deployment release notes?

  • Ensure log_cache_syslog_tls certificate is re-generated to avoid upgrade issues for operators who had previously enabled Log Cache syslog ingress

We should also update the release notes for v18.0.0 to call out this known issue when upgrading from a deployment that had previously enabled Log Cache syslog ingress.

Does this PR introduce a new BOSH release into the base cf-deployment.yml manifest or any ops-files?

  • YES - please specify
  • NO

Does this PR make a change to an experimental or GA'd feature/component?

  • experimental feature/component
  • GA'd feature/component

Please provide Acceptance Criteria for this change?

  • Deploy cf-deployment at v17.1.0 with operations/experimental/use-logcache-syslog-ingress.yml
  • Push an application that generates logs
  • Upgrade to a version containing this change
  • Verify that logs have continued to appear

You can also inspect the credential in CredHub before and after the change. The SANs should include log-cache.service.cf.internal.

$ credhub get -n /bosh-someenv/cf/log_cache_syslog_tls --output-json | jq -r '.value.certificate' | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'
            X509v3 Subject Alternative Name:
                DNS:q-s3.log-cache.default.cf.bosh, DNS:log-cache.service.cf.internal
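
For a scripted before/after comparison, the check above could be wrapped in a function that succeeds or fails instead of printing for manual inspection. This is a minimal sketch: `has_san` is a hypothetical helper name, and it assumes the `openssl x509 -noout -text` output lists the SANs on the line directly after the "Subject Alternative Name" header, as in the output shown above.

```shell
# has_san NAME (hypothetical helper): read `openssl x509 -noout -text` output
# on stdin and succeed only if NAME appears as a DNS entry in the SAN list.
has_san() {
  grep -A1 'Subject Alternative Name' \
    | tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -Fqx "DNS:$1"
}

# Usage, reusing the credential path from the command above:
#   credhub get -n /bosh-someenv/cf/log_cache_syslog_tls --output-json \
#     | jq -r '.value.certificate' \
#     | openssl x509 -noout -text \
#     | has_san log-cache.service.cf.internal && echo "SAN present"
```

The exact-line match (`grep -Fx`) is deliberate, so that a name such as doppler.service.cf.internal cannot pass as a substring of some longer entry.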

What is the level of urgency for publishing this change?

  • Urgent - unblocks current or future work
  • Slightly Less than Urgent

Marking this as Urgent because upgrading to v18.0.0 breaks existing Log Cache syslog server users without this change.

Tag your pair, your PM, and/or team!

@Benjamintf1 @ctlong @mkocher @rroberts2222

@ctlong ctlong merged commit 83b6090 into cloudfoundry:develop Mar 5, 2022
@acrmp acrmp deleted the pr/regenerate-log-cache-syslog-tls-certificate branch March 5, 2022 01:07