Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(Fix) Updating helm charts to support version 3 windows integration #754

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions charts/newrelic-infrastructure/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -161,13 +161,15 @@ integrations that you have configured.
| kubelet.config.retries | int | `3` | Number of retries after timeout expired |
| kubelet.config.scraperMaxReruns | int | `4` | Max number of scraper rerun when scraper runtime error happens |
| kubelet.config.timeout | string | `"10s"` | Timeout for the kubelet APIs contacted by the integration |
| kubelet.enableWindows | bool | `false` | Enable Windows monitoring and update `windowsOsList` with the list of Windows versions present in the cluster. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sure this readme is auto-generated by helm-docs, otherwise, it will be overwritten

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are adding the V2 image under the Kubelet component. However, the V2 is a bundle of kubelet+KSM+controlPlane component

| kubelet.enabled | bool | `true` | Enable kubelet monitoring. Advanced users only. Setting this to `false` is not supported and will break the New Relic experience. |
| kubelet.extraEnv | list | `[]` | Add user environment variables to the agent |
| kubelet.extraEnvFrom | list | `[]` | Add user environment from configMaps or secrets as variables to the agent |
| kubelet.extraVolumeMounts | list | `[]` | Defines where to mount volumes specified with `extraVolumes` |
| kubelet.extraVolumes | list | `[]` | Volumes to mount in the containers |
| kubelet.hostNetwork | bool | Not set | Sets pod's hostNetwork. When set bypasses global/common variable |
| kubelet.tolerations | list | Schedules in all tainted nodes | Tolerations for the control plane DaemonSet. |
| kubelet.windowsOsList | list | `[]` | `windowsOsList` is only required if `enableWindows` is true. It contains list of Windows versions present in the cluster. Our Kubernetes integration supported Windows versions LTSC 2019 (1809), 20H2, and LTSC 2022 |
Copy link
Member

@paologallinaharbur paologallinaharbur Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Our Kubernetes integration supported Windows versions"

I would always specify that it is alpha

| labels | object | `{}` | Additional labels for chart objects. Can be configured also with `global.labels` |
| licenseKey | string | `""` | This set this license key to use. Can be configured also with `global.licenseKey` |
| lowDataMode | bool | `false` (See [Low data mode](README.md#low-data-mode)) | Send less data by incrementing the interval from `15s` (the default when `lowDataMode` is `false` or `nil`) to `30s`. Non-nil values of `common.config.interval` will override this value. |
Expand Down
4 changes: 0 additions & 4 deletions charts/newrelic-infrastructure/templates/NOTES.txt
Original file line number Diff line number Diff line change
Expand Up @@ -110,10 +110,6 @@ future. Please migrate your agent config to the new format in the `common.agentC
{{ $errors = printf "%s\n\n%s" $errors (include "newrelic.compatibility.message.image" . ) }}
{{- end }}

{{- if .Values.enableWindows }}
{{ $errors = printf "%s\n\n%s" $errors (include "newrelic.compatibility.message.windows" . ) }}
{{- end }}

{{- if ( or .Values.controllerManagerEndpointUrl .Values.schedulerEndpointUrl .Values.etcdEndpointUrl .Values.apiServerEndpointUrl )}}
{{ $errors = printf "%s\n\n%s" $errors (include "newrelic.compatibility.message.apiURL" . ) }}
{{- end }}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -123,13 +123,6 @@ Please configure the API Server port as a part of 'apiServer.autodiscover[].endp
------
{{- end -}}

{{- define "newrelic.compatibility.message.windows" -}}
nri-kubernetes v3 does not support deploying into windows Nodes.
Please use the latest 2.x version of the chart.

------
{{- end -}}

{{- define "newrelic.compatibility.message.etcdSecrets" -}}
Values "etcdTlsSecretName" and "etcdTlsSecretNamespace" are no longer supported, please specify them as a part of the
'etcd' config in the values, for example:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ rules:
- "services"
- "nodes"
- "namespaces"
- "pods" # Needed for our Windows integration to be able to access KSM on a Linux node.
Copy link
Contributor

@xqi-nr xqi-nr Jun 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RagalahariP From v2 to v3, the pods resource is removed for linux support for some reason as you mentioned during the meeting. The reason may be reducing the security surface.

What I mean during the meeting is if we can adding it back for windows only, that will be better .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xqi-nr after you left the meeting we agreed that this comment is a good compromise. @RagalahariP will contact @paologallinaharbur to request his review of this line change since he was involved in the project to upgrade V2 to V3 and may have historical context to validate this change

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello! Yes, I'll take a look!

In general, we have never added support for Windows due to the lack of a canary environment, tooling, and proper pipelines to build the images.
Consider to specify clearly that this is experimental and might change in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, answering this particular point, it was a permission not needed anymore and therefore it was dropped.
You can reintroduce it without any issues, however if you want to ask for as few permission as possible you can add this specific line behind a "if windows.enabled=true" instead of a comment

verbs: [ "get", "list", "watch" ]
- nonResourceURLs: ["/metrics"]
verbs: ["get"]
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -184,8 +184,9 @@ spec:
tolerations:
{{- . | nindent 8 }}
{{- end }}
{{- with .Values.ksm.nodeSelector | default (fromYaml (include "newrelic.common.nodeSelector" .)) }}
nodeSelector:
kubernetes.io/os: linux
{{- with .Values.ksm.nodeSelector | default (fromYaml (include "newrelic.common.nodeSelector" .)) }}
{{- toYaml . | nindent 8 }}
{{- end -}}
{{- end }}
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,10 @@
{{- include "newrelic.common.naming.truncateToDNSWithSuffix" (dict "name" (include "nriKubernetes.naming.fullname" .) "suffix" "kubelet") -}}
{{- end -}}

{{- define "nriKubernetes.kubelet.windows.fullname" -}}
{{- include "newrelic.common.naming.truncateToDNSWithSuffix" (dict "name" (include "nriKubernetes.naming.fullname" .) "suffix" "windows") -}}
{{- end -}}

{{- define "nriKubernetes.kubelet.fullname.agent" -}}
{{- include "newrelic.common.naming.truncateToDNSWithSuffix" (dict "name" (include "nriKubernetes.naming.fullname" .) "suffix" "agent-kubelet") -}}
{{- end -}}
Expand Down
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may not understand the approach. However, this seems a V2 component being merged into the V3 chart.

The image used is newrelic/infrastructure-k8s, the is just one container, and the nri-kubernetes process seems controlled by the agent. Maybe this is the fastest way to get Windows support, but there is a big risk of tech dept

Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
{{- if and .Values.kubelet.enabled .Values.kubelet.enableWindows }}
{{- range .Values.kubelet.windowsOsList }}
apiVersion: apps/v1
kind: DaemonSet
metadata:
namespace: {{ $.Release.Namespace }}
name: {{ include "nriKubernetes.kubelet.windows.fullname" $ }}-{{ .version }}
{{- $legacyAnnotation:= fromYaml (include "newrelic.compatibility.annotations" $) -}}
{{- with include "newrelic.compatibility.valueWithFallback" (dict "legacy" $legacyAnnotation "supported" $.Values.kubelet.annotations )}}
annotations: {{ . | nindent 4 }}
{{- end }}
spec:
{{- with $.Values.updateStrategy }}
updateStrategy: {{ toYaml . | nindent 4 }}
{{- end }}
selector:
matchLabels:
{{- include "newrelic.common.labels.selectorLabels" $ | nindent 6 }}
app.kubernetes.io/component: kubelet
template:
metadata:
annotations:
checksum/nri-kubernetes: {{ include (print $.Template.BasePath "/kubelet/scraper-configmap.yaml") $ | sha256sum }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nri-kubernetes in V2 branch does not read any configuration, that was introduced in V3.

I wonder if you actually compiled the V3 branch and pushed it to the old repo newrelic/infrastructure-k8s, is that the case?

There are some discrepancies:

  • kubelet/scraper-configmap.yaml is V3
  • NRIA_PASSTHROUGH_ENVIRONMENT is V2
  • app.kubernetes.io/component: kubelet is V3
  • having just one container is definitely V2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how a user could change the config mounted? or set a proxy?

checksum/agent-config: {{ include (print $.Template.BasePath "/kubelet/agent-configmap.yaml") $ | sha256sum }}
{{- if include "newrelic.common.license.secret" $ }}{{- /* If the is secret to template */}}
checksum/license-secret: {{ include (print $.Template.BasePath "/secret.yaml") $ | sha256sum }}
{{- end }}
labels:
{{- include "nriKubernetes.labels.podLabels" $ | nindent 8 }}
app.kubernetes.io/component: kubelet
spec:
{{- with include "newrelic.common.dnsConfig" $ }}
dnsConfig:
{{- . | nindent 8 }}
{{- end }}
serviceAccountName: {{ include "newrelic.common.serviceAccount.name" $ }}
containers:
- name: agent
image: newrelic/infrastructure-k8s:{{ .imageTag }}
Copy link
Member

@paologallinaharbur paologallinaharbur Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a V2 image, why not installing the nri-bundle-v2 next to the V3 then? In this way we are mixing them up in the same chart!

imagePullPolicy: {{ $.Values.images.integration.pullPolicy }}
env:
- name: NRIA_LICENSE_KEY
valueFrom:
secretKeyRef:
name: {{ include "newrelic.common.license.secretName" $ }}
key: {{ include "newrelic.common.license.secretKeyName" $ }}
- name: "NRI_KUBERNETES_CLUSTERNAME"
value: {{ include "newrelic.common.cluster" $ }}
- name: "NRI_KUBERNETES_VERBOSE"
value: {{ include "newrelic.common.verboseLog.valueAsBoolean" $ | quote }}
- name: "NRK8S_NODE_NAME"
valueFrom:
fieldRef:
apiVersion: "v1"
fieldPath: "spec.nodeName"
- name: "CLUSTER_NAME"
value: {{ include "newrelic.common.cluster" $ }}
- name: "NRIA_PASSTHROUGH_ENVIRONMENT"
value: "KUBERNETES_SERVICE_HOST,KUBERNETES_SERVICE_PORT,CLUSTER_NAME,CADVISOR_PORT,NRK8S_NODE_NAME,KUBE_STATE_METRICS_URL,ETCD_TLS_SECRET_NAME,ETCD_TLS_SECRET_NAMESPACE,API_SERVER_SECURE_PORT,NRIA_CACHE_PATH,TIMEOUT,DISCOVERY_CACHE_TTL"
nodeSelector:
node.kubernetes.io/windows-build: {{ .buildNumber }}
kubernetes.io/os: windows
tolerations:
- key: "node.kubernetes.io/os"
operator: "Equal"
value: "windows"
effect: "NoSchedule"
---
{{- end }}
{{- end }}
Original file line number Diff line number Diff line change
Expand Up @@ -251,8 +251,9 @@ spec:
tolerations:
{{- . | nindent 8 }}
{{- end }}
{{- with .Values.kubelet.nodeSelector | default (fromYaml (include "newrelic.common.nodeSelector" .)) }}
nodeSelector:
kubernetes.io/os: linux
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we sure that kubernetes.io/os: linux is present on all user nodes?

{{- with .Values.kubelet.nodeSelector | default (fromYaml (include "newrelic.common.nodeSelector" .)) }}
{{- toYaml . | nindent 8 }}
{{- end -}}
{{- end }}
12 changes: 12 additions & 0 deletions charts/newrelic-infrastructure/tests/nodeSelectors_test.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -25,11 +25,13 @@ tests:
path: spec.template.spec.nodeSelector
value:
disktype: ssd
kubernetes.io/os: linux
template: templates/ksm/deployment.yaml
- equal:
path: spec.template.spec.nodeSelector
value:
disktype: ssd
kubernetes.io/os: linux
template: templates/kubelet/daemonset.yaml
- equal:
path: spec.template.spec.nodeSelector
Expand All @@ -48,11 +50,13 @@ tests:
path: spec.template.spec.nodeSelector
value:
disktype: ssd
kubernetes.io/os: linux
template: templates/ksm/deployment.yaml
- equal:
path: spec.template.spec.nodeSelector
value:
disktype: ssd
kubernetes.io/os: linux
template: templates/kubelet/daemonset.yaml
- equal:
path: spec.template.spec.nodeSelector
Expand All @@ -74,11 +78,13 @@ tests:
path: spec.template.spec.nodeSelector
value:
disktype: real
kubernetes.io/os: linux
template: templates/ksm/deployment.yaml
- equal:
path: spec.template.spec.nodeSelector
value:
disktype: real
kubernetes.io/os: linux
template: templates/kubelet/daemonset.yaml
- equal:
path: spec.template.spec.nodeSelector
Expand All @@ -100,11 +106,13 @@ tests:
path: spec.template.spec.nodeSelector
value:
disktype: real
kubernetes.io/os: linux
template: templates/ksm/deployment.yaml
- equal:
path: spec.template.spec.nodeSelector
value:
disktype: ssd
kubernetes.io/os: linux
template: templates/kubelet/daemonset.yaml
- equal:
path: spec.template.spec.nodeSelector
Expand All @@ -126,11 +134,13 @@ tests:
path: spec.template.spec.nodeSelector
value:
disktype: ssd
kubernetes.io/os: linux
template: templates/ksm/deployment.yaml
- equal:
path: spec.template.spec.nodeSelector
value:
disktype: real
kubernetes.io/os: linux
template: templates/kubelet/daemonset.yaml
- equal:
path: spec.template.spec.nodeSelector
Expand All @@ -152,11 +162,13 @@ tests:
path: spec.template.spec.nodeSelector
value:
disktype: ssd
kubernetes.io/os: linux
template: templates/ksm/deployment.yaml
- equal:
path: spec.template.spec.nodeSelector
value:
disktype: ssd
kubernetes.io/os: linux
template: templates/kubelet/daemonset.yaml
- equal:
path: spec.template.spec.nodeSelector
Expand Down
14 changes: 14 additions & 0 deletions charts/newrelic-infrastructure/values.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -83,6 +83,20 @@ kubelet:
# -- Enable kubelet monitoring.
# Advanced users only. Setting this to `false` is not supported and will break the New Relic experience.
enabled: true
# -- Enable Windows monitoring and update `windowsOsList` with the list of Windows versions present in the cluster.
enableWindows: false
# -- `windowsOsList` is only required if `enableWindows` is true. It contains list of Windows versions present in the cluster.
# Our Kubernetes integration supported Windows versions LTSC 2019 (1809), 20H2, and LTSC 2022
windowsOsList: []
# - version: 2019
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it intended for those to be commented out?

Copy link
Author

@RagalahariP RagalahariP May 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nrepai Yes, WindowsOsList is required if enableWindows is set to True.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nrepai I have updated comment.

# imageTag: 2-windows-1809-alpha
# buildNumber: 10.0.17763
# - version: 2022
# imageTag: 2-windows-ltsc2022-alpha
# buildNumber: 10.0.20348
# - version: 20h2
# imageTag: 2-windows-20H2-alpha
# buildNumber: 10.0.19042
annotations: {}
# -- Tolerations for the control plane DaemonSet.
# @default -- Schedules in all tainted nodes
Expand Down