Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for persistent buffering in receiver/gateway #1342

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

sbylica-splunk
Copy link
Collaborator

POC - Added support for persistent buffering in receiver/gateway

@sbylica-splunk sbylica-splunk requested review from a team as code owners July 11, 2024 08:07
@sbylica-splunk
Copy link
Collaborator Author

setfacl -n -Rm d:m::rx,m::rx,d:g:{{ $clusterReceiver.securityContext.runAsGroup | default 999 }}:rx,g:{{ $clusterReceiver.securityContext.runAsGroup | default 999 }}:rx {{ .Values.splunkPlatform.sendingQueue.persistentQueue.storagePath }}/receiver;
{{- end }}']
securityContext:
runAsUser: 0
Copy link
Contributor

@jvoravong jvoravong Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We probably need to document this runAsUser 0 and above used permissions in https://github.com/signalfx/splunk-otel-collector-chart/blob/main/docs/advanced-configuration.md#data-persistence as well as customer documentation. This would be done for the agent, cluster receiver, and gateway. A number of customers request information like this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest adding documentation primarily in the Advanced Configuration: Data Persistence section. Then, include a brief note with a hyperlink in the Running the Container in Non-Root User Mode section that points to the data persistence section.

@@ -104,6 +104,31 @@ spec:
mountPath: /splunk-messages
- mountPath: /conf
name: collector-configmap
- name: patch-log-dirs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We might be outgrowing "log" functionality here for patch-log-dirs. If it works for now I'm fine with it. It might be time to refactor the names and images a little to make them more abstract to better address these new use cases.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, something like patch-dirs would work for all configurations I guess?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'd go with patch-dirs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does persistent buffering require the .Values.logsCollection.checkpointPath directory for checkpoints? I don't think it does. I recommend removing the code that sets up the logsCollection checkpoint directory for the cluster receiver and gateway to avoid potential race conditions. Since the agent is deployed to all nodes and the cluster receiver and gateway are only deployed to some, they may attempt to set permissions on the same directory simultaneously.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To move forward with this

  • I would just remove lines 111-113 for now so the agent and cluster receiver are not sharing on disk resources.
  • Make sure to only include any of this new cluster receiver code if .Values.splunkPlatform.sendingQueue.persistentQueue.enabled=true

@jvoravong
Copy link
Contributor

@@ -16,6 +15,12 @@ extensions:
observe_nodes: true
{{- end }}

{{- if .Values.splunkPlatform.sendingQueue.persistentQueue.enabled }}
file_storage/persistent_queue_receiver:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit

Suggested change
file_storage/persistent_queue_receiver:
file_storage/persistent_queue_cluster_receiver:

- k8s_observer
{{- end }}
{{- if .Values.splunkPlatform.sendingQueue.persistentQueue.enabled }}
- file_storage/persistent_queue_receiver
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit

Suggested change
- file_storage/persistent_queue_receiver
- file_storage/persistent_queue_cluster_receiver

@@ -62,6 +62,24 @@ spec:
securityContext:
{{- include "splunk-otel-collector.securityContext" (dict "isWindows" .Values.isWindows "securityContext" $gateway.securityContext) | nindent 8 }}
{{- end }}
initContainers:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We ill only want to include the initContainers if persistent queue is enabled

      {{- if .Values.splunkPlatform.sendingQueue.persistentQueue.enabled }}
      initContainers:
...
      {{- end }}

@jvoravong
Copy link
Contributor

Can you also update the enable-persistence-queue example to include gateway.enabled=true. This way the agent, cluster receiver, and gatway will be enabled in the example. This will help us track these changes over time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants