This RFC outlines how Vector will integrate with Kubernetes (k8s).
Note: This RFC is retroactive and meant to serve as an audit to complete our
Kubernetes integration. At the time of writing this RFC, Vector has already made
considerable progress on its Kubernetes integration. It has a `kubernetes`
source, a `kubernetes_pod_metadata` transform, an example DaemonSet file, and
the ability to automatically reload configuration when it changes. The
fundamental pieces are mostly in place to complete this integration, but as we
approach the finish line we're being faced with deeper questions that heavily
affect the UX, such as how to properly deploy Vector and exclude its own logs
(pr#2188). We had planned to perform a 3rd party audit on the integration before
announcement, and we've decided to align this RFC with that process.
# RFC 2221 - 2020-04-04 - Kubernetes Integration

## Table of contents
- Motivation
- Guide-level Proposal
- Design considerations
- Minimal supported Kubernetes version
- Reading container logs
- Helm vs raw YAML files
- Helm Chart Repository
- Deployment Variants
- Deployment configuration
- Annotating events with metadata from Kubernetes
- Origin filtering
- Configuring Vector via Kubernetes API
- Changes to Vector release process
- Testing
- Other data gathering
- Windows support
- Security
- Prior Art
- Sales Pitch
- Drawbacks
- Alternatives
- Outstanding Questions
- Plan Of Attack
## Motivation

Kubernetes is arguably the most popular container orchestration framework at the time of writing this RFC; many large companies, with large production deployments, depend heavily on Kubernetes. Kubernetes handles log collection but does not facilitate shipping; shipping is meant to be delegated to tools like Vector. This is precisely the use case that Vector was built for. So, the motivation is three-fold:
- A Kubernetes integration is essential to achieving Vector's vision of being the dominant, single collector for observability data.
- This will inherently attract large, valuable users to Vector since Kubernetes is generally used with large deployments.
- It is currently the #1 requested feature of Vector.
## Guide-level Proposal

Note: This guide largely follows the format of our existing guides (example). There are two perspectives to our guides: 1) a new user coming from Google, 2) a user that is familiar with Vector. This guide is written from perspective 2.
This guide covers integrating Vector with Kubernetes. We'll touch on the basic concepts of deploying Vector into Kubernetes and walk through our recommended strategy. By the end of this guide you'll have a single, lightweight, ultra-fast, and reliable data collector ready to ship your Kubernetes logs and metrics to any destination you please.
Our recommended strategy deploys Vector as a Kubernetes `DaemonSet`. Vector
reads the log files directly from the file system, so to collect the logs from
all the `Pod`s it has to be deployed on every `Node` in your cluster.
With this strategy you will be able to:

- Collect data from each of your Kubernetes Pods.
- Filter by container names, Pod IDs, and namespaces.
- Automatically merge logs that Kubernetes splits.
- Enrich your logs with useful Kubernetes context.
- Send your logs to one or more destinations.
### Install using `kubectl`

1. Configure Vector:

   Before we can deploy Vector we must configure it. This is done by creating a Kubernetes `ConfigMap`:

   ...insert selector to select any of Vector's sinks...

   ```shell
   cat <<-CONFIG > vector.toml
   # Docs: https://vector.dev/docs/
   # Container logs are available from the "kubernetes" input.

   # Send data to one or more sinks!
   [sinks.aws_s3]
   type = "aws_s3"
   inputs = ["kubernetes"]
   bucket = "my-bucket"
   compression = "gzip"
   region = "us-east-1"
   key_prefix = "date=%F/"
   CONFIG
   kubectl create configmap vector-config --from-file=vector.toml=vector.toml \
     --dry-run -o yaml > vector-configmap.yaml
   ```
2. Deploy Vector!

   Now that you have your custom `ConfigMap` ready it's time to deploy Vector. Create a `Namespace` and apply your `ConfigMap` and our recommended deployment configuration into it:

   ```shell
   kubectl create namespace vector
   kubectl apply --namespace vector -f vector-configmap.yaml
   kubectl apply -f https://packages.timber.io/vector/latest/kubernetes/vector-global.yaml
   kubectl apply --namespace vector -f https://packages.timber.io/vector/latest/kubernetes/vector-namespaced.yaml
   ```
That's it!
### Install using Helm

1. Install `helm`.

2. Add our Helm Chart repo:

   ```shell
   helm repo add vector https://charts.vector.dev
   helm repo update
   ```

3. Configure Vector.

   TODO: address this when we decide on the helm chart internals.

4. Deploy Vector!

   ```shell
   kubectl create namespace vector

   # Helm v3
   helm upgrade \
     --install \
     --namespace vector \
     --values vector-values.yaml \
     vector \
     vector/vector

   # Helm v2
   helm upgrade --install \
     --namespace vector \
     --values vector-values.yaml \
     --name vector \
     vector/vector
   ```
### Install using Kustomize

1. Install `kustomize`.

2. Prepare `kustomization.yaml`.

   Use the same config as in the `kubectl` guide:

   ```yaml
   # kustomization.yaml
   namespace: vector
   resources:
     - https://packages.timber.io/vector/latest/kubernetes/vector-global.yaml
     - https://packages.timber.io/vector/latest/kubernetes/vector-namespaced.yaml
     - vector-configmap.yaml
   ```

3. Deploy Vector!

   ```shell
   kustomize build . | kubectl apply -f -
   ```
## Design considerations

### Minimal supported Kubernetes version

The minimal supported Kubernetes version is the earliest released version of Kubernetes that we intend to support at full capacity.

We use the minimal supported Kubernetes version (or MSKV for short) in the following ways:
- to communicate to our users what versions of Kubernetes Vector will work on;
- to run our Kubernetes test suite against Kubernetes clusters starting from this version;
- to track what Kubernetes API feature level we can use when developing Vector code.
We can change MSKV over time, but we have to notify our users accordingly.
There has to be one "root" location where the current MSKV for the whole Vector
project is specified, and it should be a single source of truth for all the
decisions that involve MSKV, as well as for documentation. A good candidate for
such a location is a file in the `.meta` dir of the Vector repo - `.meta/mskv`,
for instance.
Kubernetes 1.14 introduced some significant improvements to how log files are organized, putting more useful metadata into the log file path. This allows us to implement more efficient and flexible ways to filter the log files we consume, which is important for preventing Vector from consuming the logs it itself produces - bad, since that can potentially result in a flood-like DoS.

We can still offer support for Kubernetes 1.13 and earlier, but it would significantly limit our efficient filtering capabilities. It would also increase maintenance costs and code complexity.
On the other hand, Kubernetes pre-1.14 versions are quite rare these days. At the time of writing, the latest Kubernetes version is 1.18, and, according to the Kubernetes version and version skew support policy, only versions 1.18, 1.17 and 1.16 are currently maintained.
Considering all of the above, we assign 1.14 as the initial MSKV.
### Reading container logs

Kubernetes does not directly control logging, as the actual implementation of the logging mechanisms is a domain of the container runtime. That said, Kubernetes requires the container runtime to fulfill a certain contract, allowing it to enforce the desired behavior.
Kubernetes tries to store logs at consistent filesystem paths for any container
runtime. In particular, `kubelet` is responsible for configuring the container
runtime it controls to put the logs at the right place.
The log file format can vary per container runtime, and we have to support all
the formats that Kubernetes itself supports.

Generally, most Kubernetes setups will put the logs at the `kubelet`-configured
locations in the `/var/log` directory on the host.
There is official documentation at the Kubernetes project regarding logging. I had a misconception that it specifies reading these log files as an explicitly supported way of consuming the logs; however, I couldn't find a confirmation of that when I checked. Nonetheless, Kubernetes log files are a de-facto well-settled interface that we should be able to use reliably.
#### File locations

We can read container logs directly from the host filesystem. Kubernetes stores logs such that they're accessible from the following locations:

- `/var/log/pods`;
- `/var/log/containers` - a legacy location, kept for backward compatibility with pre-1.14 clusters.
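For reference, the expected layouts look roughly like this (a sketch; the exact naming is defined by the `kubelet` implementation):

```text
# Kubernetes 1.14+:
/var/log/pods/<pod namespace>_<pod name>_<pod uuid>/<container name>/<restart count>.log

# Legacy location:
/var/log/containers/<pod name>_<pod namespace>_<container name>-<container id>.log
```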
To make our lives easier, here's a link to the part of the k8s source that's responsible for building the path to the log file. If we encounter issues, this would be a good starting point to unwrap the k8s code.
#### Log file format

As already mentioned above, log formats can vary, but there are certain invariants that are imposed on the container runtimes by the implementation of Kubernetes itself.
A particularly interesting piece of code is the `ReadLogs` function - it is
responsible for reading container logs. We should carefully inspect it to gain
knowledge of the edge cases. To achieve the best compatibility, we can base our
log file consumption procedure on the logic implemented by that function.
Based on the `parseFuncs` (that `ReadLogs` uses), it's evident that k8s supports
the following formats:

- Docker JSON File logging driver format - which is essentially a simple JSONLines (aka `ndjson`) format;
- CRI format.
We have to support both formats.
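For illustration, a single log record looks roughly like this in each of the two formats (sample content, not taken from a real cluster):

```text
# Docker JSON File logging driver - one JSON object per line:
{"log":"hello world\n","stream":"stdout","time":"2020-04-15T13:35:27.290739039Z"}

# CRI format - <timestamp> <stream> <P|F tag> <message>:
2020-04-15T13:35:27.290739039Z stdout F hello world
```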
#### Automatic partial events merging

Kubernetes uses two log file formats, and both split log messages that are too long into multiple log records.

It makes sense to automatically merge the log records that were split back
together, similarly to how we do it in the `docker_logs` source.
We will implement automatic partial event merging and enable it by default, while allowing users to opt-out of it if they need to.
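For example, in the CRI format, a long message is written as a sequence of records tagged `P` (partial) terminated by an `F` (full/final) record; with merging enabled, Vector would emit these two records as a single event (illustrative sample):

```text
2020-04-15T13:35:27.000000000Z stdout P first part of a long line...
2020-04-15T13:35:27.000000001Z stdout F ...second and final part
```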
### Helm vs raw YAML files

We consider both raw YAML files and the Helm Chart officially supported installation methods.

With Helm, people usually use the Chart we provide, and tweak it to their needs via the variables we expose as the chart configuration. This means we can offer a lot of customization; however, in the end, we're in charge of generating the YAML configuration that k8s will run from our templates. This means that, while it is very straightforward for users, we have to keep the compatibility concerns in mind when we update our Helm Chart. We should provide a lot of flexibility in our Helm Charts, but also have sane defaults that work for the majority of users.
With raw YAML files, they have to be usable out of the box, but we shouldn't
expect users to use them as-is. People will often maintain their own "forks" of
those, tailored to their use case. We shouldn't overcomplicate our recommended
configuration, but we shouldn't oversimplify it either. It has to be
production-ready. But it also has to be portable, in the sense that it should
work without tweaking with as many cluster setups as possible.
We should support both `kubectl create` and `kubectl apply` flows.
`kubectl apply` is generally more limiting than `kubectl create`.
We can derive our YAML files from the Helm Charts to fold everything into a
single source of truth for the configuration. To do that, we'd need a
`values.yaml` suitable for rendering the Helm Chart template into a set of YAML
files, and a script to combine/regroup/reformat the rendered templates for
better usability.
Alternatively, we can hand-write the YAML files. This has the benefit of making them more user-friendly. It's unclear whether this provides real value compared to deriving them from the Helm Charts - since the ultimate user-friendly way is to use the Helm Charts.
### Helm Chart Repository

We should not just maintain a Helm Chart, we should also offer a Helm repo to make installations easily upgradable.

Everything we need to do to achieve this is outlined in The Chart Repository Guide.
We can use a tool like ChartMuseum to manage our repo. Alternatively, we can use a bare HTTP server, like AWS S3 or GitHub Pages. ChartMuseum has the benefit of doing some things for us. It can use S3 for storage, and offers a convenient helm plugin to release charts, so the release process should be very simple.
From the user experience perspective, it would be cool if we exposed our chart
repo at `https://charts.vector.dev` - short and easy to remember or even guess.
### Deployment Variants

We have two ways to deploy Vector:

- as a `DaemonSet`;
- as a sidecar `Container`.
Deployment as a `DaemonSet` is trivial, applies cluster-wide, and makes sense as
the default scenario for most use cases.
Sidecar container deployments make sense when cluster-wide deployment is not available. This can generally occur when users are not in control of the whole cluster (for instance in shared clusters, or in highly isolated clusters). We should provide recommendations for this deployment variant, however, since people generally know what they're doing in such use cases, and because those cases are often very custom, we probably don't have to go deeper than explaining the generic concerns. We should provide enough flexibility at the Vector code level for those use cases to be possible.
It is possible to implement a sidecar deployment by implementing an operator to
automatically inject a Vector `Container` into `Pod`s, via a custom admission
controller, but that doesn't make a lot of sense for us to work on, since the
`DaemonSet` already works for most of the use cases.
Note that `DaemonSet` deployment does require special support in the Vector code
(a dedicated `kubernetes` source), while a perfectly valid sidecar configuration
can be implemented with just a simple `file` source.
This is another reason why we don't pay as much attention to the sidecar model.
### Deployment configuration

It is important that we provide a well-thought-out deployment configuration for Vector as part of our Kubernetes integration. We want to ensure a good user experience, and that includes installation, configuration, and upgrading.
We have to make sure that Vector, being itself an app, runs well in Kubernetes, and sanely makes use of all the control and monitoring interfaces that Kubernetes exposes to manage Vector itself.
We will provide YAML and Helm as deployment options. While Helm configuration is templated and more generic, and YAML is intended for manual configuration, a lot of design considerations apply to both of them.
For the reasons discussed above, we'll be using a `DaemonSet`.
#### Data directory

Vector needs a location to keep the disk buffers and other data it requires for operation at runtime. This directory has to persist across restarts, since it's essential for some features to function (i.e. not losing buffered data if/while the sink is gone).
We'll be using a `DaemonSet`, so, naturally, we can leverage `hostPath` volumes.

We'll be using `hostPath` volumes in our YAML config; the Helm Chart will use
this by default as well, but it will also allow configuring this to provide the
flexibility users will expect. See the sketch below.
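A minimal sketch of how this could be wired up in the `DaemonSet` spec (the `/var/lib/vector` host location is an assumption, not a settled decision):

```yaml
# Excerpt from the DaemonSet PodSpec (illustrative).
containers:
  - name: vector
    volumeMounts:
      - name: data-dir
        mountPath: /vector-data-dir # matches Vector's `data_dir` option
volumes:
  - name: data-dir
    hostPath:
      path: /var/lib/vector # lives on the node, persists across Pod restarts
```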
An alternative to `hostPath` volumes would be a user-provided persistent volume
of some kind. The only requirement is that it has to have a `ReadWriteMany`
access mode.
#### Vector config files

This section is about Vector's `.toml` config files.
Vector configuration in the Kubernetes environment can generally be split into two logical parts: a common Kubernetes-related configuration, and a custom user-supplied configuration.
A common Kubernetes-related configuration is the part that is generally expected
to be the same (or very similar) across all Kubernetes environments. Things like
the `kubernetes` source and the `kubernetes_pod_metadata` transform belong
there.
A custom user-supplied configuration specifies a part of the configuration that contains parameters like what sink to use or what additional filtering or transformation to apply. This part is expected to be a unique custom thing for every user.
Vector supports multiple configuration files, and we can rely on that to ship a config file with the common configuration part as part of our YAML / Helm suite, and let users keep their custom config part in a separate file.
We will then mount the two `ConfigMap`s into the container, and start Vector in
multiple configuration files mode
(`vector --config .../common.toml --config .../custom.toml`), as sketched below.
It is best to explicitly disable reloads in our default deployment
configuration, because this provides more reliability than relying on the
eventually consistent `ConfigMap` updates.
Users can recreate the `Pod`s (thus restarting Vector, and making it aware of
the new config) via `kubectl rollout restart -n vector daemonset/vector`.
This section is about the Kubernetes `.yaml` files.
YAML files storing Kubernetes API objects configuration can be grouped differently.
The layout proposed in the guide above is what we're planning to use. It is in line with the sections above on splitting the Vector configuration into the common and custom parts.
The idea is to have a single file with a namespaced configuration (`DaemonSet`,
`ServiceAccount`, `ClusterRoleBinding`, common `ConfigMap`, etc), a single file
with a global (non-namespaced) configuration (mainly just `ClusterRole`) and a
user-supplied file containing just a `ConfigMap` with the custom part of the
Vector configuration. Three `.yaml` files in total, two of which are supplied by
us, and one is created by the user.
Ideally we'd want to make the presence of the user-supplied file optional, but it just doesn't make sense, because the sink has to be configured somewhere.
We can offer some simple "typical custom configurations" in our documentation as examples:

- with a sink to push data to our Alloy;
- with a cluster-agnostic `elasticsearch` sink;
- for AWS clusters, with a `cloudwatch` sink;
- etc...
We must be careful with our `.yaml` files to make them play well with not just
`kubectl create -f`, but also with `kubectl apply -f`. There are often issues
with idempotency when labels and selectors aren't configured properly, and we
should be wary of that.
We could use a separate `.yaml` file per object. That's more inconvenient, since
we'll need users to execute more commands, yet it doesn't seem like it provides
any benefit.
We expect users to "fork" and adjust our config files as they see fit, so they'll be able to split the files if required. They then maintain their configuration on their own, and we assume they're capable and know what they're doing.
#### Resource limits

Setting resource requirements for the Vector container is very important to enable Kubernetes to properly manage node resources.
Optimal configuration is very case-specific, and while we have some understanding of Vector performance characteristics, we can't account for the environment Vector will run at. This means it's nearly impossible for us to come up with sane defaults, and we have to rely on users properly configuring the resources for their use case.
However, it doesn't mean we should ignore this concern. Instead, we must share our understanding of Vector runtime properties and data, and provide as much assistance to the users trying to determine the resource requirements as possible.
We should provide the documentation explaining the inner architecture of Vector and our considerations on how to estimate memory / CPU usage.
As for our configuration, we'll omit the `resources` from the YAML files, and
make them configurable in the Helm Charts, as illustrated below.
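For illustration, this is the kind of block a user would end up setting (the values here are placeholders, not recommendations):

```yaml
# Excerpt from the Vector container spec (illustrative values).
resources:
  requests:
    memory: "64Mi"
    cpu: "250m"
  limits:
    memory: "256Mi"
    cpu: "1"
```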
It would be great to publish a regularly updated bulletin on Vector runtime properties (i.e. how much memory and CPU Vector can utilize and under what conditions). That would be a real killer feature for everyone who wants to deploy Vector under load, not just in the context of the Kubernetes integration. Though it's a lot of hard work to determine these properties, people with large deployments tend to do this anyway to gain confidence in their setup. We could exchange this data with our partners and derive an even more realistic profile for Vector's runtime properties, based on real data from multiple data sets. This is worth a separate dedicated RFC though.
Security considerations on deployment configuration are grouped together with other security-related measures. See here.
#### Other notable `PodSpec` properties

- `terminationGracePeriodSeconds` - we should set this to a value slightly bigger than Vector's topology grace termination period (see the sketch after this list);
- `hostNetwork` - we shouldn't use the host network since we need access to `kube-apiserver`, and the easiest way to get that is to use the cluster network;
- `preemptionPolicy` - our default deployment mode - aggregating logs from pods - is not considered critical for the cluster itself, so we should not disable preemption;
- `priorityClassName` - see the `PriorityClass` docs; we could ship a `PriorityClass` and set this value, but the priority value is not normalized, so it's probably not a good idea to provide a default out of the box - leave it for the cluster operator to configure;
- `runtimeClassName` - we'll be using this value in tests to validate that Vector works with non-standard runtimes; we shouldn't set it in our default YAMLs, nor set it in Helm by default.
#### Container probes

Kubernetes allows configuring a number of `Probe`s on a `Container`, and taking
action based on those probes. See the documentation to learn more.
- `readinessProbe` - a periodic probe of container service readiness. The container will be removed from service endpoints if the probe fails.
- `livenessProbe` - a periodic probe of container liveness. The container will be restarted if the probe fails.
- `startupProbe` - indicates whether the container has successfully initialized. If specified, no other probes are executed until this completes successfully. If this probe fails, the container will be restarted, just as if the `livenessProbe` failed.
Vector should implement proper support for all of those one way or another at the code level.
- `startupProbe` can be tied to the initial topology healthcheck - i.e. we consider it failed until the initial topology health check is complete, and consider it ok at any moment after that (see the sketch after this list);
- `livenessProbe` should probably be tied to the async executor threadpool responsiveness - i.e. if we can handle an HTTP request in a special liveness server we expose in Vector, consider the probe ok; otherwise something's very wrong, and we should consider the probe failed;
- `readinessProbe` is the most tricky one; it is unclear what semantics make sense there.
### Annotating events with metadata from Kubernetes

Kubernetes has a lot of metadata that can be associated with the logs, and most users expect us to add some parts of that metadata as fields to the event.
We already have an implementation that does this in the form of the
`kubernetes_pod_metadata` transform.
It works great; however, as can be seen from the next section, we might need
to implement very similar functionality at the `kubernetes` source as well to
perform log filtering. So, if we'll be obtaining pod metadata at the
`kubernetes` source, we might as well enhance the event right there. This would
render `kubernetes_pod_metadata` useless, as there would be no use case for
it that wouldn't be covered by the `kubernetes` source.
Of course, `kubernetes_pod_metadata` would still make sense if used not in
conjunction with the `kubernetes` source - which is the case, for instance, in a
sidecar deployment, where the `file` source is used directly with the in-pod log
files.
What parts of metadata we inject into events should be configurable, but we can and want to offer sane defaults here.
Technically, the approach already implemented at `kubernetes_pod_metadata` is
pretty good.
One small detail is that we probably want to allow adding arbitrary fields from
the `Pod` object record to the event, instead of a predefined set of fields.
The rationale is that we can never imagine all the use cases people could have
in the k8s environment, so we should probably be as flexible as possible.
There doesn't seem to be any technical barrier preventing us from offering this.
### Origin filtering

We can do highly efficient filtering based on the log file path, and more comprehensive filtering via metadata from the k8s API, which, unfortunately, has a bit more overhead.
The best user experience is via k8s API, because then we can support filtering by labels/annotations, which is a standard way of doing things with k8s.
We already do that in our current implementation.
#### Filtering based on the log file path

The idea is that we can derive some useful parameters from the log file paths. For more info on the log file paths, see the File locations section of this RFC.
So, Kubernetes 1.14+ exposes the following information via the file path:

- pod namespace
- pod name
- pod uuid
- container name

This is enough information for basic filtering, and the best part is it's available to us without any extra work - we're reading the files anyway.
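For example, in a path like the following (sample values), the namespace, pod name, pod uuid, and container name are all readable straight off the path:

```text
/var/log/pods/default_my-app-5f8689d8ce-abcde_12345678-90ab-cdef-1234-567890abcdef/app/0.log
```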
#### Filtering based on Kubernetes API metadata

Filtering by Kubernetes metadata is way more advanced and flexible from the user perspective.
The idea of doing filtering like that is: when Vector picks up a new log file to
process at the `kubernetes` source, it has to be able to somehow decide whether
to consume the logs from that file or to ignore it, based on the state at the
k8s API and the Vector configuration.
This means that there has to be a way to make the data from the k8s API related to the log file available to Vector.
Based on the k8s API structure, it looks like we should aim for obtaining the
`Pod` object, since it contains essential information about the containers that
produced the log file. Also, it is the `Pod` objects that control the desired
workload state that `kubelet` strives to achieve on the node, which makes `Pod`
objects the best option for our case - in particular, better than `Deployment`
objects. Technically, everything that needs to run containers will produce a
`Pod` object, and live `Container`s can only exist inside of a `Pod`.
There are a number of approaches to get the required `Pod` objects:

1. Per-file requests.

   The file paths provide enough data for us to make a query to the k8s API. In fact, we only need a pod namespace and a pod uuid to successfully obtain the `Pod` object.

2. Per-node requests.

   This approach is to list all the pods that are running on the same node as Vector runs. This effectively lists all the `Pod` objects we could possibly care about (see the sketch after this list).
One important thing to note is metadata for the given pod can change over time, and the implementation has to take that into account, and update the filtering state accordingly.
We also can't overload the k8s API with requests. The general rule of thumb is we shouldn't make requests much more often than k8s itself generates events.
Each approach has very different properties. It is hard to estimate which is a better fit.
A single watch call for a list of pods running per node (2) should generate less overhead and would probably be easier to implement.
Issuing a watch per individual pod (1) is more straightforward, but will definitely use more sockets. We could speculate that we'll get a smaller latency than with doing per-node filtering, however it's very unclear if that's the case.
Either way, we probably want to keep some form of cache + a circuit breaker to avoid hitting the k8s API too often.
One downside is we'll probably have to stall the events originating from a
particular log file until we obtain the data from the k8s API and decide whether
to allow that file or filter it out. During disasters, if the API server becomes
unavailable, we'll end up stalling the events for which we don't have `Pod`
object data cached. It is a good idea to handle this elegantly; for instance,
if we detect that the k8s API is gone, we should pause cache-busting until it
comes up again - because no changes can ever arrive while the k8s API server is
down, and it makes sense to keep the cache while that's happening.
We're in a good position here, because we have a good understanding of the system properties, and can intelligently handle k8s API server being down.
Since we'll be stalling the events while we don't have the `Pod` object, there's
an edge case where we won't be able to ship the events for a prolonged time.
This scenario occurs when a new pod is added to the node and then the Kubernetes
API server goes down. If `kubelet` picks up the update and starts the
containers, and they start producing logs, but Vector on the same node doesn't
get the update - we're going to stall the logs indefinitely. Ideally, we'd want
to talk to the `kubelet` instead of the API server to get the `Pod` object
data - since it's local (hence has a much higher chance of being present) and
has, in a sense, even more authoritative information than the API server on what
pods are actually running on the node. However, there's currently no interface
to the `kubelet` we could utilize for that.
Here's an example of an `nginx` deployment:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        vector.dev/exclude: "true"
    spec:
      containers:
        - name: nginx
          image: nginx:1.14.2
          ports:
            - containerPort: 80
```
The `vector.dev/exclude: "true"` annotation at the `PodTemplateSpec` is intended
to let Vector know that it shouldn't collect logs from the relevant `Pod`s.
Upon picking up a new log file for processing, Vector is intended to read the
`Pod` object, see the `vector.dev/exclude: "true"` annotation, and ignore the
log file altogether. This should take much fewer resources compared to reading
the log files into events and then filtering them out.
This is also a perfectly valid way of filtering out logs of Vector itself.
There is a demand for filtering by namespace via namespace annotations. This is
an additional concern on top of filtering by just the `Pod` object data that was
described above.
The idea is that all `Pod`s belong to `Namespace`s (docs), and users want to be
able to annotate the `Namespace` itself for exclusion, effectively excluding all
the `Pod`s belonging to it from collection.
To support this, we'll have to maintain the list of excluded `Namespace`s, and
filter `Pod`s against that list.
Listing the `Namespace`s can be done via the corresponding API in a similar
manner to how we do it for `Pod`s. The same concerns regarding caching and load
limiting apply.
This is an alternative approach to the implementation described above.

The current implementation allows doing this, but it has certain downsides - the main problem is we're paying the price of reading the log files that end up filtered out completely.

In most scenarios it'd be a significant overhead, and it can lead to cycles.
### Configuring Vector via Kubernetes API

We might want to implement support for configuring Vector via annotations and/or
labels, in addition to the configuration files from the `ConfigMap`s.
This actually should be a pretty easy thing to do with a downward API. It exposes pod data as files, so all we need is a slightly altered configuration loading procedure.
This is how it would look (very simplified):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kubernetes-downwardapi-volume-example
  annotations:
    vector.dev/config: |
      [sinks.aws_s3]
      type = "aws_s3"
      inputs = ["kubernetes"]
      bucket = "my-bucket"
      compression = "gzip"
      region = "us-east-1"
      key_prefix = "date=%F/"
spec:
  containers:
    - name: vector
      image: vector-image
      command:
        ["vector", "--k8s-downward-api-config", "/etc/podinfo/annotations"]
      volumeMounts:
        - name: podinfo
          mountPath: /etc/podinfo
  volumes:
    - name: podinfo
      downwardAPI:
        items:
          - path: "annotations"
            fieldRef:
              fieldPath: metadata.annotations
```
The `/etc/podinfo/annotations` file will look something like this:
```text
kubernetes.io/config.seen="2020-04-15T13:35:27.290739039Z"
kubernetes.io/config.source="api"
vector.dev/config="[sinks.aws_s3]\ntype = \"aws_s3\"\ninputs = [\"kubernetes\"]\nbucket = \"my-bucket\"\ncompression = \"gzip\"\nregion = \"us-east-1\"\nkey_prefix = \"date=%F/\"\n"
```
It's quite trivial to extract the configuration.
While possible, this is outside of the scope of the initial integration.
A much more involved feature than the one above would be making Vector
configurable via a Custom Resource Definition.

This feature is not considered for the initial integration with Kubernetes, and is not even explored, since it is a way more advanced level of integration than we can achieve in the near term.
This section is here for completeness, and we would probably like to explore this in the future.
This includes both adding support for CRDs to Vector itself, and implementing an
orchestrating component (such things are usually called operators in the k8s
context, i.e. a `vector-operator`).
### Changes to Vector release process

We need to ship a particular Vector version along with a particular set of k8s configuration YAML files and a Helm chart. This is so that we can be sure all our configurations are actually tested and known to work for a particular Vector release. This is very important for maintaining the legacy releases, and for people to be able to downgrade if needed, which is one of the major properties for a system-level component like Vector.
This means we need to orchestrate the releases of the YAML configs and Helm Charts together with the Vector releases.
Naturally, it's easiest to implement if we keep the code for both the YAML configs and the Helm Chart in our Vector repo.
The alternative - having either just the Helm Chart or it together with YAML files in a separate repo - has a benefit of being a smaller footprint to grasp - i.e. a dedicated repo with just the k8s deployment config would obviously have smaller code and history - but it would make it significantly more difficult to correlate histories with Vector mainline, and it's a major downside. For this reason, using the Vector repo for keeping everything is preferable.
During the release process, together with shipping the Vector version, we'd
have to also bump the Vector versions in the YAML and Helm Chart configs, and
bump the version of the Helm Chart as well. We then copy the YAML configs
to the same location where we keep release artifacts (i.e. `.deb`s, `.rpm`s,
etc) for that particular Vector version. We also publish a new Helm Chart
release into our Helm Chart repo.
While bumping the versions is human work, and is hard to automate - copying the YAML files and publishing a Helm Chart release is easy, and we should take care of that. We can also add CI lints to ensure the version of Vector in the YAML file, in the Helm Chart, and the one baked into the Rust code match at all times. Ideally, they should be bumped together atomically and never diverge.
If we need to ship an update to just the YAML configs or a new Helm Chart without changes to the Vector code, as our default strategy we can consider cutting a patch release of Vector - simply as a way to go through the whole process. This bumps the Vector version as well, even though there's no practical reason for it since the code didn't change. This strategy will not only simplify the process on our end, but will also be very simple for our users to understand.
### Testing

We want to implement a comprehensive test system to maintain our k8s integration.
As usual, we need a way to do unit tests to validate isolated individual components during development. We also need integration tests, whose purpose is to validate that, as a whole, Vector properly functions when deployed into a real Kubernetes cluster.
To be able to utilize unit tests, we have to build the code from modular, composable, and loosely-coupled components. These requirements often allow unit testing individual components easily, thus significantly improving the confidence in the overall implementation.
If we have to, we can rely on mocks to test all the edge cases of the individual components.
Integration tests are performed against the real k8s clusters.
We have a matrix of concerns we'd like to ensure Vector works properly with.
- Kubernetes Versions
  - Minimal Supported Kubernetes Version
  - Latest version
  - All versions in between the latest and MSKV
- Managed Kubernetes offers (see also CNCF Certified Kubernetes)
  - Amazon Elastic Kubernetes Service
  - Google Kubernetes Engine
  - Azure Kubernetes Service
  - DigitalOcean Kubernetes
  - Platform9 Managed Kubernetes
  - Red Hat OpenShift Container Platform
  - IBM Cloud Kubernetes Service
  - Alibaba Cloud Container Service for Kubernetes
  - Oracle Container Engine for Kubernetes
  - OVH Managed Kubernetes Service
  - Rackspace Kubernetes-as-a-Service
  - Linode Kubernetes Engine
  - Yandex Managed Service for Kubernetes
  - Tencent Kubernetes Engine
- Kubernetes Distributions (for on-premise deployment)
  - Production-grade
    - bare `kubeadm`
    - OKD (deploys OpenShift Origin)
    - Rancher Kubernetes Engine
    - Metal3
    - Project Atomic Kubernetes
    - Canonical Charmed Kubernetes
    - Kubernetes on DC/OS
  - For small/dev deployments
- Container Runtimes (CRI impls)
  - Docker (Kubernetes still has some "special" integration with Docker; these days, "using Docker" technically means using `runc` via `containerd` via `docker-engine`)
  - OCI (via CRI-O or containerd)
    - runc
    - runhcs - see more here
    - Kata Containers
    - gVisor
    - Firecracker
We can't possibly expand this matrix densely due to the enormous amount of effort required to maintain the infrastructure and the costs. It may also be inefficient to test everything everywhere, because a lot of configurations don't have any significant or meaningful differences among each other.
Testing various managed offers and distributions is not as important as testing different Kubernetes versions and container runtimes.
It's probably a good idea to also test against the most famous managed Kubernetes providers: AWS, GCP and Azure - just because our users are most likely to be on one of those.
So, the goal for integration tests is to somehow test Vector with Kubernetes versions from MSKV to latest, all the container runtimes listed above and, additionally, on AWS, GCP and Azure.
We can combine our requirements with offers from cloud providers. For instance,
`runhcs` (and Windows containers in general) is supported on Azure. Although
whether we want to address Windows containers support is a different topic, we
should still plan ahead.
We'll need to come up with an optimal configuration.
This is a very controversial question.
Currently we have:

- the Vector repo (with the GitHub Actions based CI flow)
- the test harness (also integrated with CI, but it's its own thing)
We don't necessarily have to choose one of those places: we can add a new location if it's justified enough.
Let's outline the requirements on the properties of the solution:
- We want to have the ability to run the checks from the Vector repo CI, i.e. per commit, per PR, per tag, etc. This might not be immediately utilized, but we just want to have that option.

- We want to consolidate the management of the cloud resources we allocate and pay for Kubernetes test infrastructure in a single place. This is to avoid spreading the responsibility, to avoid duplicating the logic, to reuse allocated resources for all our testing needs, and to simplify accounting and make the configuration management more flexible. We can, for example, have a shared dependency for the Vector CI flow, Test Harness invocations, locally run tests - and whatever else we have - to rely on.

- We want our test infrastructure easily available for the trusted developers (Vector core team) to run experiments and tests against locally. This doesn't mean we want to automate this and include running tests locally against our whole k8s test infrastructure - but the ability to do it with little effort is very important: even if we employ super-reliable CI automation, the turnaround time of going through it is way higher than conducting an experiment locally. Locally means using the local code tree and binaries - the infrastructure itself is still in the cloud.

- Ideally, we want the test system to be available not just to the Vector core team, but to the whole open-source community. Of course, we don't want to give unrestricted access to our cloud testing infrastructure - but the solution we employ should allow third parties to bring their own resources. Things that are local in essence (like `minikube`) should just work. There shouldn't be a situation where one can't run tests in `minikube` because cloud parts aren't available. We already have similar constraints at the Vector Test Harness.

- We need the effort required to manage the solution to be low, and the price to be relatively small. This means that the solution has to be simple.

- We want to expose the same kind of interface to each of the clusters, so the cluster we run the tests on is easily interchangeable. A `kubectl` config file is a good option, since it encapsulates all the necessary information to connect to a cluster.
Based on all of the above, it makes sense to split the infrastructure into two parts.
1. Cloud infrastructure that we manage and pay for.

   We will create a dedicated public repo with Terraform configs to set up a long-running Kubernetes test infrastructure. The goal here is to make the real, live cloud environments available for people and automation to work with.

2. Self-hosted infrastructure that we maintain configs for.

   This is what we keep so that it's easy to run a self-hosted cluster - most likely locally, for things like `minikube`, but not limited to that. The focus here is to lock particular versions and configurations of the tooling, so it's easy to run tests against. Potentially even having multiple versions of the same tool, for instance, when you need to compare `minikube` `1.9.2` and `1.8.2`. The goal here is to address the problem of configuring the self-hosted cluster management tools once and for all, and share those configurations. For people, it has the benefit of enabling them to spend time on solving the problem (or doing whatever they need to do with k8s) rather than on configuration. For automation flows, it'll make it really simple to reference a particular self-hosted configuration - and offload the complexity of preparing it.

   This one we'll have to figure out, but most likely we'll create a dedicated repo per tool, each with different rules - but with a single interface.
The interface of (and the goal for) those repos is to provide kubectl-compatible config files, enabling access to clusters where we can deploy Vector and conduct tests (and, in general, other arbitrary activity).
We can recognize three typical categories of integration tests that are relevant to the Kubernetes integration: correctness, performance and reliability. In fact, this is actually how we split things at the Vector Test Harness already.
It is important that with Kubernetes we don't only have to test that Vector itself performs correctly, but also that our YAML configs and Helm Chart templates are sane and work properly. So in a sense, we still have the same test categories, but the scope is broader than just testing the Vector binary. We want to test the whole integration.
Ideally we want to test everything: correctness, performance and reliability. Correctness tests are relatively easy; however, it's not yet clear how to orchestrate the performance and reliability tests. Measuring performance in clusters is quite difficult and requires careful thought to get right. For example, we have to consider and control a lot more variables of the environment - like the CNI driver, the underlying network topology, and so on - to understand the conditions we're testing. Reliability tests also require a more carefully designed test environment. For this reason, the initial Kubernetes integration only focuses on correctness tests. Once we get some experience with correctness tests we can expand our test suite with tests from the other categories.
It is important that we actually test correctness on all the configurations - see this comment as an example. Kubernetes has a lot of LOC, is very complex, and properly supporting it is quite a challenge.
The exact design of the tests is an implementation detail, so it's not specified in this RFC, but the suggested approach, as a starting point, could be to deploy Vector using our documented installation methods, then run some log-generating workload and then run assertions on the collected logs.
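A minimal sketch of such a test, assuming Vector is already deployed via one of the documented installation methods and configured with a sink whose output we can observe (all names here are illustrative):

```shell
# Run a short-lived workload that emits a known marker line.
kubectl run test-logger --image=busybox --restart=Never -- \
  sh -c 'echo "vector-test-marker"'

# Assert that the marker eventually shows up in the collected output;
# here we assume a `console` sink, so the collected logs appear on
# Vector's own stdout.
kubectl logs --namespace vector daemonset/vector | grep "vector-test-marker"
```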
The things we'd generally want to ensure work properly include (but are not limited to):
- basic log collection and parsing
- log message filtering (both by file paths and by metadata)
- log events enhancement with metadata
- partial log events merging
We want the assertions and tests to be cluster-agnostic, so that they work with any supplied kubectl config.
We already have k8s integration tests implemented in Rust in the Vector repo.
Currently, they're being run as part of the `cd tests; make tests` command. They
assert that Vector code works properly by deploying Vector plus some test
log producers and asserting that Vector produced the expected output. This is
a very elegant solution.
However, these tests are really more like unit tests - in the sense that they
completely ignore the YAMLs and Helm Charts and use their own test configs.
While they do a good job at what they're built for, we probably shouldn't
consider them integration tests in the broad sense.
It was discussed that we'd want to reuse them as our integration tests; however, for the reasons above I don't think it's a good idea - at least as they are now. We can decouple the deployment of Vector from the deployment of test containers and assertions - then we'd use just the second half, with Vector deployed via YAMLs and/or Helm Charts. For now, we should probably leave them as is, maintain them, but hold off on adopting them as integration tests.
### Other data gathering

This section is on gathering data other than container logs.
While our main focus for the integration is collecting log data from the `Pod`s,
there are other possibilities to gain observability in the Kubernetes
environment.
#### Exposing Kubernetes `Event`s as Vector events
It is possible to subscribe to Kubernetes `Event`s, similarly to how this
command works:

```shell
kubectl get events --all-namespaces --watch
```
Implementing this in Vector would allow capturing the Kubernetes `Event`s and
processing them as Vector events.
This feature might be very useful for anyone that wants to see what's going on in their cluster.
Note that this feature would require deploying Vector differently: instead of running Vector on every node, here we need only one Vector instance running per cluster. If run on every node, it'd be unnecessarily capturing each event multiple times.
So, to implement this, we'd need to add a special source that captures events
from the Kubernetes API, and provide a new workload configuration based on a
`Deployment`.
See also a section on collecting Kubernetes audit logs.
#### Discovering and gathering Prometheus metrics

Prometheus already has built-in Kubernetes Service Discovery support, so one could just deploy a Prometheus server, make it discover and gather the metrics, and then configure Vector to read metrics from it.
However, to pursue our goal of making Vector the only agent one would need to deploy, we can consider reimplementing what Prometheus does in Vector code, eliminating the need for the intermediary.
We don't aim to implement this in the initial Kubernetes integration.
#### Gathering data from the host OS

This is very useful for Kubernetes Cluster Operators willing to deploy Vector for the purposes of gaining observability on what's going on with their cluster nodes.
Example use cases are:

- reading `kubelet`/`docker` logs from `journald`;
- capturing `kubelet`/`docker` Prometheus metrics;
- gathering system metrics from the node, things like `iostat -x`, `df -h`, `uptime`, `free`, etc;
- gathering system logs, like `sshd`, `dmesg`, etc.
There are countless use cases here, and the good news is Vector is already a
great fit for those kinds of tasks! Even without any Kubernetes integration
whatsoever, it's possible to just deploy Vector as a `DaemonSet`, expose the
system data to it via `hostPath` volume mounts, and/or enable `hostNetwork` at
the `PodSpec`, as sketched below.
While nothing prevents users from manually configuring Vector for gathering data from the host OS, it's very hard for us to offer sane defaults that would work out-of-the-box for all clusters, since there's a myriad of configurations.
We can consider offering some kind of user-selectable presets for well-known popular setups - like AWS and GCP.
We can also solve this as a general problem of automatically discovering what we
can monitor on a given system - something similar to what `netdata` has.
In the context of the current integration efforts, it doesn't make a lot of sense to try to address this issue in Vector code or deployment configs:
- gathering data from the host OS works with manual configuration;
- cluster operators mostly know what they're doing, and are capable of configuring Vector as they require;
- there's a myriad of configurations we'd have to support, and it'd be very hard (if even possible) to come up with sane defaults.
- related to the point above, even with sane defaults, in 95% of cases cluster operators would want to tailor the configuration to their use case.
What we can do, though, is provide guides, blog posts and explainers with concrete examples for Vector usage for Kubernetes Cluster Operators.
#### Collecting Kubernetes audit logs

We can also collect Kubernetes audit logs.
This is very similar to collecting Kubernetes Events, but provides a more fine-grained control over what events are audited.
It's important to understand that events, unfiltered, should be considered very sensitive and privileged data.
The Kubernetes audit `Policy` allows the cluster operator to configure the
`kube-apiserver` to manage the audit data with a high degree of flexibility.
The best part is this is something that should already work great with Vector - we can already support operation via both log and webhook backends.
### Windows support

We don't aim to support Windows Kubernetes clusters initially. The reason is that Windows support in general (i.e. outside of the Kubernetes context) is a bit lacking - we don't measure performance on Windows, don't run unit tests on Windows, don't build Windows docker images, etc. This is a blocker for a proper integration with Kubernetes clusters running on Windows.
To sum up: if it works - it works, if it doesn't - we'll take care of it later.
If you're reading this and want to use Vector with Windows - please let us know.
Windows has its own specifics. We can learn from the past experience of other implementations to avoid the problems they encountered.
- This issue is about what seems to be a resource management problem with files on Windows - their implementation doesn't let go of the log file in time when the container (along with its log files) is about to be removed. This is a non-issue in a typical Linux deployment, because it's not the path at the filesystem but the inode that an FD binds to. On Windows it's the other way around.

  There's actually a workaround for that: it's possible to request Windows to allow deletion of the opened file - by specifying the `FILE_SHARE_DELETE` flag at the `CreateFileA` call.

  See more details:
### Security

There are different aspects of security. In this RFC we're going to focus on the Kubernetes-specific aspects.
Security plays a major role in the Kubernetes environment, and the more we do to ensure our code and deployment recommendations are safe, the better. Big deployments often have dedicated security teams that will be doing what we do on their own - just to double-check - but the majority of the people out there don't have enough resources to dedicate enough attention to the security aspects. This is why implementing security measures in our integration is important.
There has to be an automated security audit of the Vector codebase, to ensure we don't have easily detectable issues. Things like automated CVE checks and static analyzers fall into this category. We're already doing a good job in this aspect.
There has to be an automated security audit of the Vector docker images that we ship.
We should consider using tools like this:
... and similar.
#### Deployment hardening

We should harden the Vector deployment by default. This means that our suggested YAML files should be hardened, and the Helm Chart should be configurable, but also hardened by default.
- We should properly configure `PodSecurityContext` (docs):
  - properly configure `sysctls`;
  - `fsGroup` - should be unset.

- We should properly configure `SecurityContext` (docs); see also the sketch after this list:
  - enable `readOnlyRootFilesystem`, since we don't need to write to files at rootfs;
  - enable `runAsNonRoot` if possible - we shouldn't need root access to conduct most of our operations, but this has to be validated in practice; the aim is to enable it if possible;
  - disable `allowPrivilegeEscalation`, since we shouldn't need any special extra privileges in the first place, and we definitely don't need escalation;
  - properly configure `seLinuxOptions`;
  - properly configure `capabilities` - see `man 7 capabilities` for more info;
  - disable `privileged` - we shouldn't need privileged access, and it would be a major security issue if we did.
- enable
- We should properly use `ServiceAccount`, `Role`, `RoleBinding`, `ClusterRole` and `ClusterRoleBinding` (docs).

  The service accounts at Kubernetes by default have no permissions, except for the service accounts at the `kube-system` namespace. We'll be using a dedicated `vector` namespace, so it's our responsibility to request the required permissions.

  The exact set of permissions to request in the default deployment configuration depends on the implementation we'll land and the Vector settings of the default deployment configuration. The goal is to eliminate any non-required permissions - we don't have to keep anything extra there for demonstration purposes.

  We also have to document all possible required permissions, so that users are aware of the possible configuration options. In the Helm Charts we should allow configuring arbitrary permissions via values (while providing sane defaults).

  We can optionally support non-RBAC clusters in the Helm Chart. In the real world, non-RBAC clusters should be very rare, since RBAC has been recommended for a very long time, and it's the default for fresh `kubeadm` installations. It's probably not a major concern.
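A sketch of what the hardened container spec could look like once the list above is applied (whether `runAsNonRoot` is viable still has to be validated in practice):

```yaml
# Excerpt from the Vector container spec (illustrative).
securityContext:
  readOnlyRootFilesystem: true
  runAsNonRoot: true # to be validated in practice
  allowPrivilegeEscalation: false
  privileged: false
  capabilities:
    drop: ["ALL"] # add back only what turns out to be actually required
```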
Vector sometimes needs access to secrets, like AWS API access tokens and so on. That data has to be adequately protected.
We should recommend that users use a `Secret` (docs) instead of a `ConfigMap` if
they have secret data embedded in their Vector `.toml` config files.
We should also consider integrating with tools like Vault and redoctober.
- Suggest using Falco.
- Suggest setting up proper RBAC rules for cluster operators and users; `audit2rbac` is a useful tool to help with this.
- Suggest using Pod Security Policies (API).
- Suggest using NetworkPolicy.
- Suggest running kube-bench.
- Suggest reading the Kubernetes security documentation.
The ability to quickly and automatically rebuild containers with a CVE fix is a very important part of a successful vulnerability mitigation strategy. We should prepare in advance and roll out the infrastructure and automation to make it possible to rebuild the containers for all (not just the latest or nightly!) supported Vector versions.
## Prior Art

- Filebeat k8s integration
- Fluentbit k8s integration
- Fluentd k8s integration
- LogDNA k8s integration
- Honeycomb integration
- Bonzai logging operator - This approach is likely outside the scope of Vector's initial Kubernetes integration because it focuses more on deployment strategies and topologies. There are likely some very useful and interesting tactics in their approach though.
- Influx Helm charts
- Awesome Operators List - an "awesome list" of operators.
## Sales Pitch

See motivation.
## Drawbacks

- Increases the surface area that our team must manage.
## Alternatives

- Not do this integration and rely solely on external community-driven integrations.
## Outstanding Questions

- What is the best way to avoid Vector ingesting its own logs? I'm assuming that my `kubectl` tutorial handles this with namespaces? We'd just need to configure Vector to exclude this namespace?

  See the Origin filtering section.

- From what I understand, Vector requires the Kubernetes `watch` verb in order to receive updates to k8s cluster changes. This is required for the `kubernetes_pod_metadata` transform. Yet, Fluentbit requires the `get`, `list`, and `watch` verbs. Why don't we require the same?

  Right, this is a requirement since we're using the k8s API. The exact set of permissions is to be determined at the YAML files design stage - after we complete the implementation. It's really trivial to determine from the set of API calls used. See the Deployment Hardening section.

- What are some of the details that set Vector's Kubernetes integration apart? This is for marketing purposes and also helps us "raise the bar".
- What significantly different k8s cluster "flavors" are there? Which ones do we want to test against? Some clusters use `docker`, some use `CRI-O`, etc. Some even use gVisor or Firecracker. There might be differences in how different container runtimes handle logs.
- How do we want to approach Helm Chart Repository management?
- How do we implement liveness, readiness and startup probes? The readiness probe is a tricky one. See Container probes.
- Can we populate the file at `terminationMessagePath` with some meaningful information when we exit or crash?
- Can we allow passing arbitrary fields from the `Pod` object to the event? Currently we only pass `pod_id`, pod `annotations` and pod `labels`.
## Plan Of Attack

- Set up a proper testing suite for k8s.
  - Local testing via `make test-integration-kubernetes`.
    - Ability to "bring your own cluster". See issue#2170.
  - Add `make test-integration-kubernetes` to the `ci.yaml` workflow.
    - Ensure these tests are stable. See issue#2193, issue#2216, and issue#1635.
    - Ensure we are testing all supported minor versions. See issue#2223.
  - Run `make test-integration-kubernetes` against AWS' EKS platform in Vector's GitHub actions.
- Finalize the `kubernetes` source.
  - Audit the code and ensure the base is high-quality and correct.
  - Merge in the `kubernetes_pod_metadata` transform.
  - Implement origin filtering.
  - Merge split logs pr#2134.
  - Use the `log_schema.kubernetes_key` setting for context fields. See issue#1867.
- Add `kubernetes` source reference documentation.
- Prepare Helm Chart.
- Prepare YAML deployment config.
- Prepare Helm Chart Repository.
- Integrate kubernetes configuration snapshotting into the release process.
- Add Kubernetes setup/integration guide.
- Release `0.10.0` and announce.
- Prepare additional guides and blog posts.
  - Vector deployment for Kubernetes Cluster Operators.
  - Vector deployment as a sidecar.
- Revisit this RFC - see what we can focus on next.
- Start the RFC of the Vector performance properties bulletin, to include things like:
  - Establish continuous data gathering of performance characteristics of the bare Vector event pipeline (i.e. raw speed) and the impact of adding each of its components - sources, transforms, sinks - and their combinations.
  - Prepare the format of (and, if possible, automate the release of) the Vector performance bulletin.