While this tutorial is based on AKS, it can be used with any kind of Kubernetes cluster (self-hosted or otherwise).
Inspektor Gadget is an open-source, eBPF-based tool to debug Kubernetes resources. It helps in advanced investigations where metrics or application logs are not sufficient. One of its major advantages is the ability to map low-level Linux resources to high-level Kubernetes resources such as pods.
This topic is explained further here. In this tutorial, we will focus on the kubectl gadget plugin; the usage is very similar with ig, which runs directly on a host.
There are a number of ways to install Inspektor Gadget, documented here. In this tutorial, we will be using the latest release.
- Clone the repo to begin with
We can use the Azure CLI to set up the cluster and install Inspektor Gadget. The instructions are based on the official documentation here. If you already have a cluster, you can skip the section here. I have included a script in the repo here; a sketch of the overall flow is shown below.
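For reference, this is roughly what the flow looks like. The resource group and cluster names (myResourceGroup, myAKSCluster) are placeholders; substitute your own, or skip the az commands if you already have a cluster.

```bash
# Create an AKS cluster and fetch its kubeconfig (skip if you already have a cluster)
az aks create --resource-group myResourceGroup --name myAKSCluster --node-count 3 --generate-ssh-keys
az aks get-credentials --resource-group myResourceGroup --name myAKSCluster

# Install the kubectl-gadget plugin via krew, then deploy the gadget DaemonSet to the cluster
kubectl krew install gadget
kubectl gadget deploy
```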
At the end of the installation, you should see gadget pods running on each node of your cluster.
kubectl get pods -n gadget
NAME READY STATUS RESTARTS AGE
gadget-45cxc 1/1 Running 0 19h
gadget-dgtng 1/1 Running 0 21d
gadget-qkntw 1/1 Running 0 22d
gadget-qrhtz 1/1 Running 0 12d
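The gadget pods come from a DaemonSet, so a quick sanity check is to compare its READY count with the number of nodes. The DaemonSet name and namespace used below (gadget, in the gadget namespace) are the defaults created by kubectl gadget deploy; adjust if you deployed differently.

```bash
# The READY count of the DaemonSet should match the number of nodes in the cluster
kubectl get daemonset gadget -n gadget
kubectl get nodes --no-headers | wc -l
```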
We will be going through a few gadgets in this tutorial. The comprehensive list of gadgets is documented here.
- Apply the test-pod.yml manifest, which is present in this repo. This will create a pod named io-pod that continuously performs I/O operations for 60 minutes. You can adjust the parameters in the YAML manifest to customize the I/O workload according to your requirements. A sketch of such a manifest is shown below.

$ kubectl apply -f test-pod.yml
pod/io-pod configured
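The exact manifest is in the repo; purely for illustration, a minimal sketch of such a pod could look like the one below. The image and fio parameters here are assumptions, not the repo's actual values.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: io-pod
spec:
  containers:
  - name: io-container
    image: ljishen/fio        # assumption: any image that ships the fio binary will do
    command: ["fio"]
    args:
    - --name=io-test
    - --filename=/data/testfile
    - --size=1G
    - --rw=randrw             # mixed random reads and writes
    - --direct=1              # bypass the page cache so the I/O hits the block device
    - --time_based
    - --runtime=3600          # run for 60 minutes
    volumeMounts:
    - name: scratch
      mountPath: /data
  volumes:
  - name: scratch
    emptyDir: {}
```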
At this point, depending on the node on which the pod is scheduled, you can go to the associated VM instance and look at its I/O metrics.
- We can use the top block-io gadget to see how many I/O operations each pod is performing:
$ kubectl gadget top block-io
K8S.NODE K8S.NAMESPACE K8S.POD K8S.CONTAINER PID COMM R/W MAJOR MINOR BYTES TIME(µs) IOs
You should see something like the following:
K8S.NODE K8S.NAMESPACE K8S.POD K8S.CONTAINER PID COMM R/W MAJOR MINOR BYTES TIME(µs) IOs
aks-node1 default io-pod io-container 1758184 fio W 8 0 20713472 299529 5057
aks-node1 default io-pod io-container 1758184 fio R 8 0 20258816 579032 4946
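On a busy cluster, the output can be narrowed down with the gadgets' common filtering flags. The flags below are the generic namespace and pod filters; check kubectl gadget top block-io --help on your version to confirm.

```bash
# Only show block I/O coming from the io-pod in the default namespace
kubectl gadget top block-io -n default -p io-pod
```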
- Now, we will use the profile block-io gadget to see the latency distribution of the I/O requests.
First, we need to get the node on which the pod is running; the NODE column in the output below has that information:
$ kubectl get pod io-pod -o wide
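If you prefer to capture just the node name so it can be passed straight to the next command, a jsonpath query works as well:

```bash
# Store the node the pod was scheduled on in a shell variable
NODE=$(kubectl get pod io-pod -o jsonpath='{.spec.nodeName}')
echo "$NODE"
```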
- We run the profile block-io gadget on that specific node (the <node-name> below, or the $NODE captured above). Once you quit (Ctrl + C), it prints the I/O latency histogram with the number of operations in each bucket:
```bash
kubectl gadget profile block-io --node <node-name>
INFO[0000] Running. Press Ctrl + C to finish
^C µs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 25 | |
32 -> 63 : 830561 |****************************** |
64 -> 127 : 1101439 |****************************************|
128 -> 255 : 206979 |******* |
256 -> 511 : 14736 | |
512 -> 1023 : 1354 | |
1024 -> 2047 : 122 | |
2048 -> 4095 : 45 | |
4096 -> 8191 : 27 | |
8192 -> 16383 : 10 | |
16384 -> 32767 : 5 | |
32768 -> 65535 : 2 | |
```
Let's now delete the pod we created:
```bash
kubectl delete pod io-pod
```
When you run the profile block-io gadget again, you see a different distribution. We can observe two trends: the number of operations is significantly lower, and the latencies are also reduced.
```bash
INFO[0000] Running. Press Ctrl + C to finish
^C µs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 11 |* |
32 -> 63 : 81 |************** |
64 -> 127 : 126 |********************** |
128 -> 255 : 226 |****************************************|
256 -> 511 : 138 |************************ |
512 -> 1023 : 4 | |
```
Another interesting example, tracing out-of-memory (OOM) kills, is available here.
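Inspektor Gadget's OOM tracing is provided by the trace oomkill gadget; a minimal sketch of how you might run it is below. The memory-hungry workload that actually triggers the events is assumed to be running separately.

```bash
# Stream OOM kill events cluster-wide; press Ctrl + C to stop.
# Trigger events by running a pod that exceeds its memory limit in another terminal.
kubectl gadget trace oomkill
```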
Once you are done, you can remove Inspektor Gadget from the cluster:

```bash
$ kubectl gadget undeploy
```