- Previously, our focus has been on deploying applications to individual machines
[limited workflow]
K8s allows us to abstract away the individual machines and treat the
entire cluster as a single logical machine. The application becomes a first-class
citizen, which enables us to manage it using high-level abstractions.
In k8s, we can describe a set of applications and how they should interact with
each other, and then let k8s figure out how to make that happen.
Versioning:
k8s versions are expressed using semantic versioning
a K8S version is expressed as MAJOR.MINOR.PATCH
there is a new patch release whenever needed.
- generally, there are about 2 to 4 weeks between patch releases, except when
a critical bug or vulnerability is found.
- there is a new minor release approximately every 3 months
- at any given time, 3 minor releases are maintained,
which means about 9 months of maintenance for a minor release.
A cluster is a set of nodes.
node: can be a VM or a physical machine.
Unlike a VM, a container is not meant to host an OS; it is meant to run a specific
task or process.
The default command goes in ENTRYPOINT, the default args in CMD, both in JSON array form.
Override in the pod definition: ENTRYPOINT -> command
                                CMD -> args
Env variables in the pod definition:
containers:
  env:
  - name: KEY
    valueFrom:
      (configMapKeyRef|secretKeyRef)
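A minimal sketch putting both ideas together; the pod name, image, command, and the ConfigMap/Secret names are made up for illustration:
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod             # hypothetical name
spec:
  containers:
  - name: app
    image: busybox:1.36
    command: ["sleep"]       # overrides the image ENTRYPOINT
    args: ["3600"]           # overrides the image CMD
    env:
    - name: APP_COLOR        # value taken from a ConfigMap key
      valueFrom:
        configMapKeyRef:
          name: app-config   # assumed to exist
          key: color
    - name: DB_PASSWORD      # value taken from a Secret key
      valueFrom:
        secretKeyRef:
          name: db-secret    # assumed to exist
          key: password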
-----
k8s node roles:
control plane node:
  containerd <-- kubelet [the k8s component that interfaces with the container runtime] <--
  [scheduler | API server | controller-manager (taking care of generic cluster ops)] <--- DB [etcd]
worker node:
  [infrastructure] + [OS] + [container runtime] + [kubelet, kube-proxy]
  kubelet, kube-proxy: k8s-specific processes
kubelet:
- the kubernetes agent that runs on every node; its mission is to connect the node
  to the control plane of the cluster.
- performs liveness checks [liveness probes] on the containers running in the pods
kube-proxy: one of the network components of k8s.
------
[etcd (key-value DB, SSOT)]
|[gRPC]
[Controller Manager(Controller Loops)] <-protobuf-> [API Server(REST API)] <--protobuf----> [Scheduler(Bind Pod To Node)]
API Server, Scheduler, and Controller Manager are stateless; everything is stored in etcd [single source of truth]
Users communicate with the API SERVER [mostly REST]; the API SERVER communicates with worker nodes via their
kubelet [protobuf].
The kubelet communicates with the [container runtime] (CRI)
A Pod is a group of containers working together; they run on the same machine and share the same IP.
Together or separate?
- should we put e.g. a web application server and a cache together?
("cache" being something like e.g. Memcached or Redis)
- Putting them in the same pod means:
- they have to be scaled together.
- they can communicate very efficiently over localhost.
- Putting them in different pods means:
- they can be scaled separately
- they must communicate over remote IP addresses
- Both scenarios can make sense, depending on our goals
apiVersion, kind, metadata: these will be in every single API response from k8s
Endpoints only have a plural form
general form of kubectl:
kubectl VERB/ACTION KIND/TYPE NAME_OF_RESOURCE --FLAGS
when we use kubectl logs --selector label -> for each matching pod, there will be a separate connection to the API server;
the number of parallel connections is capped by --max-log-requests
---------------------
in a cloud-native environment the application is not tied to any specific server, so all the information the application needs is stored in the cloud.
---------------------
master node is the brain of the cluster where all decisions are made.
the master node contains the control plane [scheduler, API server, cluster store, controller manager, cloud controller manager: communicates with the cloud provider via its API]
API SERVER:
frontend to k8s control plane.
all communications go through API server External and Internal
Exposes Restful Api on port 443.
Authentication & authorization check
Cluster Store:
Stores configuration and state of the entire cluster.
Distributed Key/Value data store
Single source of truth[etcd]
- etcd is the backend for service discovery and stores cluster state and configuration.
Scheduler:
Watches for new workloads/pods and assigns them to a node based on several scheduling factors [is the node healthy?, does it have enough resources?, is the port available?, affinity and anti-affinity rules, etc.]
Controller Manager:
Daemon that manages the controller loops.
- Controller of controllers.
[node controller] whenever the current state doesn't match the desired
state, it reacts to reach the desired state [# of running nodes, for example].
It has several controllers; each one is responsible for managing/controlling a part of the system: [replica, endpoint, namespace, node, ...] controllers.
Cloud controller manager:
interacts with the underlying cloud provider.
configures [load balancers, storage, instances]
The master runs all of the cluster's control plane services.
------------------------------------------------------------------
worker nodes do the work [running the application, for example].
- VM or physical machine running Linux.
- provides the running environment for the application
- contains pods
- Components:
  kubelet, container runtime, kube-proxy
------------------------------------------------------------------------
kubelet:
Main agent running on every single node.
Receives pod definitions from the API server.
Interacts with the container runtime to run the containers associated with the pod.
Reports node and pod state to the master node.
container-runtime:
- pulls images from container registries [Docker Hub, ECR, GCR, a local registry...]
- starts containers [runs containers and abstracts their management for k8s]
- stops containers
- CRI - container runtime interface
- allows 3rd-party container runtimes [Docker was the default, but its direct support is deprecated and k8s now uses containerd]
Kube-proxy:
agent that runs on every node, deployed through a DaemonSet
Responsible for:
Local cluster networking [making sure each node has a unique IP address]
Routing network traffic to load-balanced services.
-----------------------------------
A Pod [abstraction of a server] is the smallest deployable unit.
A Pod contains a main container, may or may not have an init container [executed before the main container], sidecar containers [0 to *] [support the main container, e.g. as a proxy to it], and Volumes [share data between containers].
A Pod has a unique IP address.
Pod:
- Group of 1 or more containers
- Represents a running process
- Containers share network and volumes
- Never create Pods on their own. Use a Controller instead.
- Ephemeral and disposable [not long-lived]. So if you create one on its own and it goes down, k8s will not know about it and won't create a new one.
- Pods on their own don't self-heal.
- Never deploy pods using kind: Pod
kubectl get pods -o yaml --> get the yaml behind the pods.
anything written after -- will be run on the pod; it is not part of the kubectl options.
Creating pods: Declarative vs Imperative configuration
- Imperative:
  kubectl run [pod-name] --image=[name] ..
  kubectl delete po podName
  restartPolicy on a pod applies to the running containers inside the pod, not to the pod itself
  - [Learning, Troubleshooting, Experimenting]
- Declarative:
  Using a configuration file
  - [Reproducible in any env] Best practice.
  - Generating a yaml file:
    - kubectl run podName --image= --dry-run=client -o yaml > yamlFile
to access a pod for testing purposes:
kubectl port-forward pod/[name] [host-port]:[pod-port]
kubectl get pods --> returns pods in the current namespace.
- Use the kubectl edit pod <pod-name> command to edit pod properties.
- kubectl get pod <pod-name> -o yaml > pod-definition.yaml
get namespaces:
kubectl get namespaces || kubectl get ns
specify the namespace to operate on:
kubectl get pod -n [namespace]
- each namespace has its own DNS zone.
- so we can run different copies of our app in different namespaces
to change the current namespace:
- kubens [name] <-- doesn't come by default
- kubectl config set-context --current --namespace=[name] <- edits .kube/config
you can access a svc in a specific namespace: curl svc.ns
- namespaces organize apps, they don't isolate them.
  to isolate, you can use a NetworkPolicy
while working with a multi-cluster env, install [kube-ps1]; it shows
the current cluster/context/namespace
to format the output use --output/-o [wide|json|yaml]
when creating a pod, by default it gets the events: scheduled --> pulled --> created --> started.
to execute a command in a pod:
kubectl exec [-it] [podName] -c [container-name, if the pod contains more than one] -- [command]
to expose a port for testing and experimenting purposes only (in prod we use services [NodePort Service]):
kubectl port-forward [podName] [hostPort]:[containerPort]
kubectl port-forward pod/[podName] [hostPort]:[containerPort]
kubectl port-forward service/[serviceName] [hostPort]:[containerPort]
kubectl port-forward anyResourceType/[resourceName] [hostPort]:[containerPort]
restartPolicy applies to the containers in the pod, not to the pod itself.
to show k8s resources:
kubectl api-resources
https://kubernetes.io/docs/reference/kubectl/cheatsheet/
-------------------------
Deployments
- Manage the release of new application versions
- Zero-downtime deployments
- Create a ReplicaSet.
- Manage pods with deployments
Deployment Strategy:
- Recreate -> delete all running pods before creating the new ones for the next version [dangerous: downtime]
- RollingUpdate: [default] keeps forwarding traffic to the previous version until creation of the new one is finished.
  keep in mind: when you change the UI and the old & new pods are both available, how will we deal with that?
spec.strategy.type=[Recreate|RollingUpdate]
Recreate
- useful when you cannot run multiple versions of the application at the same time
  [DB schema]
spec.minReadySeconds : # of seconds new pods should be ready to be considered healthy. default 0
spec.revisionHistoryLimit: default 10
spec.strategy.rollingUpdate.maxSurge -> extra pods above the desired number. default 25%
spec.strategy.rollingUpdate.maxUnavailable -> unavailable pods during an update. default 25%
read: [the instructor uses --save-config when creating a deploy, and --record]
--record : records the command in the deployment revision history.
--save-config: saves the config in the resource's annotations.
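A minimal Deployment sketch showing where those strategy fields live; the name and image are hypothetical:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                      # hypothetical name
spec:
  replicas: 4
  revisionHistoryLimit: 10       # how many old ReplicaSets to keep
  minReadySeconds: 5             # a pod must stay ready this long to count as available
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%              # extra pods allowed above the desired count
      maxUnavailable: 25%        # pods allowed to be unavailable during the update
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25        # assumed image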
- when we update a Deployment with a new Pod definition:
- a new Replica Set is created with the new pod definition.
- that new Replica Set is progressively scaled up.
- meanwhile, the old Replica Set(s) is(are) scaled down.
A ReplicaSet always makes sure that the desired number of pods is running.
* to pause a rollout:
kubectl rollout pause deployment [name]
k set image deploy/frontend simple-webapp=kodekloud/webapp-color:v2
-
kubectl rollout history deploy [name]
describe a specific revision:
kubectl rollout history deploy [name] --revision=[]
status of rollout:
kubectl rollout status -f
kubectl rollout undo -f
kubectl rollout undo deploy [name] --to-revision=[]
metadata -> allows us to specify labels to access our resources.
kubectl scale deploy [name] --replicas={num} -> scale the # of currently running replicas.
kubectl edit deploy [name] -> edit the # of replicas manually.
start imperatively:
kubectl create deploy .. --replicas=
a change in the # of replicas is not reflected in the rollout history because it is not a change in the app;
it's a change in how k8s is dealing with the app.
=======
Imagine we have n nodes, and one node goes down.
k8s will run the pods that were on the unavailable node on the other nodes.
When the node comes back, it is now empty; k8s doesn't re-balance the pods across the nodes.
When we scale up, because that node is empty, k8s will schedule the new pods on that node.
=======
DaemonSet and StatefulSet
- a DaemonSet is like a deploy that starts one pod instance on every node in the cluster
- when nodes are added or removed, the DaemonSet automatically changes the # of pods accordingly
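A minimal DaemonSet sketch; the name and image are hypothetical (a typical use is a log shipper or node agent):
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent                # hypothetical name
spec:
  selector:
    matchLabels:
      app: log-agent
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      containers:
      - name: agent
        image: fluentd:v1.16     # assumed image; one copy runs on every node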
=======
AutoScaling:
- In real clusters, Pods are often automatically scaled based on resource usage properties that
  are collected by the metrics server.
- The Horizontal Pod Autoscaler observes usage statistics and, after a threshold is passed, will
  add additional replicas
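A quick sketch, assuming a deployment named web already exists and the metrics server is running:
kubectl autoscale deploy web --min=2 --max=10 --cpu-percent=70
or the equivalent manifest:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web                      # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # add replicas once average CPU passes 70%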
=====
ReplicaSet
- ensures the desired number of pods is always running using a [control loop].
deployment --creates--> replicaset.
- a ReplicaSet implements a background control loop that checks that the desired # of pods is always present on the cluster.
to rollback:
kubectl rollout undo deployment [name] --to-revision=[revision]
to view history:
kubectl rollout history deployment [name] --revision=
k8s by default keeps 10 revisions of your deployments; to change that, set spec.revisionHistoryLimit
- replicasets give full compatibility with the previously used replicationcontrollers
======
Observability:
- pod status [Pending (when the scheduler tries to figure out where to place the pod),
  ContainerCreating,
  Running]
conditions, an array of True/False: [PodScheduled, Initialized, ContainersReady, Ready]
- Ready: indicates the app inside the pod is running and ready to accept traffic.
- services rely on the pod Ready condition to route traffic.
Readiness Probes: in the case of a web app you can hit an endpoint to check,
in the case of a db you can check whether a particular port is listening,
or use exec to run a script inside a container that exits successfully if the app is ready.
under the container def > readinessProbe, one of:
  tcpSocket:
    port: 5432
  httpGet:
    path: /api/ready
    port: 8080
  exec:
    command:
    - cat
    - /app/is_rea
- You CAN set an initial time to wait: initialDelaySeconds, periodSeconds, failureThreshold [3]
Liveness Probes: same as readiness but check whether the app inside the container remains healthy.
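A container sketch with both probes wired in; the image, paths, and timings are made up for illustration:
    containers:
    - name: api
      image: my-api:1.0          # hypothetical image
      readinessProbe:
        httpGet:
          path: /api/ready
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 10
        failureThreshold: 3
      livenessProbe:
        httpGet:
          path: /api/healthy     # hypothetical endpoint
          port: 8080
        initialDelaySeconds: 15
        periodSeconds: 20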
Monitoring: minikube addons enable metrics-server
kubectl top node: This provides the CPU and memory consumption of each of the nodes.
kubectl top pod: view performance metrics of pods in Kubernetes.
======
- node level metrics[#nodes in cluster, resource consumption]
- pod level metrics [#pods, resource consumption]
Metrics Server: stores data in memory only.
The kubelet contains cAdvisor; cAdvisor is responsible for retrieving performance metrics from pods
and exposing them through the kubelet API to make the metrics available to the metrics server.
=====
taint and toleration
a taint is applied to a node; a pod needs a matching toleration to be scheduled onto that node.
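A quick sketch, assuming a node called node1 and an app=blue key chosen for illustration:
kubectl taint nodes node1 app=blue:NoSchedule      # taint the node
and the matching toleration in the pod spec:
  tolerations:
  - key: "app"
    operator: "Equal"
    value: "blue"
    effect: "NoSchedule"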
=======
Node Affinity:
- used to ensure that certain pods are deployed on particular nodes.
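A pod spec fragment sketch, assuming nodes labeled size=large:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: size            # assumed node label
            operator: In
            values:
            - large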
===========
Services
- a stable endpoint to connect to a pod or a group of pods.
- expose a logical set of pods
- applying round-robin load balancing to forward traffic to specific pods.
- set of pods is targeted by a selector
- stable IP address
- Stable DNS Name
- stable Port
Types:
- ClusterIP (default) [internal access; with clusterIP: None it becomes a "headless" service]
  - Only internal access, no external [service-name:port]
- NodePort
  - allocates a specific port on every node that forwards to the svc ClusterIP address
- ExternalName
  - works on DNS names; redirection happens at the DNS level, useful in migrations
- LoadBalancer
kube-controller-manager will continuously scan for pods that match the selector and include
these in the service
Decoupling:
- services exist independently from the [deployment] app they provide access to.
- the only thing they do is watch for Pods that have a specific label set
  matching the selector that is specified in the svc
- that means that one service can provide access to pods in multiple deploys,
  and while doing so, k8s will automatically load balance between these pods
  [used in canary deploys].
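A minimal ClusterIP Service sketch; the name, label, and ports are illustrative:
apiVersion: v1
kind: Service
metadata:
  name: web-svc                  # hypothetical name
spec:
  type: ClusterIP                # default type
  selector:
    app: web                     # matches pods carrying this label
  ports:
  - port: 80                     # port the service listens on
    targetPort: 8080             # port the container listens on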
======
[service] NodePort:
- Allows you to open a port on all nodes
- Allowed port range: [30000-32767]
- if the client chooses to send the req to a node with no healthy pods, that node is responsible for sending
  the req to be handled by another node with healthy pods.
- spec.ports [port -> the port the service listens on, targetPort -> the port the container listens on,
  nodePort -> the exposed port on the node] -> if not specified, it will be chosen randomly.
  if you specify port, targetPort will take the same value if it is not specified
- nodePort must be unique.
- Under the hood: kube-proxy sets up a bunch of iptables rules on our nodes
- Sometimes, it's the only available option for external traffic
  (e.g. most clusters deployed with kubeadm or on-premises)
Disadvantages:
- One service per port.
- If a node's IP address changes then we have a problem.
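A NodePort sketch under the same assumptions as above (choosing the nodePort explicitly is optional):
apiVersion: v1
kind: Service
metadata:
  name: web-nodeport             # hypothetical name
spec:
  type: NodePort
  selector:
    app: web
  ports:
  - port: 80                     # service port
    targetPort: 8080             # container port
    nodePort: 30080              # must fall in 30000-32767; if omitted it is chosen for you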
=====
LoadBalancer
- An external load balancer is allocated for the service (typically a cloud load balancer, AWS, ...)
- this is available only when the underlying infrastructure provides some kind of "load
  balancer as a service":
- each service of that type will typically cost a little bit of money.
- Ideally, traffic would flow directly from the load balancer to the pods.
- In practice, it will often flow through NodePort first.
- Standard way of exposing Applications to the internet.
- Create a load balancer per service.
- AWS & GCP - Network Load Balancer NLB[cloud controller manager. handle that].
====
k8s default service [kubernetes]:
- automatically created for us, to allow us to talk to the k8s API from within the app.
- all communication goes via the kubernetes api-server [kube-apiserver-minikube]
===
Labels:
- kubectl get pods --show-labels
spec.template.metadata.labels[app|environment|tier]
select pods:
spec.selector.matchLabels[key:value]
all labels must match, or nothing is selected
- kubectl get pods --selector="key=value,.." === kubectl get pods -l key=value,k2=v2
- kubectl get po -l 'labelKey1 in (v1, v2), labelKey2 notin (v1)'
- a deploy finds its pods using a selector
- a svc finds its endpoint pods using a selector
- used to facilitate resource management and selection
===
annotations:
Annotations are an unstructured key/value map stored with a resource that may be set by external tools to store
and retrieve arbitrary metadata; they are not queryable and should be preserved when modifying objects. [used by external tools]
spec.template.metadata.annotations
===
DNS:
- k8s uses CoreDNS
- each pod has a file /etc/resolv.conf that contains the IP of the nameserver
-----
when a sa (ServiceAccount) is created, it automatically creates a token that is used by an external
app to authenticate when accessing the k8s API - a secret resource is created to store the token.
etcd is a consistent and highly-available key value store used as Kubernetes' backing store for all cluster data.
A Kubernetes cluster already comes with a ServiceAccount, the default ServiceAccount that lives in the default namespace. Any Pod that doesn’t explicitly assign a ServiceAccount uses the default ServiceAccount.
the API server creates a Secret holding the API token and assigns it to the ServiceAccount. The Secret and token names use the ServiceAccount name as a prefix.
- kubectl create serviceaccount build-bot
--declarative--
apiVersion: v1
kind: ServiceAccount
metadata:
  name: build-bot
----
to use kubectl without providing the key, cert, and CA,
you set them in the .kube/config file and they will be provided by default for every command.
in the config file you define your clusters, users, and contexts [which user with which cluster]
you don't create any new user in the config; you add an already existing one
current-context: user@cluster
cmd:
kubectl config view
k config use-context user@cluster -> will update current-context field in the config file.
k config -h
O'Reilly
RBAC:
- RBAC consists of three key building blocks.
Subject
The user or process that wants to access a resource[ user, a group, or a ServiceAccount.]
Resource
The Kubernetes API resource type (e.g., a Deployment or node)
Verb
The operation that can be executed on the resource (e.g., creating a Pod or deleting a Service)
Authentication strategy    ------  Description
X.509 client certificate   ------  Uses an OpenSSL client certificate to authenticate
Basic authentication       ------  Uses username and password to authenticate
Bearer tokens              ------  Uses OpenID (a flavor of OAuth2) or webhooks as a way to authenticate
---
The Role API primitive declares the API resources and their operations this rule should operate on.
The RoleBinding API primitive binds the Role object to the subject(s).
To define new Roles and RoleBindings, you will have to use a context that allows for creating or modifying them, that is, cluster-admin or admin.
Roles and RoleBindings apply to a particular namespace. You will have to specify the namespace at the time of creating both objects. Sometimes, a set of Roles and Rolebindings needs to apply to multiple namespaces or even the whole cluster. For a cluster-wide definition, Kubernetes offers the API resource types ClusterRole and ClusterRoleBinding. The configuration elements are effectively the same. The only difference is the value of the kind attribute
--verb for defining the verbs aka operations, and --resource for declaring a list of API resources.
- kubectl create role read-only --verb=list,get,watch --resource=pods,deployments,services
--section rules lists the resources and verbs. --
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-only
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - services
  verbs:
  - list
  - get
  - watch
- apiGroups:
  - apps
  resources:
  - deployments
  verbs:
  - list
  - get
  - watch
---------
-----Rolebinding--
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-only-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: read-only
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: User
  name: johndoe
-----------
aggregate rules
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: []
aggregationRule:
  clusterRoleSelectors:
  - matchLabels: ..
  - matchLabels: ..
------
Default ClusterRole ----Description
cluster-admin:
Allows read and write access to resources across all namespaces.
admin:
Allows read and write access to resources in namespace including Roles and RoleBindings.
edit:
Allows read and write access to resources in namespace except Roles and RoleBindings. Provides access to Secrets.
view:
Allows read-only access to resources in namespace except Roles, RoleBindings, and Secrets.
---
kubectl explain [resource] -> shows what to define in the yaml. keys are case sensitive.
  e.g. kubectl explain pods.spec.volumes
kubectl explain --recursive [resource] -> shows the options for each key
kubectl apply -f [].yaml -> update mechanism; not really available for bare pods, intended for deploys
-------------------------
Multi-Container Pods: [logging, monitoring, syncing]
The essence is that the main container and the sidecar container have access to a shared resource to exchange information. [a Volume ..]
- sidecar container: a container that enhances the primary application, for instance for logging [Istio service mesh: monitors and shapes traffic in a k8s env; it fetches traffic statistics about the real app, and istio injects the sidecar into the real app]
- Ambassador container: a container that represents the primary container to the outside world, such as a proxy.
- Adapter container: used to adapt the traffic or data pattern to match the traffic or data pattern of other applications in the cluster.
-- sidecar containers, etc. are not defined by any specific Pod properties; from the k8s API point of view, it's just a multi-container pod.
----
Init Container
- an initial container in a pod that completes a task before the "regular" container is started.
- if the init container has not run to completion, the main container is not started.
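A sketch of a pod with an init container; the images, the service name, and the wait command are illustrative:
apiVersion: v1
kind: Pod
metadata:
  name: app-with-init            # hypothetical name
spec:
  initContainers:
  - name: wait-for-db
    image: busybox:1.36
    command: ["sh", "-c", "until nslookup db-svc; do sleep 2; done"]   # assumed service name
  containers:
  - name: app
    image: my-app:1.0            # hypothetical image; starts only after the init container succeeds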
----
kubectl describe pod podname
to see all parameters and their current settings as they currently are in [etcd -> stored in JSON format]
----
to view PIDs in a pod that doesn't have ps:
cd /proc
ls
you will find the PIDs
cat {pid}/cmdline
---
if the shell is PID 1 and you run exit -> you will shut down the container, so in that case use CTRL+P CTRL+Q to detach instead
----
logs
- the pod entrypoint application is not connected to a terminal for its STDOUT
- application output is sent to the k8s cluster.
- use kubectl logs to connect to this output
------------
securityContext:
defines privilege and access control settings
for a Pod or container, and includes the following:
- Discretionary Access Control which is about permissions to access an object.
- Security Enhanced Linux, where security labels can be applied
- Running privileged or unprivileged user.
- Using Linux capabilities.
- AppArmor, which is an alternative to SELinux.
- AllowPrivilegeEscalation which controls if a process can
gain more privileges than its parent.
use kubectl explain for a complete overview.
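A sketch of a pod-level plus container-level securityContext; the UID/GID values and the added capability are arbitrary examples:
apiVersion: v1
kind: Pod
metadata:
  name: secure-pod               # hypothetical name
spec:
  securityContext:               # pod level: applies to all containers
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
  containers:
  - name: app
    image: busybox:1.36
    command: ["sleep", "3600"]
    securityContext:             # container level: overrides/extends the pod level
      allowPrivilegeEscalation: false
      capabilities:
        add: ["NET_ADMIN"]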
----
Managing Jobs
- Pods are normally created within a deployment to run forever.
- To create a Pod that runs to completion, use a Job instead.
- Jobs are useful for one-shot tasks, like [backup, calculation, batch processing and more]
- Use spec.ttlSecondsAfterFinished to clean up completed Jobs automatically.
- a Job creates one or more Pods and ensures that a specified # of pods successfully terminate.
- ex: can be configured to run multiple pods in parallel; e.g. you need to get data from multiple servers.
- ex: retrieve data from dbs and cache it in redis; backups; move data between DBs.
- once a job is deleted, its pods are removed.
Job Types:
- 3 different job types can be started, which
  is specified by the [completions] and [parallelism] parameters:
  - Non-parallel jobs: one Pod is started, unless the pod fails
    - completions=1
    - parallelism=1
  - Parallel Jobs with a fixed completion count: the job is complete
    after successfully running as many times as specified in jobs.spec.completions
    - completions=n
    - parallelism=m
  - Parallel Jobs with a work queue: multiple Pods are started;
    when one completes successfully, the job is complete.
    - completions=1
    - parallelism=n
spec.template.spec.restartPolicy: [Never|OnFailure]
spec.completions:
spec.parallelism:
spec.activeDeadlineSeconds:
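A minimal Job sketch; the name, image, and command are chosen for illustration:
apiVersion: batch/v1
kind: Job
metadata:
  name: one-shot                 # hypothetical name
spec:
  completions: 3                 # run to completion 3 times
  parallelism: 1                 # one pod at a time
  ttlSecondsAfterFinished: 300   # auto-clean the Job 5 minutes after it finishes
  template:
    spec:
      restartPolicy: Never       # or OnFailure
      containers:
      - name: task
        image: busybox:1.36
        command: ["sh", "-c", "echo doing the work && sleep 5"]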
-----
CronJobs are used for tasks that need to run on a regular basis.
when it runs, a Job is scheduled; the Job will start a Pod.
- names must be less than 52 characters because k8s appends some extra info.
kubectl create cronjob every3mins --image=[image] --schedule="*/3 * * * *"
cron format: * * * * *
             min hr (dayOfMonth) month dayOfWeek (sunday=0..6)
run at 22:30 every monday:
  30 22 * * 1
additional cron formats [non-standard]: @hourly @daily @monthly @yearly @weekly
*/30 * * * * -> every 30 mins
spec.concurrencyPolicy: [Allow|Forbid|Replace] <== if a job is still running and the time to run the next job passes, do you allow overlapping or not.
spec.schedule
spec.jobTemplate
.spec.backoffLimit specifies the number of retries before considering a Job as failed.
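A CronJob sketch following the every-30-minutes example; the name, image, and command are illustrative:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: report-generator         # hypothetical name
spec:
  schedule: "*/30 * * * *"       # every 30 minutes
  concurrencyPolicy: Forbid      # skip a run if the previous one is still going
  jobTemplate:
    spec:
      backoffLimit: 3
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: report
            image: busybox:1.36
            command: ["sh", "-c", "echo generating report"]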
----------------------
kubectl create deployment web --image=nginx
- the client sends the request to the API Server
- the API Server writes to etcd [Deployment] and responds to the user that the deployment is created.
- the Controller Manager notices the new deployment in etcd via the API server, creates
  the ReplicaSet corresponding to that definition, and writes it to etcd
- another controller in the Controller Manager sees the ReplicaSet through the API server and creates
  its pods in the PENDING state.
- the Scheduler checks whether there is a resource limit on the pending pod,
  and assigns it to a worker node.
- the kubelet on the worker node connects to the API server, is told
  to create the pod, and responds to the API server with its state.
------------
K8s network model:
- TL;DR:
our cluster (nodes and pods) is one big flat IP network.
- In detail:
- all nodes must be able to reach each other, without NAT.
- all pods must be able to reach each other, without NAT.
- pods and nodes must be able to reach each other, without NAT.
- each pod is aware of its IP address (no NAT)
- Pod IP addresses are assigned by the network implementation
- kubernetes doesn't mandate any particular implementation
--
1 the pod to pod network
2 the pod to service network:
- provide internal communication and load balancing.
- is generally implemented with kube-proxy(or e.g. kube-router)
3 Network policies
---
Resources Limitations
- By default, a Pod will use as much CPU and memory as necessary to do its work.
- This can be managed by using Memory/CPU requests and limits in
pod.spec.containers.resources
- A request is an initial request for resources.
- A limit defines the upper threshold of resources a Pod can use.
- Memory as well as CPU limits can be used.
- CPU limits are expressed in millicore or millicpu, 1/1000 of a CPU core
- So, 500 millicore is 0.5 CPU
- When being scheduled, the kube-scheduler ensures that the node running
the Pods has all requested resources available
- If a Pod with resource limits cannot be scheduled, it will show a status of
Pending
- when using a deployment, use [kubectl set resources] to change resource limitations
  on running applications. You cannot do that for pods because pods are not updateable.
- pod/deployment resource limitations can be combined with quotas on
  namespaces to restrict these applications in a specific namespace only.
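A resources fragment sketch for a container spec; the image and the numbers are only examples:
    containers:
    - name: app
      image: nginx:1.25          # assumed image
      resources:
        requests:
          memory: "64Mi"
          cpu: "250m"            # 250 millicores = 0.25 CPU
        limits:
          memory: "128Mi"
          cpu: "500m"            # 0.5 CPU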
- Try not to force resources to be deleted; it may bring them into an
  unmanageable state.
kubectl delete all --all
don't do this: kubectl delete all --all -A --force --grace-period=-1
----
Networking in K8S
there is an [internal network] pod network connecting all pods; each pod is given an IP address.
there is an [internal] cluster network exposed to another network (the external [node network])
we have services that allow the cluster network to access pods:
- clusterIP: service available on the cluster net [used to expose internal components of the
  app to the other components; for example a DB]
- nodePort: service on the cluster net; exposes a high port on the nodes.
  - those node ports redirect to the node-port service.
* now, a user can access a node on that port, but we don't want the user to be aware of the node IP, so
  there is another resource called Ingress that exposes our app to the outside world
-------------
Ingress
- Controller [contains LoadBalancer implementation] + API resource
- controller connects to the cluster nodes.
- as a resource in the cluster it has direct connection to the other svc
- a URL is resolved --via DNS--> to the Ingress controller -> the controller consults the Ingress API resource -->
  now packets can be forwarded to the backend pods
- Ingress is used to provide external access to internal Kubernetes cluster resources
- To do so, Ingress uses a load balancer that is present on the external cluster.
- This Load Balancer is implemented by the ingress controller
- As an API resource, Ingress uses a selector label to connect to the pods that are used as
  service endpoints.
- To access resource in the cluster, DNS must be configured to resolve to the Ingress
Load Balancer IP.
- Ingress exposes HTTP and HTTPS routes from outside the cluster to services within the
cluster
- Traffic routing is controlled by rules defined on the Ingress resource
[which url redirect to which svc in the cluster]
- Ingress can be configured to do the following:
- give services externally-reachable URLs
- Load balance traffic
- Terminate SSL/TLS
- Offer name-based virtual hosting
- Controllers:
- Creating Ingress resources without an Ingress controller has no effect.
- Many Ingress controllers exist:
  - nginx, haproxy, traefik, kong, contour.
Rules:
- An optional host. If no host is specified, the rule applies to all inbound
  HTTP traffic.
- A list of paths (like /testpath). Each path has its own backend. Paths can
  be expressed as a POSIX regular expression.
- The backend, which consists of a service (or resource); a default backend in the Ingress
  controller handles incoming traffic that doesn't match a specific path.
Ingress Backends:
- A service backend relates to a k8s service.
- A resource backend typically refers to cloud based object storage,
where Ingress is used to provide access to storage resources
- Resource backends and svc backends are mutually exclusive:
- either u do it for svc, or u do it for resources
- A defaultBackend can be set to handle traffic that doesn't match
  any specific backend.
Path Types:
- The Ingress pathType specifies how to deal with path requests.
- The Exact pathType indicates that an exact match should occur
  - if the path is set to /foo and the req is /foo/, there is no match.
- The Prefix pathType indicates that the request path should start with the specified path
  - if the path is set to /, any requested path will match
  - if the path is set to /foo, /foo as well as /foo/ will match
Ingress Types:
- Ingress backed by a single service: there is one rule that defines access
to one backend Service
  - kubectl create ingress single --rule="/files=fileservice:80"
- simple fanout: there are 2 or more rules defining different paths that refer
  to different services
  - kubectl create ingress single --rule="/files=fileservice:80" \
      --rule="/db=dbservice:80"
- Name-based virtual hosting: there are 2 or more rules that route requests
  based on the host header
  - Make sure there is a DNS entry for each host header
  - kubectl create ingress single --rule="my.examples.com/files*=fileservice:80" \
      --rule="my.example.org/data*=dataservice:80"
IngressClass
- used to set a specific ingress controller as the cluster default.
- each ingress resource should specify a class, which refers to the cluster
default IngressClass
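A sketch of the simple fanout case as a manifest; the host, class, service names, and ports mirror the made-up examples above:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: fanout                   # hypothetical name
spec:
  ingressClassName: nginx        # assumes an nginx ingress controller is installed
  rules:
  - host: my.example.com         # optional host
    http:
      paths:
      - path: /files
        pathType: Prefix
        backend:
          service:
            name: fileservice
            port:
              number: 80
      - path: /db
        pathType: Prefix
        backend:
          service:
            name: dbservice
            port:
              number: 80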
Network Policies
- By default, there are no restrictions to network traffic in k8s.
- Pods can always communicate, even if they're in other NameSpaces
- To limit this, Network Policies can be used.
- Network Policies need to be supported by the network plugin though.
- If in a policy there is no match, traffic will be denied.
- a network policy without any rules -> nobody can communicate.
- If no Network Policy is used, all traffic is allowed.
Plugin for minikube [minikube start --cni=calico] -verify-> [kubectl get pods -n kube-system]
Network Policy Identifiers:
- when defining a Pod- or Namespace-based NetworkPolicy, a selector label is used to specify
  what traffic is allowed to and from the Pods that match the selector
- In Network Policies, three different identifiers can be used to match network traffic
  - podSelector: allows access to a Pod with the corresponding selector label.
  - namespaceSelector: allows incoming traffic from namespaces with the matching
    selector label.
  - ipBlock: used to specify a range of cluster-external IPs that should get access.
- Network Policies do not conflict, they are additive.
- if a network policy selects a pod, all traffic to that pod is blocked unless there is a match.
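A NetworkPolicy sketch that only lets pods labeled role=frontend reach pods labeled app=db on port 5432; all names, labels, and ports are invented:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-allow-frontend        # hypothetical name
spec:
  podSelector:
    matchLabels:
      app: db                    # the pods this policy protects
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: frontend
    ports:
    - protocol: TCP
      port: 5432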
---------------
Storage:
a container can have a volume [only supports pointing to (NFS or local storage)]
a container can have a volume pointing to any type of storage [.. cloud storage, fibre channel, iSCSI]
don't use hostPath in a PV in a prod env.
- we can configure pods with locally configured volumes, but as app developers
  we want to distribute a manifest file, and we want the manifest to be able to
  work at as many customer sites as possible. So we are not going to
  put any site-specific information in the manifest file. --> PVC comes in, so we
  won't point to a specific type of storage in the pod manifest; we'll point the volume
  to a PVC.
- when you deploy a manifest on a customer site you need its admin to create
  a persistent volume [a separate API resource]; this allows the site admin to connect
  to any site-specific type of storage. When the pod is deployed, it will be bound to the PV.
* that last step is not cloud friendly; that's where StorageClass comes in.
- StorageClass: no matter which site you're using, whether it's a local cluster or
  an AWS cluster... all of those sites have their own specific storage.
  StorageClass automatically provisions the storage for you. If a PVC requests access
  to a specific persistent volume and that persistent volume is not there,
  it can ask the StorageClass and the SC will create a persistent volume on demand.
* now, you don't need a site admin anymore. The only thing we do need is site-specific
  storage, being allocated by a provisioner. So a storage provisioner needs to be available.
  --> finding the right storage provisioner is sometimes challenging.
--in a more formal way---
- Files stored in a container will only live as long as the container itself.
- Pod Volumes are used to allocate storage that outlives a container and
  stays available during the pod lifetime.
- Pod Volumes can directly bind to a specific storage type
- Pod Volumes can also use persistent volume claims, to decouple
  the pod from the requested site-specific storage [size, r/w].
- A Persistent Volume defines access to external storage available in a specific
  cluster.
- a wide range of PV types is available.
- Persistent Volume Claims (PVC) are used to connect to PVs
- when a PVC is created, it will search for an available PV that matches
  the storage request.
- if a perfect match doesn't exist, a StorageClass can automatically allocate it.
- in this scenario, the Pod that uses the PVC storage type is completely decoupled
  from the site-specific storage.
- A PVC bind is exclusive: after successfully binding to a PV, the PV cannot
  be used by another PVC anymore
..StorageClass
- provides a way for admins to describe a class of storage; useful when you
  are in an env where different types of storage are available
- some clouds provide a default StorageClass; if you create a PVC without specifying a StorageClass, your platform will set the pvc storageClass to the default
- every cluster env has its own specific storage type.
- a kubernetes StorageClass allows for automatic provisioning of PVs when a
  PVC request for storage comes in.
- a StorageClass must be backed by a storage provisioner, which takes care
  of the volume configuration.
- Kubernetes comes with some internal provisioners; alternatively, external
  provisioners can be installed using Operators.
..StorageClass..as..a..selector
- Normally, PVC binding to a PV is based on the best match.
- While using a StorageClass in the PV as well as the PVC, a PVC will only bind to a PV
  that has the same StorageClass setting.
- if such a PV is not available, the PVC will stay in a status of Pending
PVC (Persistent Volume Claim):
requests read-write or read-only access and a specific amount of storage.
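A PV/PVC pair sketch using hostPath for a lab setup only; the names, sizes, class name, and path are illustrative:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-demo                  # hypothetical name
spec:
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  storageClassName: manual       # illustrative class name
  hostPath:
    path: /data/pv-demo          # lab use only, not for prod
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-demo
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: manual       # must match the PV's class to bind
  resources:
    requests:
      storage: 1Gi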
-------------
pod local volume:
pod.spec.volumes -> points to a specific volume type.
--> for testing purposes emptyDir and hostPath are common
emptyDir: a temp directory that will dynamically be created on the host that runs
the pod.
hostPath: a persistent directory that is defined and will outlive the lifetime of
the pod.
hostPath is a little bit better than emptyDir
The volume is mounted through pod.spec.containers.volumeMounts
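A pod sketch that shares an emptyDir volume between two containers; the names and commands are made up:
apiVersion: v1
kind: Pod
metadata:
  name: shared-scratch           # hypothetical name
spec:
  volumes:
  - name: scratch
    emptyDir: {}                 # temp dir created on the host that runs the pod
  containers:
  - name: writer
    image: busybox:1.36
    command: ["sh", "-c", "while true; do date >> /data/out.txt; sleep 5; done"]
    volumeMounts:
    - name: scratch
      mountPath: /data
  - name: reader
    image: busybox:1.36
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: scratch
      mountPath: /data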
create deployments
expose svc
-----------------
decouple site-specific information from your generic app code
Providing Variables to a kubernetes Application
- k8s does not offer a command line option to provide variables while running
  a deployment with [kubectl create deploy]
- first, use kubectl create deploy mydb --image=mariadb
- second, use kubectl set env deploy mydb KEY=VALUE
- while running a Pod, env vars can be provided, but you shouldn't
  run naked Pods
- kubectl run mydb --image=mysql --env="KEY=VALUE"
Understanding the need for decoupling
- It's a good idea to separate site-specific information from code.
- Code should be static, which makes it portable, so that it can be used in other
  environments
- Variables are site-specific, and for that reason should not be provided in
  the Deployment configuration.
- K8s provides ConfigMaps to deal with this issue.
- A ConfigMap can be used to define variables, and the Deployment just points to
  the ConfigMap.
- While using ConfigMaps, site-specific information is included in the ConfigMap,
  which allows you to keep the Deployment manifest files generic.
Understanding ConfigMap
- ConfigMaps are used to provide different types of configuration to an application
- ConfigMaps can be used for three different types of configuration
- Variables
- Configuration Files
- Command Line arguments (not common to use a cm for it)
--> ConfigMaps are created in different ways according to what they are created for.
- Note that according to these different types of configuration, ConfigMaps are created
  and used in a very different way!
- If an application refers to a ConfigMap, the ConfigMap should exist in the cluster
  before running the application.
--
all env vars from a cm:
envFrom:
- configMapRef:
    name: cm
--
single env var:
env:
- name: KEY
  valueFrom:
    configMapKeyRef:
      name: cm-name
      key:
--
volumes:
- name: app-config-vol
  configMap:
    name: app-config
----
A Secret is a special type of configMap that adds (base64) encoding
- secrets allow for storage of sensitive data such as passwords, auth tokens, and SSH keys.
- some secrets are automatically created by the system; users can also create secrets.
- system-created secrets are important for k8s resources to connect to other cluster
  resources.
types:
- docker-registry: used for connecting to a docker registry
- TLS: used to store TLS key material
- generic: creates a secret from a local file, directory or literal value.
kubectl create secret [type] ..
----
- to access k8s API, all k8s resources need access to TLS keys.
because at every single level, TLS keys provide authorization between different
k8s components, and between your pods and the backend svc
- these keys are provided by secrets and used through ServiceAccounts.
- By using the ServiceAccount, the application gets access to its secret.
--
Secrets in application
- There are different use cases for using secrets in applications:
- To provide TLS keys to the application:
kubectl create secret tls my-tls-keys --cert=tls/my.crt --key=tls/my.key
- To provide security to passwords:
kubectl create secret generic my-secret-pw --from-literal=password=verysecret
- To provide access to an ssh private key:
kubectl create secret generic my-ssh-key --from-file=ssh-private-key=.ssh/id_rsa
  - to provide sensitive files, which would be mounted in the app with root access only:
kubectl create secret generic my-secret-file --from-file=/my/secret
https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/
https://kubernetes.io/docs/concepts/configuration/secret/#risks
----------
kustomize:
- a base yaml, and then overlays [one for each deployment] patched on top of that yaml.
helm:
- more powerful and complex.
- lets us write yaml templates, and helm will evaluate them before applying them to the k8s cluster
- can be used as a package manager
labels are limited in size, annotations have no size limit
labels can be used in selectors, annotations cannot.
----
KubeVirt technology addresses the needs of development
teams that have adopted or want to adopt Kubernetes but
possess existing Virtual Machine-based workloads that cannot
be easily containerized. More specifically, the technology provides
a unified development platform where developers can build, modify,
and deploy applications residing in both Application Containers
as well as Virtual Machines in a common, shared environment.
another use case for VMs over containers: high-security apps where the security
of the hypervisor and certification are important.
Some other resources, such as namespaces, nodes, and PersistentVolumes,
do not belong to namespaces and should have unique cluster-wide names.
Service belongs to a namespace and gets a corresponding Domain Name Service (DNS)
record that has the namespace in the form of <service-name>.<namespace-name>.svc.cluster.local.
ResourceQuotas provide constraints that limit the aggregated resource consumption per namespace:
an admin can control the number of objects per type that are allowed in a namespace [#SVC, #DEPLOY, ..]
ResourceQuotas can also limit the total sum of computing resources we can request in a given namespace.
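A ResourceQuota sketch for a namespace; the namespace name and the numbers are invented:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota               # hypothetical name
  namespace: team-a              # hypothetical namespace
spec:
  hard:
    pods: "20"                   # object-count limits
    services: "5"
    requests.cpu: "4"            # aggregated compute limits
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi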
it is best to have enough labels to describe all important aspects of the Pods. For example, having labels to
indicate the logical group of an application, the business characteristics and criticality,
the specific runtime platform dependencies such as hardware architecture, or location preferences are all useful.
The information on the annotations is not intended for querying and matching objects. Instead, it is
intended for attaching additional metadata to objects from various tools and libraries we want to use.
while labels are used primarily for query matching and performing actions on the matching resources,
annotations are used to attach metadata that can be consumed by a machine.
Predictable Demands pattern indicates how you should declare application requirements,
whether they are hard runtime dependencies or resource requirements. Declaring your
requirements is essential for Kubernetes to find the right place for your application within the cluster.
resource requirements, application runtimes also have dependencies on platform-managed capabilities
like data storage or application configuration.
Volumes are an example of a runtime dependency that affects what kind of infrastructure
a Pod can run and whether the Pod can be scheduled at all.
The usage of a hostPort creates another runtime dependency on the nodes and limits where a Pod can be scheduled.
hostPort reserves the port on each node in the cluster and is limited to a maximum of one Pod scheduled per node.
Configuration introduces a runtime dependency of your container on the named ConfigMaps. If not all of the expected ConfigMaps
are created, the containers are scheduled on a node, but they do not start up.
Some of these dependencies limit where a Pod gets scheduled (if anywhere at all),
and other dependencies may prevent the Pod from starting up.
When designing your containerized applications with such dependencies,
always consider the runtime constraints they will create later.
Resources [compressible[cpu, bandwidth], incompressible[mem]]
(we define heap size for a Java application by using the -Xms and -Xmx command-line options.)
The requests amount (but not limits) is used by the scheduler when placing Pods to nodes. For a given Pod, the scheduler considers
only nodes that still have enough capacity to accommodate the Pod and all of its containers by summing up the requested resource amounts.
containers[].resources.requests : Initial resource request for CPU and memory.
containers[].resources.limits : Upper limit up to which we allow our application to grow at most. We intentionally don't specify CPU limits.
memory: