Kubernetes Autoscaling

Kubernetes lets you scale the clusters you create automatically to optimize resource usage. You can scale manually, but the Horizontal Pod Autoscaler is a built-in component that does it for you.

Kubernetes needs a Metrics Server to collect pod metrics. A metrics server must be deployed on the cluster to expose these metrics through the Metrics API, which the Horizontal Pod Autoscaler queries.

Before you start, remember to replace the deployment name in the commands below with your own.

Cloud providers usually ship with Metrics Server already deployed. If you are using a custom Kubernetes cluster, or one where it is not deployed, you must deploy Metrics Server yourself.

You can check whether Metrics Server is installed using the command below.

kubectl get pods --all-namespaces | grep -i "metric"

If it is installed, the output looks similar to the following.

kube-system metrics-server-5bb577dbd8-7f58c 1/1 Running 7 23h

1. Metrics Server Installation

First, download the components.yaml file on the master node.

wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Then add the following argument to the container's args list, at around line 132 of the file. The exact line number can differ between releases.

--kubelet-insecure-tls

After the change, that part of the file should look like this.

spec:
  containers:
  - args:
    - --kubelet-insecure-tls
    - --cert-dir=/tmp
    - --secure-port=4443
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --kubelet-use-node-status-port
    image: k8s.gcr.io/metrics-server/metrics-server:v0.4.2
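Because the line number can shift from one release to another, a pattern-based edit may be more robust than counting lines. The following is a minimal sketch, assuming the args list contains a `--cert-dir=` entry as in the excerpt above. The function name `add_insecure_tls` is hypothetical and not part of any tool.

```shell
# Hypothetical helper: insert --kubelet-insecure-tls just before the
# --cert-dir argument, reusing that line's indentation.
# Assumption: the manifest contains a "- --cert-dir=..." list entry.
add_insecure_tls() {
  local file="$1"
  # Leave the file untouched if the flag is already present.
  if grep -q -- '--kubelet-insecure-tls' "$file"; then
    return 0
  fi
  awk '
    /^ *- --cert-dir=/ {
      match($0, /^ */)   # RLENGTH now holds the number of leading spaces
      printf "%s- --kubelet-insecure-tls\n", substr($0, 1, RLENGTH)
    }
    { print }
  ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

It would be called as `add_insecure_tls components.yaml` after the download. Review the resulting file before applying it.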

Now deploy the YAML file you just modified.

kubectl apply -f components.yaml

Let's check whether everything is working properly.

kubectl get apiservices | grep "v1beta1.metrics.k8s.io"

The output of the command should be as follows.

v1beta1.metrics.k8s.io                 kube-system/metrics-server   True        21h

2. Configuring Horizontal Pod Autoscaling

If everything is fine, make a small change to the Ant Media Server deployment.

kubectl edit deployment ant-media-server

Under the container definition, add or edit the following lines to suit your setup, then save.

        resources:
          requests:
            cpu: 1000m

What does 1000m mean? Kubernetes measures CPU in millicores: one CPU core is divided into 1000 units (milli = one thousandth), so 1000m = 1 core.
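To make the unit concrete, here is a small sketch that converts a CPU quantity to millicores. The helper name `to_millicores` is hypothetical. It only handles the `m` suffix and plain core counts, which is an assumption about the values you are likely to enter.

```shell
# Hypothetical helper: convert a Kubernetes CPU quantity to millicores.
# Handles the "m" suffix (e.g. 500m) and plain core counts (e.g. 2 or 0.5).
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;   # already in millicores, drop the suffix
    *)  awk -v cores="$1" 'BEGIN { printf "%d\n", cores * 1000 + 0.5 }' ;;
  esac
}

to_millicores 1000m   # prints 1000 (1 core)
to_millicores 0.5     # prints 500  (half a core)
to_millicores 2       # prints 2000 (2 cores)
```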

Verify the value you entered using the command below.

kubectl describe deployment/ant-media-server

Now that the deployment is running, create a Horizontal Pod Autoscaler for it. To do so, you can run the following command.

kubectl autoscale deployment ant-media-server --cpu-percent=60 --min=1 --max=10

Alternatively, you can create it from the following YAML file.

kubectl create -f https://raw.githubusercontent.com/ant-media/Scripts/master/kubernetes/ams-k8s-hpa.yaml
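If you prefer to keep the autoscaler definition in your own repository, the sketch below writes a manifest that should be roughly equivalent to the `kubectl autoscale` command above. It is an illustration using the `autoscaling/v1` API and is not necessarily identical to the linked ams-k8s-hpa.yaml. The file name `ams-hpa-example.yaml` is an arbitrary choice.

```shell
# Write an HPA manifest roughly equivalent to:
#   kubectl autoscale deployment ant-media-server --cpu-percent=60 --min=1 --max=10
# The file name is arbitrary; change the deployment name to match yours.
cat > ams-hpa-example.yaml <<'EOF'
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: ant-media-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ant-media-server
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 60
EOF
```

The file can then be applied with `kubectl apply -f ams-hpa-example.yaml`.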

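The configuration above sets the target average CPU utilization to 60%, measured relative to the CPU requested by each pod, and keeps the number of pods between a minimum of 1 and a maximum of 10. When the average CPU utilization rises above 60%, the autoscaler adds pods.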

You can check the current state of the autoscaler in the following output.

root@k8s-master:~# kubectl get hpa
NAME               REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
ant-media-server   Deployment/ant-media-server   3%/60%    1         10        1          20h

When load increases and the average CPU utilization goes above 60%, new pods are created. When the average drops back below 60%, pods are terminated to match the load.

root@k8s-master:~# kubectl get hpa
NAME               REFERENCE                     TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
ant-media-server   Deployment/ant-media-server   52%/60%   1         10        4          20h
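The replica counts in the outputs above follow from the formula given in the Kubernetes documentation: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The sketch below approximates it with integer percentages. The function name `desired_replicas` is hypothetical, and the real controller also applies a tolerance and a stabilization window that this sketch ignores.

```shell
# Hypothetical helper: approximate the HPA replica calculation
#   desired = ceil(current_replicas * current_cpu / target_cpu)
# clamped to [min, max]. Arguments are integers; percentages without "%".
desired_replicas() {
  local current="$1" cpu="$2" target="$3" min="$4" max="$5"
  local desired=$(( (current * cpu + target - 1) / target ))   # integer ceiling
  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
}

desired_replicas 4 52 60 1 10   # prints 4: 52% on 4 pods stays at 4
desired_replicas 4 90 60 1 10   # prints 6: 90% on 4 pods scales out to 6
desired_replicas 8 90 60 1 10   # prints 10: capped by the maximum
```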

We can check the number of pods running using the following command.

root@k8s-master:~# kubectl get pods
NAME                                READY   STATUS    RESTARTS   AGE
ant-media-server-7b9c6844b9-4dtwj   1/1     Running   0          42m
ant-media-server-7b9c6844b9-7b8hp   1/1     Running   0          19h
ant-media-server-7b9c6844b9-9rrwf   1/1     Running   0          18m
ant-media-server-7b9c6844b9-tdxhl   1/1     Running   0          47m
mongodb-9b99f5c-x8j5x               1/1     Running   0          20h

3. Some Useful Commands

The following command shows information about the autoscaler.

kubectl get hpa

You can check the load on the nodes using the command below.

kubectl top nodes

root@k8s-master:~# kubectl top node
NAME         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
k8s-node     111m         5%     717Mi           38%       
k8s-node-2   114m         5%     1265Mi          68%       
k8s-node-3   98m          4%     663Mi           35%       
k8s-node-4   102m         5%     666Mi           35%       
k8s-master   236m         11%    1091Mi          58%
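To spot busy nodes quickly, the output can be filtered by CPU percentage. This is a minimal sketch, assuming the column layout shown above with CPU% in the third column. The helper name `busy_nodes` is hypothetical.

```shell
# Hypothetical helper: read `kubectl top node` output on stdin and print the
# names of nodes whose CPU% is at or above the given threshold.
# Assumption: a header line comes first and CPU% is the third column.
busy_nodes() {
  awk -v limit="$1" '
    NR > 1 {
      pct = $3
      sub(/%/, "", pct)                  # strip the percent sign
      if (pct + 0 >= limit + 0) print $1
    }
  '
}
```

It would be used as `kubectl top node | busy_nodes 10` to list nodes at or above 10% CPU.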
