In this tutorial, we will deploy a simple FastAPI app, with an endpoint that calculates the entropy of a list of numbers, to a local Kubernetes cluster. The goal is to help you understand how to deploy a containerized application on a local Kubernetes cluster; see the Kubernetes Key Concepts section in the Appendix for a primer on the core concepts.
- This is a very simple getting-started example; it is not production-ready. Minikube gives you a single-node cluster on your own machine, so it isn't a real multi-node cluster, but it's a good way to get started with Kubernetes.
- Familiarity with Docker, Kubernetes, and FastAPI helps, but it isn't necessary to follow the walkthrough.
- This has only been tested on macOS.
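For concreteness, the computation behind such an entropy endpoint might look like the Shannon entropy sketch below. The app's actual formula isn't shown in this tutorial, so treat this as an illustration, not the repo's implementation:

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    # -sum p * log2(p) over the observed frequencies
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

For example, `shannon_entropy([1, 1, 2, 2])` is 1.0 bit (two equally likely values), and a constant list has entropy 0.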
Before setting up your environment for this project, you need to install the following tools: Docker, Minikube, kubectl, and Poetry (used later for the load test).
We need these Minikube addons to get ingress working and to check pod usage metrics later on.
minikube addons enable ingress
minikube addons enable ingress-dns
minikube addons enable metrics-server
We're going to use Minikube to run a local Kubernetes cluster: we'll fire up 5 replicas of the app and expose them via an Ingress.
- Build the Docker image
docker build -t fastapi-entropy-app .
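The build step assumes a Dockerfile in the repo root. The repo's Dockerfile isn't shown here, but a minimal one for a FastAPI app serving on port 80 might look like this (the base image, module path `main:app`, and unpinned dependencies are assumptions for illustration):

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY . .
RUN pip install fastapi uvicorn
# Serve on port 80 to match the docker run and Kubernetes examples below
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "80"]
```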
- You can verify the image locally by running the container with
docker run --rm -p 80:80 fastapi-entropy-app
and testing the endpoint with `curl http://0.0.0.0:80`. You should get something like: `{"message":"Hello World","instance_id":"ddf3de45-0234-4fb0-82d5-0c9a5460a49e"}`
- The instance_id is the same for all requests, as we've fired up a single containerised instance of the app.
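A simple way such an app could pin one id per container (a sketch, not necessarily how the repo does it) is to generate the id once at import time, so every request served by the same process returns the same value:

```python
import uuid

# Generated once when the process starts: one id per running container.
INSTANCE_ID = str(uuid.uuid4())

def root():
    # Every call from this process returns the same instance_id.
    return {"message": "Hello World", "instance_id": INSTANCE_ID}
```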
- Start a local Kubernetes cluster with Minikube
minikube start --driver=docker
(in a new terminal window).
- Load the Docker image into Minikube
minikube image load fastapi-entropy-app
- Configure Minikube addons for ingress and metrics-server (skip this if you already enabled them in the setup section above)
minikube addons enable ingress && minikube addons enable ingress-dns && minikube addons enable metrics-server
- Create the deployment
kubectl apply -f k8s/static/deployment.yaml
and view it with `kubectl get deploy`.
- Create the service
kubectl apply -f k8s/static/service.yaml
and view it with `kubectl get svc`.
- At this point we have a service running on the cluster, consisting of 5 replicas, that we can access from within the cluster. Run
minikube ssh
to ssh into the cluster, then `curl <CLUSTER-IP>` using the fastapi-app-service CLUSTER-IP from above to check the service is running as expected. Note that you now get different instance_id's, as you're hitting different pods thanks to the load balancing.
- However, we want to access the service from outside the cluster, so we need to create an Ingress.
- Create the ingress
kubectl apply -f k8s/static/ingress.yaml
and add the ingress domain to your hosts file: `echo "127.0.0.1 fastapi-app.com" | sudo tee -a /etc/hosts`.
- Troubleshooting: first check that the ingress controller is running and ready with `kubectl get pods -n ingress-nginx`.
- Run `minikube tunnel` (in a new terminal window). This opens a secure tunnel between the Minikube cluster and the host machine to route traffic to the service.
- (Optional) Open the minikube dashboard
minikube dashboard
to see the cluster resources in a GUI.
- Now you can send API requests to the host surfaced by the Ingress (a.k.a. our locally deployed app!):
curl -X POST fastapi-app.com/entropy -H 'Content-Type: application/json' -d '{"values":[1,12,2,3,2,43,2,4,2,42]}'
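The same request can be made from Python with only the standard library. This sketch builds the request without sending it; the commented-out `urlopen` call assumes the tunnel and `/etc/hosts` entry from the steps above are in place:

```python
import json
import urllib.request

def entropy_request(values, host="http://fastapi-app.com"):
    """Build the same POST the curl command sends (does not send it)."""
    body = json.dumps({"values": values}).encode()
    return urllib.request.Request(
        f"{host}/entropy",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To actually send it once the cluster is reachable:
# with urllib.request.urlopen(entropy_request([1, 12, 2, 3])) as resp:
#     print(resp.read().decode())
```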
Now the fun part: let's load test the app. We'll use Locust, a simple Python-based load testing tool.
First we need to install the dependencies:
poetry install
Now we can run the load test:
poetry run locust -f load_test/locustfile.py --host=http://fastapi-app.com
Then open the web UI at http://localhost:8089 and start the load test (you can change the number of users and spawn rate; I used 10 users and a spawn rate of 1).
Spawn rate is the rate at which users are spawned. So if you have 100 users and a spawn rate of 10, then 10 users will be spawned every second.
Number of users is the total number of users that will be spawned.
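The relationship between the two settings boils down to a ramp-up time, as in this small sketch (a back-of-the-envelope model, ignoring Locust's internal scheduling details):

```python
import math

def ramp_up_seconds(total_users, spawn_rate):
    """Seconds until all users are running, spawning `spawn_rate` users per second."""
    return math.ceil(total_users / spawn_rate)
```

So 100 users at a spawn rate of 10 reach full load after about 10 seconds.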
You can experiment with different values to see how the app performs under load. Try ramping up the load test, then you can view the pods handling the load with:
kubectl top pods
There's a delay between the load ramping up and the metrics updating, so keep re-running `kubectl top pods`
and you should see the pods' usage metrics increase. You can scale up to 10 replicas with:
kubectl scale --replicas=10 deployment/fastapi-app-deployment
We can use the Horizontal Pod Autoscaler (HPA) to scale the number of pods based on CPU usage. In k8s/autoscale.yaml we create a HorizontalPodAutoscaler resource that scales the number of pods based on their CPU usage. We set a minimum of 2 pods and a maximum of 10 pods.
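The manifest in k8s/autoscale.yaml likely resembles the following sketch; the resource name and the averageUtilization threshold of 50 are assumptions, while the target deployment name and the 2/10 replica bounds come from the steps above:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: fastapi-app-hpa        # assumed name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: fastapi-app-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # assumed threshold; lower it to scale up sooner
```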
Create the resource: kubectl apply -f k8s/autoscale.yaml
Then you can test that the autoscaler is working by running the load test again (e.g. spawn rate 10, 50 users):
Check the pod usage metrics:
kubectl top pods
You should see the number of pods eventually increase as the load increases (you can lower the averageUtilization in k8s/autoscale.yaml to make it scale up quicker).
If you run into issues, ask ChatGPT for help; 90% of the time it will guide you to a solution.
Kubernetes is a container orchestration system that automates the management and deployment of containerized applications. It provides a set of core concepts that are used to define and manage the components of a Kubernetes cluster. Here are some of the core concepts of Kubernetes:
- Pods: A Pod is the smallest and simplest unit in the Kubernetes object model. It represents a single instance of a running process in a cluster. Pods can contain one or more containers, and all containers in a Pod share the same network namespace, IP address, and storage volumes.
- Replication Controllers: A Replication Controller ensures that a specified number of replicas of a Pod are running at any given time. It automatically replaces or restarts Pods that fail or are deleted. In modern clusters this role is usually filled by ReplicaSets, which Deployments manage for you.
- Deployments: A Deployment is an abstraction that provides declarative updates for Pods and Replication Controllers. It allows you to define the desired state of your application (e.g. the number of replicas and the container image to use) and the system takes care of making the necessary updates to reach that state.
- Services: A Service provides a stable endpoint for accessing one or more Pods in a cluster. It abstracts the network details of the Pods from the clients that need to access them, and can load balance traffic across multiple replicas of a Pod.
- Namespaces: Namespaces provide a way to group resources together in a cluster. Each Namespace has its own set of resources, including Pods, Services, and Deployments, and can be used to isolate different parts of an application or different teams.
- Volumes: Volumes provide a way to persist data in a cluster. They can be used to store data that should survive the lifetime of a Pod, such as logs or configuration files. Kubernetes supports different types of volumes such as HostPath, EmptyDir, ConfigMap, Secret etc.
- Ingress: Ingress allows external users to access Services in a cluster by providing a single point of entry. It lets you configure rules that route external requests to different Services based on the URL path or hostname.
- ConfigMaps: ConfigMaps allow you to decouple configuration artifacts from image content to keep containers portable. You can use a ConfigMap to store configuration files, environment variables, command-line arguments, or other data that a container needs to run.
- Secrets: Secrets allow you to store sensitive information, such as passwords, tokens, or certificates, in a Kubernetes cluster. They are similar to ConfigMaps, but their values are base64-encoded, can optionally be encrypted at rest, and are subject to stricter access controls.
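To make the Service concept concrete, here is a minimal manifest along the lines of what the tutorial's k8s/static/service.yaml might contain; the selector label is an assumption (it must match whatever labels the Deployment puts on its pods):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: fastapi-app-service
spec:
  selector:
    app: fastapi-app      # assumed label; must match the Deployment's pod template
  ports:
    - port: 80            # port the Service exposes inside the cluster
      targetPort: 80      # port the container listens on
```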