K8s-pdb-autoscaler


Introduction

This project originated as an intern project and is still available at github.com/Javier090/k8s-pdb-autoscaler.

The general idea is that Kubernetes (k8s) deployments already have a max surge concept, and there's no reason this surge should only apply to new deployments and not to node maintenance or other situations where PodDisruptionBudget (PDB)-protected pods need to be evicted. This project uses node cordons or, alternatively, an eviction webhook to signal PDBWatcher Custom Resources that map to a PodDisruptionBudget. A controller then attempts to scale up a deployment that corresponds to the PodDisruptionBudget.
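For illustration, a PDBWatcher resource tying a PDB to its deployment might look something like this. The API group and field names below are assumptions for the sketch, not the project's actual schema; check the CRD definitions in this repository for the real fields.

```yaml
# Hypothetical PDBWatcher shape (field names are illustrative).
apiVersion: pdbwatcher.example.io/v1
kind: PDBWatcher
metadata:
  name: piggie          # conventionally named after the PDB it watches
  namespace: laboratory
spec:
  # The PodDisruptionBudget this watcher maps to.
  pdbName: piggie
status:
  # Written by the node controller or eviction webhook; the PDB Watcher
  # Controller surges the deployment when this is recent and the PDB's
  # AllowedDisruptions is zero.
  lastEviction: "2024-09-07T17:33:36Z"
```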

Why Not Overprovision?

Overprovisioning isn't free. Sometimes it makes sense to run as cost-effectively as possible, but you still don't want to experience downtime due to a cluster upgrade or even a VM maintenance event.

Your app might also experience issues for unrelated reasons, and a maintenance event shouldn't result in downtime if adding extra replicas can save you.

Features

  • Node Controller: Signals PDBWatchers for all pods on cordoned nodes selected by PDBs.
  • Optional Webhook: Signals PDBWatchers for any pod being evicted. See issue #10 for more information.
  • PDB Watcher Controller: Watches PDBWatcher resources. If there is a recent eviction signal and the PDB's AllowedDisruptions is zero, it triggers a surge in the corresponding deployment.
  • Scaledown: The PDB Watcher Controller restores the deployment to its original state after a cooldown period when eviction signals stop.
  • PDB Controller (Optional): Automatically creates PDBWatcher Custom Resources for existing PDBs.
  • Deployment Controller (Optional): Creates PDBs for deployments that don't already have them.
```mermaid
graph TD;
    Cordon[Cordon]
    NodeController[Cordoned Node Controller]
    Eviction[Eviction]
    Webhook[Admission Webhook]
    CRD[Custom Resource Definition]
    Controller[Kubernetes Controller]
    Deployment[Deployment or StatefulSet]

    Cordon -->|Triggers| NodeController
    NodeController -->|writes spec| CRD
    Eviction -->|Triggers| Webhook
    Webhook -->|writes spec| CRD
    CRD -->|spec watched by| Controller
    Controller -->|surges and shrinks| Deployment
    Controller -->|Writes status| CRD
```

Installation

Prerequisites

  • Docker
  • kind for e2e tests
  • A sense of adventure

Install

Clone the repository and install the dependencies:

```sh
git clone https://github.com/paulgmiller/k8s-pdb-autoscaler.git
cd k8s-pdb-autoscaler
hack/install.sh
```

Usage

Here's a quick demo of how this works:

```sh
kubectl create ns laboratory
kubectl create deployment -n laboratory piggie --image nginx
# Unless disabled, there will now be a PDB and a PDBWatcher that map to the deployment.
# Show the starting state.
kubectl get pods -n laboratory
kubectl get poddisruptionbudget piggie -n laboratory -o yaml # allowed disruptions should be 0
kubectl get pdbwatcher piggie -n laboratory -o yaml
# Cordon the node running the pod.
NODE=$(kubectl get pods -n laboratory -l app=piggie -o=jsonpath='{.items[*].spec.nodeName}')
kubectl cordon $NODE
# Show that we've scaled up.
kubectl get pods -n laboratory
kubectl get poddisruptionbudget piggie -n laboratory -o yaml # allowed disruptions should be 1
kubectl get pdbwatcher piggie -n laboratory -o yaml
# Actually kick the pods off the node now that the PDB isn't at zero.
kubectl drain $NODE --delete-emptydir-data --ignore-daemonsets
```

Here's a drain of a node on a two-node cluster running the AKS store demo (4 deployments and 2 stateful sets). You can see the drains being rejected and then going through on the left, and new pods being surged in on the right.

[Screenshot 2024-09-07: drain demo]

TODO

Mostly see issues.
