Kubernetes quota management and InstaSlice

Quota management in Kubernetes is essential for controlling the allocation and consumption of resources within a cluster. It ensures fair and efficient resource utilization, prevents resource contention, and helps provide stability in a multi-tenant environment.

We use GPU memory as the quota resource because GPU memory is the dominant factor in creating a new MIG slice; for an example, see profile 5 in the NVIDIA MIG user guide: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#a100-profiles . A single compute instance (CI) can be lost when no GPU instance (GI) or GPU memory is available for it. In addition, most inference servers keep the KV cache and inference requests in GPU memory, which makes GPU memory a scarce resource.
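The stranded-compute case can be illustrated with a little arithmetic. The sketch below is only an illustration, not InstaSlice code: it assumes an A100-40GB with 7 compute slices and 8 memory slices of 5 GB each, and the per-profile slice counts in `PROFILES` are taken as assumptions from the MIG user guide. It also ignores MIG placement rules and only counts slices.

```python
# Assumed A100-40GB layout: 7 compute slices, 8 memory slices of 5 GB each.
COMPUTE_SLICES = 7
MEMORY_SLICES = 8
GB_PER_MEMORY_SLICE = 5

# Assumed (compute slices, memory slices) consumed by each MIG profile.
PROFILES = {
    "1g.5gb": (1, 1),
    "2g.10gb": (2, 2),
    "3g.20gb": (3, 4),
    "4g.20gb": (4, 4),
    "7g.40gb": (7, 8),
}


def remaining(allocated):
    """Return (free compute slices, free GPU memory in GB) for one GPU."""
    used_compute = sum(PROFILES[p][0] for p in allocated)
    used_memory = sum(PROFILES[p][1] for p in allocated)
    if used_compute > COMPUTE_SLICES or used_memory > MEMORY_SLICES:
        raise ValueError("allocation does not fit on one GPU")
    free_memory_gb = (MEMORY_SLICES - used_memory) * GB_PER_MEMORY_SLICE
    return COMPUTE_SLICES - used_compute, free_memory_gb


# Two 3g.20gb slices use all GPU memory but leave one compute slice unused.
print(remaining(["3g.20gb", "3g.20gb"]))
```

Under these assumptions, memory runs out before compute does, which is why a quota expressed in GPU memory tracks what can actually be allocated.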

Integration steps

Apply the sample quota:

    kubectl apply -f samples/resource-quota.yaml

Submit the sample pod:

    kubectl apply -f samples/test-pod.yaml

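Once the pod is running, the quota should be exhausted. Submit the same pod under a different name (change metadata.name in samples/test-pod.yaml first; otherwise kubectl apply targets the existing pod instead of creating a new one):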

    kubectl apply -f samples/test-pod.yaml

The Kubernetes quota should block the second request for as long as the first pod holds the quota, that is, until the first pod completes or is deleted from the system.
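The expected behavior of these steps can be modeled with a small ledger. This is a simplified sketch of quota accounting, not the Kubernetes admission controller or InstaSlice itself; the class `GpuMemoryQuota`, the pod names, and the 5 GB figures are hypothetical and do not come from the sample files.

```python
class GpuMemoryQuota:
    """Toy model of a hard GPU memory quota for one namespace."""

    def __init__(self, hard_gb):
        self.hard_gb = hard_gb
        self.pods = {}  # pod name -> GPU memory held, in GB

    def used_gb(self):
        return sum(self.pods.values())

    def submit(self, name, request_gb):
        """Admit the pod only if it fits in the remaining quota."""
        if name in self.pods:
            raise ValueError(f"pod {name!r} already exists")
        if self.used_gb() + request_gb > self.hard_gb:
            return False  # blocked: quota exhausted
        self.pods[name] = request_gb
        return True

    def release(self, name):
        """Free the quota held by a completed or deleted pod."""
        self.pods.pop(name, None)


quota = GpuMemoryQuota(hard_gb=5)          # hypothetical quota size
first = quota.submit("test-pod-1", 5)      # admitted, quota now exhausted
second = quota.submit("test-pod-2", 5)     # blocked while test-pod-1 runs
quota.release("test-pod-1")                # first pod completes or is deleted
retry = quota.submit("test-pod-2", 5)      # admitted on resubmission
print(first, second, retry)
```

The second submission only succeeds after the first pod releases its share, which mirrors the sequence described in the steps above.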
