Skip to content

Latest commit

 

History

History
91 lines (82 loc) · 2.48 KB

TEAM-SETUP.md

File metadata and controls

91 lines (82 loc) · 2.48 KB

Team Setup

A team in MLBatch is a group of users that share a resource quota.

Before setting up your teams and quotas, please read QUOTA_MAINTENANCE.md for a discussion of our recommended best practices.

Setting up a new team requires the cluster admin to create a project, a user group, a quota, a queue, and the required role bindings as described below.

Create project:

oc new-project team1

Create user group:

oc adm groups new team1-edit-group

Add users to group for example:

oc adm groups add-users team1-edit-group user1

Bind cluster role to group in namespace:

oc adm policy add-role-to-group mlbatch-edit team1-edit-group --role-namespace="" --namespace team1

Specify the intended quota for the namespace by creating a ClusterQueue:

oc apply -f- << EOF
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team1-cluster-queue
spec:
  namespaceSelector: {}
  cohort: default-cohort
  preemption:
    withinClusterQueue: LowerOrNewerEqualPriority
    reclaimWithinCohort: Any
    borrowWithinCohort:
      policy: Never
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu", "nvidia.com/roce_gdr", "pods"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 8000m
        # borrowingLimit: 0
        # lendingLimit: 0
      - name: "memory"
        nominalQuota: 128Gi
        # borrowingLimit: 0
        # lendingLimit: 0
      - name: "nvidia.com/gpu"
        nominalQuota: 16
        # borrowingLimit: 0
        # lendingLimit: 0
      - name: "nvidia.com/roce_gdr"
        nominalQuota: 4
        # borrowingLimit: 0
        # lendingLimit: 0
      - name: "pods"
        nominalQuota: 100
        # borrowingLimit: 0
        # lendingLimit: 0
EOF

Edit the above quantities to adjust the quota to the desired values. Pod counts are optional and can be omitted from the list of covered resources.

Uncomment all borrowingLimit lines to prevent this namespace from borrowing quota from other namespaces. Uncomment all lendingLimit lines to prevent other namespaces from borrowing quota from this namespace.

Create a LocalQueue to bind the ClusterQueue to the namespace:

oc apply -n team1 -f- << EOF
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: default-queue
spec:
  clusterQueue: team1-cluster-queue
EOF

We recommend naming the local queue default-queue as AppWrappers will default to this queue name.