A set of tools built to simplify daily driving of cloud resources for individual VM access, Kubernetes batch jobs, and miscellaneous functionality useful for cloud-based ML research.
To install the toolset and get your environment ready to run Kubernetes jobs, you need to:
- Log into a machine to be used as the job generator and submission terminal. We recommend a Google Cloud VM with at least 4 CPU cores, or your local machine, although a Google Cloud VM is generally less prone to issues.
- Pull and run the controller Docker image by running:
docker pull ghcr.io/bayeswatch/controller:0.1.0
docker run -it ghcr.io/bayeswatch/controller:0.1.0
- Clone the repository to your local machine, or to the remote machine that will act as the job submission client:
git clone https://github.com/BayesWatch/bwatchcompute.git
- If intending to develop new features to push to the GitHub repository, you need to:
- Log into your GitHub account:
gh auth login
- Set up your default email and name in Git:
git config --global user.email "[email protected]"
git config --global user.name "Your Name"
- Sign in to your gcloud account:
gcloud auth login
- Set the gcloud project to tali-multi-modal by running:
gcloud config set project tali-multi-modal
- Sign into the GPU Kubernetes cluster:
gcloud container clusters get-credentials gpu-cluster-1 --zone us-central1-a --project tali-multi-modal
- Set up the environment variables by filling in the variables in tutorial/setup_variables.sh and then running:
source tutorial/setup_variables.sh
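For reference, such a setup script usually just exports shell variables. The sketch below is hypothetical; the variable names are illustrative placeholders, and the real ones live in tutorial/setup_variables.sh:
# Hypothetical sketch -- check tutorial/setup_variables.sh for the actual variable names
export GCP_PROJECT=tali-multi-modal                          # project used throughout this guide
export CLUSTER_NAME=gpu-cluster-1                            # cluster from the get-credentials step above
export CLUSTER_ZONE=us-central1-a
export CONTROLLER_IMAGE=ghcr.io/bayeswatch/controller:0.1.0  # controller image pulled earlier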
A list of commands to make your life easier when working with Kubernetes:
Listing VM nodes of the cluster
kubectl get nodes
Listing pods of the cluster
kubectl get pods
Listing jobs of the cluster
kubectl get jobs
Read logs of a particular pod
kubectl logs <pod_id>
Read the metadata and events of a particular pod
kubectl describe pod <pod_id>
Submit a job to the cluster
kubectl create -f job.yaml
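For reference, a minimal sketch of what such a job.yaml might look like; the name, image, and command below are placeholders, not the manifests this toolset generates:
# job-demo.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-demo              # placeholder name
spec:
  backoffLimit: 0             # do not retry on failure
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: job-demo
        image: busybox        # placeholder; use your experiment image
        command: ["echo", "hello from the cluster"]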
To enable autocomplete for kubectl:
The kubectl completion script for Fish can be generated with the command kubectl completion fish. Sourcing the completion script in your shell enables kubectl autocompletion.
To do so in all your shell sessions, add the following line to your ~/.config/fish/config.fish file:
kubectl completion fish | source
For other autocompletion tools see the autocompletion documentation
For Bash, you need to ensure that the kubectl completion script gets sourced in all your shell sessions. There are two ways in which you can do this. The first is to source the script from your ~/.bashrc:
echo 'source <(kubectl completion bash)' >>~/.bashrc
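The second is to add the completion script to the /etc/bash_completion.d directory; one common way to do this, assuming you have sudo access, is:
kubectl completion bash | sudo tee /etc/bash_completion.d/kubectl > /dev/null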
If you have an alias for kubectl, you can extend shell completion to work with that alias:
echo 'alias k=kubectl' >>~/.bashrc
echo 'complete -o default -F __start_kubectl k' >>~/.bashrc
Note: bash-completion sources all completion scripts in /etc/bash_completion.d.
Both approaches are equivalent. After reloading your shell, kubectl autocompletion should be working. To enable bash autocompletion in the current shell session, run exec bash:
exec bash
This section contains commands that help one configure secrets for Kubernetes with the bare minimum of commands. For a more detailed description, look at this article.
Create a namespace to store your secrets in:
kubectl create namespace <namespace-name>
Store secrets using the following:
kubectl create secret generic <folder-for-secrets-name> \
--from-literal=PASSWORD=password1234 \
--namespace=<namespace-name>
To see the saved secrets use:
kubectl -n <namespace-name> get secret <folder-for-secrets-name> -o jsonpath='{.data.PASSWORD}' | base64 --decode
kubectl -n <namespace-name> describe secrets/<folder-for-secrets-name>
kubectl -n <namespace-name> get secrets
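To consume such a secret inside a pod or job, the usual pattern is to expose it as an environment variable via secretKeyRef. A minimal sketch, reusing the placeholder names above (note that the pod must live in the same namespace as the secret):
# secret-demo.yaml
apiVersion: v1
kind: Pod
metadata:
  name: secret-demo
  namespace: <namespace-name>
spec:
  restartPolicy: Never
  containers:
  - name: secret-demo
    image: busybox            # placeholder image
    command: ["sh", "-c", "echo $PASSWORD"]
    env:
    - name: PASSWORD
      valueFrom:
        secretKeyRef:
          name: <folder-for-secrets-name>
          key: PASSWORD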
See the documentation at https://cloud.google.com/kubernetes-engine/docs/concepts/persistent-volumes.
The TL;DR is: use the manifest below as a guide for your disk claims and consumption.
# pvc-pod-demo.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-demo
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 30Gi
storageClassName: standard-rwo
---
kind: Pod
apiVersion: v1
metadata:
name: pod-demo
spec:
volumes:
- name: pvc-demo-vol
persistentVolumeClaim:
claimName: pvc-demo
containers:
- name: pod-demo
image: nginx
resources:
limits:
cpu: 10m
memory: 80Mi
requests:
cpu: 10m
memory: 80Mi
ports:
- containerPort: 80
name: "http-server"
volumeMounts:
- mountPath: "/usr/share/nginx/html"
name: pvc-demo-vol
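To try it out, apply the manifest and check that the claim binds and the pod starts, for example:
kubectl apply -f pvc-pod-demo.yaml
kubectl get pvc pvc-demo   # STATUS should eventually become Bound
kubectl get pod pod-demo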