diff --git a/CHANGELOG.md b/CHANGELOG.md
index 0aa5c2f..479662f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -19,4 +19,5 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - [Feature] Simple feedforward neural network with MNIST dataset to map input images to their corresponding digit classes
 - [Feature] CNN architecture training considering COCO dataset for image classification AI applications (**NOTE:** Compute and storage intensive. Read `Download the COCO dataset images` comments on preferred hardware specs)
 - [Feature] CD workflow for on-demand Azure Container Registry deployments in order to store internal Docker images.
-- [Feature] Dockerizing Python (pytorch or tensorflow) applications for ML training and inference
\ No newline at end of file
+- [Feature] Dockerizing Python (pytorch or tensorflow) applications for ML training and inference
+- [Feature] Installation of the [Training Operator for CRDs](https://github.com/kubeflow/training-operator) and applying sample [TFJob and PyTorchJob](https://www.kubeflow.org/docs/components/training/overview/) k8s manifest
\ No newline at end of file
diff --git a/README.md b/README.md
index e2f27f5..122c490 100644
--- a/README.md
+++ b/README.md
@@ -24,17 +24,18 @@ Repository showcasing ML Ops practices with kubeflow and mlflow
 - [x] CD workflow for on-demand AKS deployments and kubeflow operator or mlflow helm chart installations
 - [x] CD wofklow for on demand deployments of an Azure Storage Account Container **(For storing terraform state files)**
 - [x] CD workflow for on-demand Azure Container Registry deployments in order to store internal Docker images.
-- [ ] CI workflow for building internal docker images and uploading those to an Azure Container Resgitry
-- [ ] CD workflows for internal helm chart installations in deployed AKS clusters
+- [ ] ~~CI workflow for building internal docker images and uploading those to an Azure Container Registry~~
+- [ ] ~~CD workflows for internal helm chart installations in deployed AKS clusters~~
 - [x] Added `devcontainer.json` with necessary tooling for local development
 - [x] Python (pytorch or tensorflow) application for ML training and inference purposes and Jupyter notebooks
 - [x] Simple feedforward neural network with MNIST dataset to map input images to their corresponding digit classes
 - [x] CNN architecture training and inference considering COCO dataset for image classification AI applications (**NOTE:** Compute and storage intensive. Read `Download the COCO dataset images` comments on preferred hardware specs)
 - [ ] ~~(**OPTIONAL**) Transformer architecture training considering pre-trained models for chatbot AI applications~~
 - [x] Dockerizing Python (pytorch or tensorflow) applications for ML training and inference
-- [ ] Helm charts with K8s manifests for ML jobs considering the [Training Operator for CRDs](https://github.com/kubeflow/training-operator)
-- [ ] Demonstration of model training and model deployment trough automation workflows
-- [ ] (**OPTIONAL**) mlflow experiments for the machine learning lifecycle
+- [ ] ~~Helm charts with K8s manifests for ML jobs considering the [Training Operator for CRDs](https://github.com/kubeflow/training-operator)~~
+- [x] Installation of the [Training Operator for CRDs](https://github.com/kubeflow/training-operator) and applying sample [TFJob and PyTorchJob](https://www.kubeflow.org/docs/components/training/overview/) k8s manifest
+- [x] Demonstration of model training and model deployment through automation workflows
+- [ ] ~~(**OPTIONAL**) mlflow experiments for the machine learning lifecycle~~
 
 ## Getting started
 
@@ -72,7 +73,7 @@ and visit in a browser of choice `localhost:8080`.
 
 ![kubeflow-dashboard](./images/kubeflow-dashboard.PNG)
 
-#### CNN architecture training considering pre-trained models for image classification AI applications
+#### Jupyter notebooks
 
 When creating the Jupyter notebook instance consider the following data volume:
 
@@ -102,6 +103,10 @@ Execute a [Jupyter notebook](./notebooks/) to either train the model or perform
 
 ![Run jupyter notebook example](./images/run-jupyter-notebook-example.PNG)
 
+#### Applying TFJob or PyTorchJob k8s manifests
+
+After successful installation of the Kubeflow Training Operator, apply some sample k8s ML training jobs, e.g. [for PyTorch](https://www.kubeflow.org/docs/components/training/user-guides/pytorch/) and [for TensorFlow](https://www.kubeflow.org/docs/components/training/user-guides/tensorflow/).
+
 ### mlflow
 
 To access the MLflow dashboard following the installation of the MLflow Helm chart, execute the following command: