-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Update test READMEs to reflect correct information on testing workflows
Signed-off-by: ddalvi <[email protected]>
- Loading branch information
Showing
2 changed files
with
141 additions
and
115 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,93 +1,100 @@ | ||
# ML pipeline test infrastructure | ||
# ML Pipeline Test Infrastructure | ||
|
||
This folder contains the integration/e2e tests for ML pipeline. We use Argo workflow to run the tests. | ||
This folder contains the integration and end-to-end (E2E) tests for the ML pipeline. Tests are executed using Kind (Kubernetes IN Docker) to simulate a Kubernetes cluster locally. GitHub Actions (GHAs) handle automated testing on pull requests. | ||
|
||
At a high level, a typical test workflow will | ||
At a high level, a typical test workflow will: | ||
- Build images for all components. | ||
- Create a dedicated test namespace in the cluster. | ||
- Deploy the ML pipeline using the newly built components. | ||
- Run the tests. | ||
- Clean up the namespace and temporary resources. | ||
|
||
- build docker images for all components | ||
- create a dedicate test namespace in the cluster | ||
- deploy ml pipeline using the newly built components | ||
- run the test | ||
- delete the namespace | ||
- delete the images | ||
These steps are performed in the same Kubernetes cluster. | ||
|
||
All these steps will be taking place in the same Kubernetes cluster. | ||
You can use GKE to test against the code in a Github Branch. The image will be temporarily stored in the GCR repository in the same project. | ||
--- | ||
|
||
Tests are run automatically on each commit in a Kubernetes cluster using | ||
[Prow](https://github.com/kubernetes/test-infra/tree/master/prow). | ||
Tests can also be run manually, see the next section. | ||
## Running Tests Locally with Kind | ||
|
||
## Run tests using GKE | ||
|
||
You could run the tests against a specific commit. | ||
To run tests locally, set up a Kind cluster and follow the same steps as the GitHub Actions workflows. This section details the process. | ||
|
||
### Setup | ||
|
||
Here are the one-time steps to prepare for your GKE testing cluster: | ||
- Follow the [deployment guide](https://www.kubeflow.org/docs/components/pipelines/installation/standalone-deployment/#set-up-your-cluster) to | ||
create a GKE cluster. | ||
- Install [Argo](https://github.com/argoproj/argo-workflows/) | ||
in the cluster. | ||
- Create cluster role binding. | ||
``` | ||
kubectl create clusterrolebinding default-as-admin --clusterrole=cluster-admin --serviceaccount=default:default | ||
``` | ||
- Follow the | ||
[guideline](https://developer.github.com/v3/guides/managing-deploy-keys/) to | ||
create a | ||
[ssh](https://help.github.com/articles/generating-a-new-ssh-key-and-adding-it-to-the-ssh-agent/) | ||
deploy key, and store as Kubernetes secret in your cluster, so the job can | ||
later access the code. Note it requires admin permission to add a deploy key | ||
to github repo. This step is not needed when the project is public. | ||
``` | ||
kubectl create secret generic ssh-key-secret | ||
--from-file=id_rsa=/path/to/your/id_rsa | ||
--from-file=id_rsa.pub=/path/to/your/id_rsa.pub | ||
``` | ||
|
||
### Run tests | ||
Simply submit the test workflow to the GKE cluster, with a parameter | ||
specifying the commit you want to test (master HEAD by default): | ||
``` | ||
argo submit integration_test_gke.yaml -p commit-sha=<commit> | ||
``` | ||
You can check the result by doing: | ||
``` | ||
argo list | ||
``` | ||
The workflow will create a temporary namespace with the same name as the Argo | ||
workflow. All the images will be stored in | ||
**gcr.io/project_id/workflow_name/branch_name/***. By default when the test is | ||
*finished, the namespace and images will be deleted. | ||
However you can keep them by providing additional parameter. | ||
``` | ||
argo submit integration_test_gke.yaml -p branch="my-branch" -p cleanup="false" | ||
``` | ||
|
||
### Run presubmit-tests-with-pipeline-deployment.sh locally | ||
|
||
Run the following commands from root of kubeflow/pipelines repo. | ||
``` | ||
# $WORKSPACE are env variables set by Prow | ||
export WORKSPACE=$(pwd) # root of kubeflow/pipelines git repo | ||
export SA_KEY_FILE=PATH/TO/YOUR/GCP/PROJECT/SERVICE/ACCOUNT/KEY | ||
# (optional) uncomment the following to keep reusing the same cluster | ||
# export TEST_CLUSTER=YOUR_PRECONFIGURED_CLUSTER_NAME | ||
# (optional) uncomment the following to disable built image caching | ||
# export DISABLE_IMAGE_CACHING=true | ||
./test/presubmit-tests-with-pipeline-deployment.sh \ | ||
--workflow_file e2e_test_gke_v2.yaml \ # You can specify other workflows you want to test too. | ||
--test_result_folder ${FOLDER_NAME_TO_HOLD_TEST_RESULT} \ | ||
--test_result_bucket ${YOUR_GCS_TEST_RESULT_BUCKET} \ | ||
--project ${YOUR_GCS_PROJECT} | ||
``` | ||
|
||
## Troubleshooting | ||
|
||
**Q: Why is my test taking so long on GKE?** | ||
|
||
The cluster downloads a bunch of images during the first time the test runs. It will be faster the second time since the images are cached. | ||
The image building steps are running in parallel and usually takes 2~3 minutes in total. If you are experiencing high latency, it might due to the resource constrains | ||
on your GKE cluster. In that case you need to deploy a larger cluster. | ||
1. **Install Prerequisites**: | ||
- [Docker](https://docs.docker.com/get-docker/) | ||
- [Kind](https://kind.sigs.k8s.io/docs/user/quick-start/#installation) | ||
- [kubectl](https://kubernetes.io/docs/tasks/tools/) | ||
|
||
2. **Set Up a Kind Cluster**: | ||
Create a configuration file for your Kind cluster (optional): | ||
```yaml | ||
kind: Cluster | ||
apiVersion: kind.x-k8s.io/v1alpha4 | ||
nodes: | ||
- role: control-plane | ||
- role: worker | ||
``` | ||
Create the cluster: | ||
`kind create cluster --name kfp-test-cluster --config kind-config.yaml` | ||
|
||
Verify the cluster: | ||
|
||
`kubectl cluster-info --context kind-kfp-test-cluster` | ||
|
||
3. **Prepare the Test Environment**: | ||
|
||
- Install Python test dependencies: | ||
|
||
`pip install -r test/requirements.txt` | ||
|
||
- Deploy Kubeflow Pipelines to the Kind cluster: | ||
|
||
`kubectl apply -k manifests/` | ||
|
||
4. **Run the Tests**: | ||
Execute the desired test suite: | ||
|
||
`pytest test/kfp-functional-test/` | ||
|
||
For additional guidance on deploying Kubeflow Pipelines in Kind, refer to: | ||
|
||
- [Kind Local Cluster Deployment Guide](https://www.kubeflow.org/docs/components/pipelines/legacy-v1/installation/localcluster-deployment/#kind) | ||
- [Operator Deployment Guide](https://www.kubeflow.org/docs/components/pipelines/operator-guides/installation/#deploying-kubeflow-pipelines) | ||
|
||
|
||
## Automated Testing with GitHub Actions | ||
|
||
|
||
Tests are automatically triggered on GitHub when: | ||
|
||
- A pull request is opened or updated. | ||
|
||
GitHub Actions workflows are defined in the `.github/workflows/` directory. | ||
|
||
### Reproducing CI Steps Locally | ||
|
||
To replicate the steps locally: | ||
|
||
1. Clone the Kubeflow Pipelines repository: | ||
|
||
`git clone https://github.com/kubeflow/pipelines.git | ||
cd pipelines` | ||
|
||
2. Follow the steps outlined in the **Running Tests Locally with Kind** section. | ||
|
||
3. To mimic the GitHub Actions environment, export any required environment variables found in the workflow files. | ||
|
||
* * * * * | ||
|
||
Troubleshooting | ||
--------------- | ||
|
||
**Q: Why is my test taking so long?** | ||
|
||
- The first run downloads many container images. Subsequent runs will be faster due to caching. | ||
- If you experience high latency, ensure the local system running Kind has sufficient resources (CPU, memory). | ||
|
||
**Q: How do I clean up the Kind cluster?** | ||
|
||
- Delete the Kind cluster: | ||
`kind delete cluster --name kfp-test-cluster` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters