From db159e121d7843121ada8af851e04593188e2160 Mon Sep 17 00:00:00 2001 From: davidmirror-ops Date: Mon, 28 Aug 2023 15:21:54 -0500 Subject: [PATCH 01/24] Updates to deployment guides Signed-off-by: davidmirror-ops --- charts/flyte-binary/eks-production.yaml | 2 +- .../deployment/cloud_production.rst | 15 ++--- rsts/deployment/deployment/cloud_simple.rst | 8 +++ rsts/deployment/deployment/index.rst | 32 ++-------- rsts/deployment/deployment/multicluster.rst | 63 +++++++++---------- rsts/deployment/deployment/sandbox.rst | 16 ++--- 6 files changed, 54 insertions(+), 82 deletions(-) diff --git a/charts/flyte-binary/eks-production.yaml b/charts/flyte-binary/eks-production.yaml index 727b9b10fa..bda8356c98 100644 --- a/charts/flyte-binary/eks-production.yaml +++ b/charts/flyte-binary/eks-production.yaml @@ -132,7 +132,7 @@ ingress: nginx.ingress.kubernetes.io/app-root: /console grpcAnnotations: nginx.ingress.kubernetes.io/backend-protocol: GRPC - host: development.uniondemo.run + host: development.uniondemo.run # change for the URL you'll use to connect to Flyte rbac: extraRules: - apiGroups: diff --git a/rsts/deployment/deployment/cloud_production.rst b/rsts/deployment/deployment/cloud_production.rst index 90997556c9..ff8e182d98 100644 --- a/rsts/deployment/deployment/cloud_production.rst +++ b/rsts/deployment/deployment/cloud_production.rst @@ -28,18 +28,18 @@ To turn on ingress, update your ``values.yaml`` file to include the following bl .. literalinclude:: ../../../charts/flyte-binary/eks-production.yaml :caption: charts/flyte-binary/eks-production.yaml :language: yaml - :lines: 123-131 + :lines: 127-135 .. note:: - This currently assumes that you have nginx ingress. We'll be updating these - in the near future to use the ALB ingress controller instead. + This section assumes that you're using the NGINX Ingress controller. Instructions and annotations for the ALB controller + are covered in the `Flyte The Hard Way `__ tutorial. 
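Once DNS resolves to the host configured above, command-line clients can target the same endpoint. The snippet below is an illustrative client configuration (``~/.flyte/config.yaml``), not part of the chart; the hostname reuses the example value from ``eks-production.yaml`` and is an assumption — replace it with your own DNS name:

```yaml
# Illustrative flytectl/pyflyte client configuration (hypothetical values).
# The endpoint below reuses the example host from the chart values above;
# substitute the DNS name that points at your ingress.
admin:
  endpoint: dns:///development.uniondemo.run
  insecure: false # TLS is terminated at the ingress, so keep the secure transport
logger:
  level: 3
```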
*************** Authentication *************** -Authentication comes with Flyte in the form of OAuth 2. Please see the +Authentication comes with Flyte in the form of OAuth 2.0. Please see the `authentication guide `__ for instructions. .. note:: @@ -60,10 +60,3 @@ compatibility being maintained, for the most part. If you're using the :ref:`multi-cluster ` deployment model for Flyte, components should be upgraded together. - -.. note:: - - Expect to see minor version releases roughly 4-6 times a year - we aim to - release monthly, or whenever there is a large enough set of features to - warrant a release. Expect to see patch releases at more regular intervals, - especially for flytekit, the Python SDK. diff --git a/rsts/deployment/deployment/cloud_simple.rst b/rsts/deployment/deployment/cloud_simple.rst index b675df00b9..b280546708 100644 --- a/rsts/deployment/deployment/cloud_simple.rst +++ b/rsts/deployment/deployment/cloud_simple.rst @@ -115,6 +115,14 @@ hello world example: cd flytesnacks/cookbook pyflyte run --remote core/flyte_basics/hello_world.py my_wf +*********************************** +Flyte in on-premises infrastructure +*********************************** + +Sometimes, it's also helpful to be able to set up a Flyte environment on an on-premises Kubernetes cluster, or even on a laptop, for testing and development purposes. +Check out `this community-maintained tutorial `__ to learn how to set up the required dependencies and deploy the ``flyte-binary`` chart to a local Kubernetes cluster. + + ************* What's Next?
************* diff --git a/rsts/deployment/deployment/index.rst b/rsts/deployment/deployment/index.rst index ac0765412a..e253ae480c 100644 --- a/rsts/deployment/deployment/index.rst +++ b/rsts/deployment/deployment/index.rst @@ -49,29 +49,6 @@ deployment comes with a containerized `Minio `__, which offers - **GCP**: `GCS `__ - **Azure**: `Azure Blob Storage `__ - -Cluster Configuration -===================== - -Flyte configures K8s clusters to work with it. For example, as your Flyte userbase evolves, adding new projects is as -simple as registering them through the command line: - -.. prompt:: bash $ - - flytectl create project \ --id my-flyte-project \ --name "My Flyte Project" \ --description "My first project onboarding onto Flyte" - -Once you invoke this command, this project should immediately show up in the Flyte console after refreshing. - -Flyte runs at a configurable cadence that ensures that all Kubernetes resources necessary for the new project are -created and new workflows can successfully be registered and executed within it. - -.. note:: - - For more information, see :std:ref:`flytectl `. - ************************ Flyte Deployment Paths ************************ There are three different paths for deploying a Flyte cluster: This option is appropriate if all your compute can `fit on one EKS cluster `__ . As of this writing, a single Flyte cluster can handle more than 13,000 nodes. - Whatever path you choose, note that ``FlytePropeller`` itself can be sharded as well, though typically it's not required. + Regardless of whether you use a single or multiple Kubernetes clusters for Flyte, note that ``FlytePropeller`` (the main data plane component) can be sharded as well, if scale demands require it. Helm ==== Deployment Tips and Tricks Due to the many choices and constraints that you may face in your organization, the specific steps for deploying Flyte can vary significantly.
For example, which cloud platform to use is typically a big fork in the road for many, and there -are many choices to make in terms of ingresses, auth providers, and versions of different dependent libraries that +are many choices to make in terms of Ingress controllers, auth providers, and versions of different dependent libraries that may interact with other parts of your stack. -In addition to searching and posting on the `Flyte Slack community `__, +Considering the above, we recommend checking out the `"Flyte The Hard Way" `__ set of community-maintained tutorials that can guide you through the process of preparing the infrastructure and +deploying Flyte. + +In addition to searching and posting on the `#flyte-deployment Slack channel `__, we have a `Github Discussion `__ section dedicated to deploying Flyte. Feel free to submit any hints you've found helpful as a discussion, ask questions, or simply document what worked or what didn't work for you. diff --git a/rsts/deployment/deployment/multicluster.rst b/rsts/deployment/deployment/multicluster.rst index 69c34989ae..2b8e15084c 100644 --- a/rsts/deployment/deployment/multicluster.rst +++ b/rsts/deployment/deployment/multicluster.rst @@ -8,8 +8,8 @@ Multiple K8s Cluster Deployment .. note:: - The multicluster deployment described in this doc assumes you have deployed - the ``flyte`` Helm chart, which runs the individual Flyte services separately. + The multicluster deployment described in this section assumes you have deployed + the ``flyte-core`` Helm chart, which runs the individual Flyte services separately. This is needed because in a multicluster setup, the execution engine is deployed to multiple K8s clusters. This will not work with the ``flyte-binary`` Helm chart, since that chart deploys all Flyte service as one single binary. @@ -24,23 +24,22 @@ Scaling Beyond Kubernetes execution. The data plane fulfills these workflows by launching pods in Kubernetes.
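As a rough mental model of what this section builds toward, the control plane's placement of an execution can be pictured as a weighted choice among candidate data planes keyed by a label. The Python below is an illustrative sketch only — the map, label names, cluster IDs, and weights are assumptions made for the example, not Flyte's actual implementation:

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical label -> [(cluster_id, weight), ...] map, loosely mirroring
# the labelClusterMap idea that appears later in this guide.
LABEL_CLUSTER_MAP: Dict[str, List[Tuple[str, float]]] = {
    "label1": [("dataplane_1", 1.0)],
    "label2": [("dataplane_2", 0.5), ("dataplane_3", 0.5)],
}


def pick_cluster(label: str, rng: Optional[random.Random] = None) -> str:
    """Pick a target data plane for an execution carrying `label`."""
    chooser = rng if rng is not None else random
    candidates = LABEL_CLUSTER_MAP[label]
    ids = [cluster_id for cluster_id, _ in candidates]
    weights = [weight for _, weight in candidates]
    # A weighted random draw: over many executions, each cluster receives
    # a share of the traffic proportional to its weight.
    return chooser.choices(ids, weights=weights, k=1)[0]


print(pick_cluster("label1"))  # single candidate, so always "dataplane_1"
```

In the real deployment, the analogous mapping lives in FlyteAdmin's cluster configuration, and the weights control the share of matching executions each data plane receives.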
-At very large companies, total compute needs could exceed the limits of a single +At large organizations, total compute needs could exceed the limits of a single Kubernetes cluster. To address this, you can deploy the data plane to multiple Kubernetes clusters. The control plane (FlyteAdmin) can be configured to load-balance workflows across these individual data planes, protecting you from failure in a single Kubernetes -cluster increasing scalability. +cluster, thus increasing scalability. -To achieve this, first, you have to create additional Kubernetes clusters. -For now, let's assume you have three Kubernetes clusters and that you can access +To achieve this, first you have to create additional Kubernetes clusters. + +This guide assumes that you have three Kubernetes clusters and that you can access them all with ``kubectl``. Let's call these clusters ``cluster1``, ``cluster2``, and ``cluster3``. -Next, deploy *only* the data planes to these clusters. To do this, remove the -data plane components from the ``flyte`` overlay, and create a new overlay -containing *only* the data plane resources. +Next, deploy *only* the data planes to these clusters. To do this, use the `values-dataplane.yaml `__ provided with the Helm chart. Data Plane Deployment ********************* First, add the Flyteorg Helm repo .. code-block:: helm repo add flyteorg https://flyteorg.github.io/flyte helm repo update helm fetch --untar --untardir . flyteorg/flyte-core cd flyte-core Install Flyte data plane Helm chart .. code-block:: - helm upgrade flyte -n flyte flyteorg/flyte-core values.yaml \ + helm upgrade -n flyte -f values.yaml \ -f values-eks.yaml \ -f values-dataplane.yaml \ --create-namespace flyte --install + --create-namespace flyte flyteorg/flyte-core --install .. tabbed:: GCP .. code-block:: - helm upgrade flyte -n flyte flyteorg/flyte-core values.yaml \ + helm upgrade flyte -n flyte flyteorg/flyte-core -f values.yaml \ -f values-gcp.yaml \ -f values-dataplane.yaml \ --create-namespace flyte --install @@ -83,24 +82,24 @@ Some Flyte deployments may choose to run the control plane separate from the data plane.
FlyteAdmin is designed to create Kubernetes resources in one or more Flyte data plane clusters. For the admin to access remote clusters, it needs credentials to each cluster. +Flyte makes use of Kubernetes Service Accounts to enable every data plane cluster to perform +authenticated requests to the K8s API Server. +The default behaviour is that ``FlyteAdmin`` creates a `ServiceAccount `_ +in each data plane cluster. +In order to verify requests, the API Server expects a `signed bearer token `__ +attached to the Service Account. -In Kubernetes, scoped service credentials are created by configuring a "Role" -resource in a Kubernetes cluster. When you attach the role to a "ServiceAccount", -Kubernetes generates a bearer token that permits access. Hence, create a -FlyteAdmin `ServiceAccount `_ -in each data plane cluster to generate these tokens. -.. warning:: - - **Never delete a ServiceAccount 🛑** - - When you first create the FlyteAdmin ``ServiceAccount`` in a new cluster, a - bearer token is generated and will continue to allow access unless the - "ServiceAccount" is deleted. +.. note:: + As of Kubernetes 1.24 and above, the bearer token has to be generated manually for a Service Account, using the following command: -To feed the credentials to FlyteAdmin, you must retrieve them from your new data plane cluster and upload them to admin (for example, within Lyft, `Confidant `__ is used). + .. prompt:: bash $ + + kubectl create token <service-account-name> -n <namespace> + +To feed the credentials to FlyteAdmin, you must retrieve them from your new data plane cluster and upload them to ``FlyteAdmin``. -The credentials have two parts ("ca cert" and "bearer token"). Find the generated secret via: +The credentials have two parts (``ca cert`` and ``bearer token``). Find the generated secret via: ..
prompt:: bash $ @@ -133,12 +132,12 @@ file named ``secrets.yaml`` that looks like: namespace: flyte type: Opaque data: - cluster_1_token: {{ cluster 1 token here }} - cluster_1_cacert: {{ cluster 1 cacert here }} - cluster_2_token: {{ cluster 2 token here }} - cluster_2_cacert: {{ cluster 2 cacert here }} - cluster_3_token: {{ cluster 3 token here }} - cluster_3_cacert: {{ cluster 3 cacert here }} + cluster_1_token: "cluster-1-token-here" + cluster_1_cacert: "cluster-1-cacert-here" + cluster_2_token: "cluster-2-token-here" + cluster_2_cacert: "cluster-2-cacert-here" + cluster_3_token: "cluster-3-token-here" + cluster_3_cacert: "cluster-3-cacert-here" Create cluster credentials secret in the control plane cluster. diff --git a/rsts/deployment/deployment/sandbox.rst b/rsts/deployment/deployment/sandbox.rst index 073125e5cc..98d1f48582 100644 --- a/rsts/deployment/deployment/sandbox.rst +++ b/rsts/deployment/deployment/sandbox.rst @@ -6,11 +6,11 @@ Sandbox Deployment .. tags:: Kubernetes, Infrastructure, Basic -A sandbox deployment of Flyte is bundles together portable versions of Flyte's +A sandbox deployment of Flyte bundles together portable versions of Flyte's dependencies such as a relational database and durable object store. For the blob store requirements, Flyte Sandbox uses `Minio `__, -which offers an S3 compatible interface, and for Postgres, we use the stock +which offers an S3 compatible interface, and for Postgres, it uses the stock Postgres Docker image and Helm chart. .. important:: @@ -41,7 +41,7 @@ Requirements - Install `docker `__ or any other OCI-compatible tool, like Podman or LXD. - Install `flytectl `__, the official CLI for Flyte. -While Flyte can run any OCI-compatible task image, using the default Kubernetes container runtime (cri-o), the Flyte +While Flyte can run any OCI-compatible task image using the default Kubernetes container runtime (cri-o), the Flyte core maintainers typically use Docker. 
Note that the ``flytectl demo`` command does rely on Docker APIs, but as this demo environment is just one self-contained image, you can also run the image directly using another run time. @@ -79,12 +79,4 @@ who wish to dig deeper into the storage layer. 📂 The Minio API is hosted on localhost:30002. Use http://localhost:30080/minio/login for Minio console Now that you have the sandbox cluster running, you can now go to the :ref:`User Guide ` or -:ref:`Tutorials ` to run tasks and workflows written in ``flytekit``, the Python SDK for Flyte. - -************************** -Flyte Sandbox on the Cloud -************************** - -Sometimes it's also helpful to be able to install a sandboxed environment on a cloud provider. That is, you have access -to an EKS or GKE cluster, but provisioning a separate database or blob storage bucket is harder because of a lack of -infrastructure support. Instructions for how to do this will be forthcoming. +:ref:`Tutorials ` to run tasks and workflows written in ``flytekit``, the Python SDK for Flyte. \ No newline at end of file From 4063745bf35ff2b20182c293bebe78504177d3cf Mon Sep 17 00:00:00 2001 From: davidmirror-ops Date: Tue, 19 Sep 2023 16:23:56 -0500 Subject: [PATCH 02/24] Update multicluster docs round 2 Signed-off-by: davidmirror-ops --- .../deployment/cloud_production.rst | 43 ++- rsts/deployment/deployment/multicluster.rst | 333 ++++++++++++------ 2 files changed, 252 insertions(+), 124 deletions(-) diff --git a/rsts/deployment/deployment/cloud_production.rst b/rsts/deployment/deployment/cloud_production.rst index ff8e182d98..804dcbd726 100644 --- a/rsts/deployment/deployment/cloud_production.rst +++ b/rsts/deployment/deployment/cloud_production.rst @@ -23,17 +23,42 @@ guide already contains the ingress rules, but they are not enabled by default. To turn on ingress, update your ``values.yaml`` file to include the following block. -.. tabbed:: AWS - ``flyte-binary`` - - .. 
literalinclude:: ../../../charts/flyte-binary/eks-production.yaml :caption: charts/flyte-binary/eks-production.yaml +.. tabs:: + + .. group-tab:: ``flyte-binary`` on EKS using NGINX + + .. literalinclude:: ../../../charts/flyte-binary/eks-production.yaml + :caption: charts/flyte-binary/eks-production.yaml + :language: yaml + :lines: 127-135 + + .. group-tab:: ``flyte-binary`` on EKS using ALB + + .. code-block:: yaml + + ingress: + create: true + commonAnnotations: + alb.ingress.kubernetes.io/certificate-arn: '' + alb.ingress.kubernetes.io/group.name: flyte + alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS":443}]' + alb.ingress.kubernetes.io/scheme: internet-facing + alb.ingress.kubernetes.io/ssl-redirect: '443' + alb.ingress.kubernetes.io/target-type: ip + kubernetes.io/ingress.class: alb + httpAnnotations: + alb.ingress.kubernetes.io/actions.app-root: '{"Type": "redirect", "RedirectConfig": {"Path": "/console", "StatusCode": "HTTP_302"}}' + grpcAnnotations: + alb.ingress.kubernetes.io/backend-protocol-version: GRPC + host: # use a DNS CNAME pointing to your ALB + + .. group-tab:: ``flyte-core`` on GCP using NGINX + + .. literalinclude:: ../../../charts/flyte-core/values-gcp.yaml + :caption: charts/flyte-core/values-gcp.yaml + :language: yaml + :lines: 156-164 *************** Authentication diff --git a/rsts/deployment/deployment/multicluster.rst b/rsts/deployment/deployment/multicluster.rst index 2b8e15084c..af2b9039f3 100644 --- a/rsts/deployment/deployment/multicluster.rst +++ b/rsts/deployment/deployment/multicluster.rst @@ -1,7 +1,7 @@ .. _deployment-deployment-multicluster: ###################################### -Multiple K8s Cluster Deployment +Multiple Kubernetes Cluster Deployment ###################################### ..
tags:: Kubernetes, Infrastructure, Advanced .. note:: The multicluster deployment described in this section assumes you have deployed - the ``flyte-core`` Helm chart, which runs the individual Flyte services separately. + the ``flyte-core`` Helm chart, which runs the individual Flyte components separately. This is needed because in a multicluster setup, the execution engine is - deployed to multiple K8s clusters. This will not work with the ``flyte-binary`` - Helm chart, since that chart deploys all Flyte service as one single binary. + deployed to multiple K8s clusters; it won't work with the ``flyte-binary`` + Helm chart, since it deploys all Flyte services as a single binary. Scaling Beyond Kubernetes ------------------------- execution. The data plane fulfills these workflows by launching pods in Kubernetes. -At large organizations, total compute needs could exceed the limits of a single -Kubernetes cluster. -To address this, you can deploy the data plane to multiple Kubernetes clusters. +.. image:: https://raw.githubusercontent.com/flyteorg/static-resources/main/common/flyte-multicluster-arch.png + +The case for multiple Kubernetes clusters may arise due to security constraints, +cost-effectiveness, or a need to scale out computing resources. + +To address this, you can deploy Flyte's data plane to multiple Kubernetes clusters. The control plane (FlyteAdmin) can be configured to load-balance workflows across these individual data planes, protecting you from failure in a single Kubernetes cluster, thus increasing scalability. -To achieve this, first you have to create additional Kubernetes clusters. - -This gude assumes that you have three Kubernetes clusters and that you can access -them all with ``kubectl``. - -Let's call these clusters ``cluster1``, ``cluster2``, and ``cluster3``. -Next, deploy *only* the data planes to these clusters.
To do this, use the `values-dataplane.yaml `__ provided with the Helm chart. +To deploy *only* the data planes to these clusters, use the `values-dataplane.yaml `__ provided with the Helm chart. Data Plane Deployment ********************* -First, add the Flyteorg Helm repo +This guide assumes that you have three Kubernetes clusters and that you can access +them all with ``kubectl``. + +Let's call these clusters ``dataplane1``, ``dataplane2``, and ``dataplane3``. + +1. Add the ``flyteorg`` Helm repo: .. code-block:: helm repo add flyteorg https://flyteorg.github.io/flyte helm repo update helm fetch --untar --untardir . flyteorg/flyte-core cd flyte-core -Install Flyte data plane Helm chart +2. Install Flyte data plane Helm chart: + +.. note:: + + Use the same ``values-eks.yaml`` or ``values-gcp.yaml`` file you used to deploy the control plane. .. tabbed:: AWS .. code-block:: - helm upgrade -n flyte -f values.yaml \ - -f values-eks.yaml \ - -f values-dataplane.yaml \ - --create-namespace flyte flyteorg/flyte-core --install + helm install flyte-core-data flyteorg/flyte-core -n flyte \ + --values values-eks.yaml --values values-dataplane.yaml \ + --create-namespace .. tabbed:: GCP .. code-block:: helm upgrade flyte -n flyte flyteorg/flyte-core -f values.yaml \ -f values-gcp.yaml \ -f values-dataplane.yaml \ --create-namespace flyte --install User and Control Plane Deployment ********************************* -Some Flyte deployments may choose to run the control plane separate from the data -plane. FlyteAdmin is designed to create Kubernetes resources in one or more -Flyte data plane clusters. For the admin to access remote clusters, it needs -credentials to each cluster. +For ``flyteadmin`` to access and create Kubernetes resources in one or more +Flyte data plane clusters, it needs credentials to each cluster. Flyte makes use of Kubernetes Service Accounts to enable every data plane cluster to perform authenticated requests to the Kubernetes API Server.
+The default behaviour is that ``flyteadmin`` creates a `ServiceAccount `_ in each data plane cluster. In order to verify requests, the API Server expects a `signed bearer token `__ -attached to the Service Account. +attached to the Service Account. As of Kubernetes 1.24 an above, the bearer token has to be generated manually. -.. note:: - As of Kubernetes 1.24 an above, the bearer token has to be generated manually for a Service Account, using the following command: +1. Use the following manifest to create a long-lived secret for the ``flyteadmin`` Service Account in your dataplane cluster: .. prompt:: bash $ - kubectl create token -n - -To feed the credentials to FlyteAdmin, you must retrieve them from your new data plane cluster and upload them to ``FlyteAmin``. + kubectl apply -f - < Date: Thu, 28 Sep 2023 12:47:44 -0500 Subject: [PATCH 03/24] Updates instructions from last run Signed-off-by: davidmirror-ops --- rsts/deployment/deployment/multicluster.rst | 297 ++++++++++++++------ 1 file changed, 216 insertions(+), 81 deletions(-) diff --git a/rsts/deployment/deployment/multicluster.rst b/rsts/deployment/deployment/multicluster.rst index af2b9039f3..4edc1cdb03 100644 --- a/rsts/deployment/deployment/multicluster.rst +++ b/rsts/deployment/deployment/multicluster.rst @@ -31,24 +31,170 @@ The case for multiple Kubernetes clusters may arise due to security constraints, cost effectiveness or a need to scale out computing resources. To address this, you can deploy Flyte's data plane to multiple Kubernetes clusters. -The control plane (FlyteAdmin) can be configured to load-balance workflows across -these individual data planes, protecting you from failure in a single Kubernetes -cluster, thus increasing scalability. +The control plane (FlyteAdmin) can be configured to submit workflows to +these individual data planes. 
Additionally, Flyte provides the mechanisms for administrators to retain control over the workflow placement logic while enabling users to reap the benefits using simple abstractions like ``projects`` and ``domains``. + +Prerequisites +************* + +To make sure that your multicluster deployment is able to scale and process +requests successfully, the following environment-specific requirements should be met: + +.. tabbed:: AWS + + 1. An IAM Policy that defines the permissions needed for Flyte. A minimum set of permissions includes: + + .. code-block:: json + + "Action": [ + "s3:DeleteObject*", + "s3:GetObject*", + "s3:ListBucket", + "s3:PutObject*" + ], + "Resource": [ + "arn:aws:s3:::*", + "arn:aws:s3:::*/*" + ], + + 2. At least three IAM Roles configured: one for the control plane components, another for the data plane + and one more for the worker Pods that are bootstrapped by Flyte to execute workflow tasks. + + 3. An OIDC Provider associated with each of your EKS clusters. You can use the following command to create and connect the Provider: + + .. prompt:: bash + + eksctl utils associate-iam-oidc-provider --cluster --approve + + 4. An IAM Trust Relationship that associates each EKS cluster type (control plane or data plane) with the Service Account(s) and namespaces + where the different elements of the system will run. + + Use the steps in this section to complete the requirements indicated above: + + **Control plane role** + + 1. Use the following command to simplify the process of both creating a role and configuring an initial Trust Relationship: + + .. prompt:: bash + + eksctl create iamserviceaccount --cluster= --name=flyteadmin --role-only --role-name=flyte-controlplane-role --attach-policy-arn --approve --region --namespace flyte + + 2. Go to the **IAM** section in your **AWS Management Console** and select the role that was just created + 3. Go to the **Trust Relationships** tab and **Edit the Trust Policy** 4.
Add the ``datacatalog`` Service Account to the ``sub`` section + + The end result should look similar to the following example: + + .. code-block:: json + + { + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Principal": { + "Federated": "arn:aws:iam:::oidc-provider/oidc.eks..amazonaws.com/id/" + }, + "Action": "sts:AssumeRoleWithWebIdentity", + "Condition": { + "StringEquals": { + "oidc.eks..amazonaws.com/id/:aud": "sts.amazonaws.com", + "oidc.eks..amazonaws.com/id/:sub": [ + "system:serviceaccount:flyte:flyteadmin", + "system:serviceaccount:flyte:datacatalog" + ] + } + } + } + ] + } + + **Data plane role** + + 1. Create the role and Trust Relationship: + + .. prompt:: bash + + eksctl create iamserviceaccount --cluster= --name=flytepropeller --role-only --role-name=flyte-dataplane-role --attach-policy-arn --approve --region --namespace flyte + + 2. Verify the Trust Relationship configuration: + + .. prompt:: bash + + aws iam get-role --role-name flyte-dataplane-role --query Role.AssumeRolePolicyDocument + + Example output: + + .. code-block:: json + + { + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Principal": { + "Federated": "arn:aws:iam:::oidc-provider/oidc.eks..amazonaws.com/id/" + }, + "Action": "sts:AssumeRoleWithWebIdentity", + "Condition": { + "StringEquals": { + "oidc.eks.us-east-1.amazonaws.com/id/66CBAF563FD1438BC98F1EF39FF8DACD:aud": "sts.amazonaws.com", + "oidc.eks.us-east-1.amazonaws.com/id/66CBAF563FD1438BC98F1EF39FF8DACD:sub": "system:serviceaccount:flyte:flytepropeller" + } + } + } + ] + } + + **Workers role** + + 1. Create role and initial Trust Relationship: + + .. prompt:: bash + + eksctl create iamserviceaccount --cluster= --name=default --role-only --role-name=flyte-workers-role --attach-policy-arn --approve --region --namespace flyte + + 2. Go to the **IAM** section in your **AWS Management Console** and select the role that was just created + 3. 
Go to the **Trust Relationships** tab and **Edit the Trust Policy** + 4. By default, every Pod created for task execution uses the ``default`` Service Account in its respective namespace. In your cluster, you'll have as many + namespaces as ``project``-``domain`` combinations. Hence, it might be useful to use a ``StringLike`` condition and to set a wildcard for the namespace in the Trust Policy: + + .. code-block:: json + + { + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Principal": { + "Federated": "arn:aws:iam:::oidc-provider/oidc.eks..amazonaws.com/id/" + }, + "Action": "sts:AssumeRoleWithWebIdentity", + "Condition": { + "StringLike": { + "oidc.eks..amazonaws.com/id/:sub": "system:serviceaccount:*:default", + "oidc.eks..amazonaws.com/id/:aud": "sts.amazonaws.com" + } + } + } + ] + } Data Plane Deployment ********************* -This guide assumes that you have two Kubernetes clusters and that you can access +This guide assumes that you have two Kubernetes clusters and that you can access them all with ``kubectl``. Let's call these clusters ``dataplane1`` and ``dataplane2``. 1. Add the ``flyteorg`` Helm repo: .. prompt:: bash helm repo add flyteorg https://flyteorg.github.io/flyte helm repo update helm fetch --untar --untardir . flyteorg/flyte-core cd flyte-core -2. Install Flyte data plane Helm chart: +2. Open the ``values-dataplane.yaml`` file and add the following contents: + + .. code-block:: yaml + + configmap: + admin: + admin: + endpoint: :443 # use here the URL you're using to connect to Flyte + insecure: false # enables secure communication over SSL; requires a signed certificate + +..
note:: This step is needed so the ``flytepropeller`` instance in the data plane cluster is able to send notifications back to the ``flyteadmin`` service in the control plane. 3. Install Flyte data plane Helm chart: .. note:: Use the same ``values-eks.yaml`` or ``values-gcp.yaml`` file you used to deploy the control plane. .. tabbed:: AWS .. code-block:: helm install flyte-core-data flyteorg/flyte-core -n flyte \ --values values-eks.yaml --values values-dataplane.yaml \ --create-namespace .. tabbed:: GCP .. code-block:: - helm upgrade flyte -n flyte flyteorg/flyte-core -f values.yaml \ - -f values-gcp.yaml \ - -f values-dataplane.yaml \ - --create-namespace flyte --install + helm install flyte-core-data -n flyte flyteorg/flyte-core \ + --values values-gcp.yaml \ + --values values-dataplane.yaml \ + --create-namespace flyte +4. Repeat steps 2 and 3 for each dataplane cluster in your environment. -User and Control Plane Deployment +Control Plane Deployment ********************************* For ``flyteadmin`` to access and create Kubernetes resources in one or more Flyte data plane clusters, it needs credentials to each cluster. Flyte makes use of Kubernetes Service Accounts to enable every data plane cluster to perform authenticated requests to the Kubernetes API Server. The default behaviour is that ``flyteadmin`` creates a `ServiceAccount `_ in each data plane cluster. -In order to verify requests, the API Server expects a `signed bearer token `__ -attached to the Service Account. As of Kubernetes 1.24 and above, the bearer token has to be generated manually. +In order to verify requests, the Kubernetes API Server expects a `signed bearer token `__ +attached to the Service Account. As of Kubernetes 1.24 and above, the bearer token has to be generated manually. -1. Use the following manifest to create a long-lived bearer token for the ``flyteadmin`` Service Account in your dataplane cluster: +1. Use the following manifest to create a long-lived bearer token for the ``flyteadmin`` Service Account in your dataplane cluster: .. prompt:: bash $ kubectl apply -f - < 2. Use the ``flyte-binary-grpc`` ..
prompt:: bash $ - kubectl get secret -n flyte cluster-credentials \ + kubectl get secret -n flyte dataplane1-token \ -o jsonpath='{.data.token}' | base64 -D | pbcopy 4. Go to ``secrets.yaml`` and add a new entry under ``stringData`` with the dataplane cluster token: @@ -150,7 +312,7 @@ attached to the Service Account. As of Kubernetes 1.24 an above, the bearer toke .. prompt:: bash $ - kubectl get secret -n flyte cluster-credentials \ + kubectl get secret -n flyte dataplane1-token \ -o jsonpath='{.data.ca\.crt}' | base64 -D | pbcopy 6. Add another entry on your ``secrets.yaml`` file for the cert, making sure that indentation resembles the following example: @@ -207,51 +369,45 @@ attached to the Service Account. As of Kubernetes 1.24 an above, the bearer toke additionalVolumeMounts: - name: cluster-credentials mountPath: /var/run/credentials + initContainerClusterSyncAdditionalVolumeMounts: + - name: cluster-credentials + mountPath: /etc/credentials configmap: clusters: labelClusterMap: - project1: + label1: - id: dataplane_1 weight: 1 - project2: + label2: - id: dataplane_2 - weight: 0.5 - - id: dataplane_3 - weight: 0.5 + weight: 1 clusterConfigs: - name: "dataplane_1" - endpoint: https://dataplane-1-kubeapi-endpoint.com:443 + endpoint: https://:443 enabled: true auth: type: "file_path" tokenPath: "/var/run/credentials/dataplane_1_token" certPath: "/var/run/credentials/dataplane_1_cacert" - name: "dataplane_2" - endpoint: https://dataplane-2-kubeapi-endpoint.com:443 - enabled: true - auth: - type: "file_path" - tokenPath: "/var/run/credentials/dataplane_2_token" - certPath: "/var/run/credentials/dataplane_2_cacert" - - name: "dataplane_3" - endpoint: https://dataplane-3-kubeapi-endpoint.com:443 + endpoint: https://:443 enabled: true auth: - type: "file_path" - tokenPath: "/var/run/credentials/dataplane_3_token" - certPath: "/var/run/credentials/dataplane_3_cacert" + type: "file_path" + tokenPath: "/var/run/credentials/dataplane_2_token" + certPath: 
"/var/run/credentials/dataplane_2_cacert" .. note:: - Typically, you can obtain your Kubernetes endpoint URL using the following command: + Typically, you can obtain your Kubernetes API endpoint URL using the following command: .. prompt:: bash $ kubectl cluster-info -In this configuration, ``team1`` and ``team2`` are just labels that we will use later in the process +In this configuration, ``label1`` and ``label2`` are just labels that we will use later in the process to configure the necessary mappings so workflow executions matching those labels, are scheduled -on one or multiple clusters depending on the weight (e.g. ``team1`` on ``dataplane_1``) +on one or multiple clusters depending on the weight (e.g. ``label1`` on ``dataplane_1``) 10. Update the control plane Helm release: @@ -271,10 +427,9 @@ on one or multiple clusters depending on the weight (e.g. ``team1`` on ``datapla .. code-block:: helm upgrade flyte -n flyte flyteorg/flyte-core values.yaml \ - -f values-gcp.yaml \ - -f values-controlplane.yaml \ - -f values-override.yaml \ - --create-namespace flyte --install + --values values-gcp.yaml \ + --values values-controlplane.yaml \ + --values values-override.yaml 11. Verify that all Pods in the ``flyte`` namespace are ``Running``: @@ -286,7 +441,6 @@ Example output: NAME READY STATUS RESTARTS AGE datacatalog-86f6b9bf64-bp2cj 1/1 Running 0 23h datacatalog-86f6b9bf64-fjzcp 1/1 Running 0 23h - flyteadmin-5bb4c4976d-rdk5l 0/1 Pending 0 23h flyteadmin-84f666b6f5-7g65j 1/1 Running 0 23h flyteadmin-84f666b6f5-sqfwv 1/1 Running 0 23h flyteconsole-cdcb48b56-5qzlb 1/1 Running 0 23h @@ -294,28 +448,6 @@ Example output: flytescheduler-947ccbd6-r8kg5 1/1 Running 0 23h syncresources-6d8794bbcb-754wn 1/1 Running 0 23h -12. Verify that your cluster configs landed on the ``flyte-clusterresourcesync-config`` ConfigMap: - - .. 
code-block:: yaml
-
-      clusters.yaml:
-      ----
-      clusters:
-        clusterConfigs:
-        - auth:
-            certPath: /var/run/credentials/dataplane_1_cacert
-            tokenPath: /var/run/credentials/dataplane_1_token
-            type: file_path
-          enabled: true
-          endpoint: https://dataplane-1-kubeapi-endpoint.com:443
-          name: dataplane_1
-
-        labelClusterMap:
-          project1:
-          - id: dataplane_1
-            weight: 1
-   ...
-
 Configure Execution Cluster Labels
 **********************************
 
@@ -331,23 +463,14 @@ Kubernetes cluster.
 
       domain: development
       project: project1
-      value: project1
+      value: label1
 
    .. note::
 
      Change ``domain`` and ``project`` according to your environment. The ``value`` file has
      to match with the entry on ``clusterLabelMap`` that's in your ``flyte-clusterresourcesync-config`` ConfigMap.
 
-      Also, in order to automate the creation of the corresponding ``project-domain`` namespaces in the dataplane, add the following to your ``values-dataplane`` file:
-
-      Example:
-
-      .. code-block:: yaml
-
-         initialProjects:
-         - project1
-
-   2. Repeat step 1 for each project-domain mapping you need to configure, creating a YAML file for each one.
+   2. Repeat step 1 for every project-domain mapping you need to configure, creating a YAML file for each one.
 
    3. Update the execution cluster label of the project and domain:
 
@@ -362,8 +485,14 @@ Kubernetes cluster.
 
         Updated attributes from team1 project and domain development
 
+   4. Execute a workflow indicating project and domain:
 
-.. tabbed:: Configure Specific Workflow
+   .. prompt:: bash $
+
+      pyflyte run --remote --project team1 --domain development example.py training_workflow \
+      --hyperparameters '{"C": 0.1}'
+
+.. tabbed:: Configure a Specific Workflow mapping
 
    1. Create a ``workflow-ecl.yaml`` file with the following example contents:
 
@@ -371,19 +500,25 @@ Kubernetes cluster.
 
      domain: development
      project: project1
-      workflow: core.control_flow.run_merge_sort.merge_sort
+      workflow: example.training_workflow
      value: project1
 
-   3. Update execution cluster label of the project and domain
+   2. Update execution cluster label of the project and domain
 
    .. prompt:: bash $
 
       flytectl update execution-cluster-label \
       -p project1 -d development \
-      core.control_flow.run_merge_sort.merge_sort \
+      example.training_workflow \
       --attrFile workflow-ecl.yaml
 
+   3. Execute a workflow indicationg project and domain:
+
+   .. prompt:: bash $
+
+      pyflyte run --remote --project team1 --domain development example.py training_workflow \
+      --hyperparameters '{"C": 0.1}'
 
 Congratulations 🎉! With this, the execution of workflows belonging to a specific
-project-domain or a single workflow will be scheduled on the target label
+project-domain or a single specific workflow will be scheduled on the target label
 cluster.

From 08692a7237af016b3a61c78b2efe3d7c711c4d7e Mon Sep 17 00:00:00 2001
From: davidmirror-ops 
Date: Fri, 29 Sep 2023 10:55:17 -0500
Subject: [PATCH 04/24] Add instructions to add clusters

Signed-off-by: davidmirror-ops 

---
 rsts/deployment/deployment/multicluster.rst | 265 +++++++++++++++++---
 1 file changed, 233 insertions(+), 32 deletions(-)

diff --git a/rsts/deployment/deployment/multicluster.rst b/rsts/deployment/deployment/multicluster.rst
index 4edc1cdb03..c4ba11755e 100644
--- a/rsts/deployment/deployment/multicluster.rst
+++ b/rsts/deployment/deployment/multicluster.rst
@@ -1,8 +1,8 @@
 .. _deployment-deployment-multicluster:
 
-##################################
+######################################
 Multiple Kubernetes Cluster Deployment
-##################################
+######################################
 
 .. tags:: Kubernetes, Infrastructure, Advanced
 
@@ -25,7 +25,7 @@ Scaling Beyond Kubernetes
    Kubernetes.
 
 
-.. image:: https://raw.githubusercontent.com/flyteorg/static-resources/main/common/flyte-multicluster-arch.png
+..
image:: https://raw.githubusercontent.com/flyteorg/static-resources/main/common/flyte-multicluster-arch-v2.png The case for multiple Kubernetes clusters may arise due to security constraints, cost effectiveness or a need to scale out computing resources. @@ -71,7 +71,7 @@ requests successfully, the following environment-specific requirements should be 4. An IAM Trust Relationship that associates each EKS cluster type (controlplane or dataplane) with the Service Account(s) and namespaces where the different elements of the system will run. - Use the steps in this section to complete the requirements indicated above: + Follow the steps in this section to complete the requirements indicated above: **Control plane role** @@ -135,13 +135,13 @@ requests successfully, the following environment-specific requirements should be { "Effect": "Allow", "Principal": { - "Federated": "arn:aws:iam:::oidc-provider/oidc.eks..amazonaws.com/id/" + "Federated": "arn:aws:iam:::oidc-provider/oidc.eks..amazonaws.com/id/" }, "Action": "sts:AssumeRoleWithWebIdentity", "Condition": { "StringEquals": { - "oidc.eks.us-east-1.amazonaws.com/id/66CBAF563FD1438BC98F1EF39FF8DACD:aud": "sts.amazonaws.com", - "oidc.eks.us-east-1.amazonaws.com/id/66CBAF563FD1438BC98F1EF39FF8DACD:sub": "system:serviceaccount:flyte:flytepropeller" + "oidc.eks..amazonaws.com/id/:aud": "sts.amazonaws.com", + "oidc.eks..amazonaws.com/id/:sub": "system:serviceaccount:flyte:flytepropeller" } } } @@ -154,7 +154,7 @@ requests successfully, the following environment-specific requirements should be .. prompt:: bash - eksctl create iamserviceaccount --cluster= --name=default --role-only --role-name=flyte-workers-role --attach-policy-arn --approve --region --namespace flyte + eksctl create iamserviceaccount --cluster= --name=default --role-only --role-name=flyte-workers-role --attach-policy-arn --approve --region --namespace flyte 2. 
Go to the **IAM** section in your **AWS Management Console** and select the role that was just created 3. Go to the **Trust Relationships** tab and **Edit the Trust Policy** @@ -182,7 +182,7 @@ requests successfully, the following environment-specific requirements should be ] } - +.. _dataplane-deployment: Data Plane Deployment ********************* @@ -190,7 +190,8 @@ Data Plane Deployment This guide assumes that you have two Kubernetes clusters and that you can access them all with ``kubectl``. -Let's call these clusters ``dataplane1`` and ``dataplane2``. +Let's call these clusters ``dataplane1`` and ``dataplane2``. In this section, you'll prepare +the first cluster only. 1. Add the ``flyteorg`` Helm repo: @@ -209,7 +210,7 @@ Let's call these clusters ``dataplane1`` and ``dataplane2``. configmap: admin: admin: - endpoint: :443 #use here the URL you're using to connect to Flyte + endpoint: :443 #indicate the URL you're using to connect to Flyte insecure: false #enables secure communication over SSL. Requires a signed certificate .. note:: @@ -221,7 +222,7 @@ Let's call these clusters ``dataplane1`` and ``dataplane2``. .. note:: - Use here the same ``values-eks`` or ``values-gcp.yaml`` file you used to deploy the controlplane. + Use the same ``values-eks.yaml`` or ``values-gcp.yaml`` file you used to deploy the controlplane. .. tabbed:: AWS @@ -240,9 +241,9 @@ Let's call these clusters ``dataplane1`` and ``dataplane2``. --values values-dataplane.yaml \ --create-namespace flyte -4. Repeat step 2 and 3 for each dataplane cluster in your environment. +.. _control-plane-deployment: -Control Plane Deployment +Control Plane configuration ********************************* For ``flyteadmin`` to access and create Kubernetes resources in one or more @@ -257,7 +258,7 @@ attached to the Service Account. As of Kubernetes 1.24 and above, the bearer tok 1. 
Use the following manifest to create a long-lived bearer token for the ``flyteadmin`` Service Account in your dataplane cluster: - .. prompt:: bash $ + .. prompt:: bash kubectl apply -f - <:443 @@ -389,13 +386,6 @@ attached to the Service Account. As of Kubernetes 1.24 and above, the bearer tok type: "file_path" tokenPath: "/var/run/credentials/dataplane_1_token" certPath: "/var/run/credentials/dataplane_1_cacert" - - name: "dataplane_2" - endpoint: https://:443 - enabled: true - auth: - type: "file_path" - tokenPath: "/var/run/credentials/dataplane_2_token" - certPath: "/var/run/credentials/dataplane_2_cacert" .. note:: @@ -409,7 +399,7 @@ In this configuration, ``label1`` and ``label2`` are just labels that we will us to configure the necessary mappings so workflow executions matching those labels, are scheduled on one or multiple clusters depending on the weight (e.g. ``label1`` on ``dataplane_1``) -10. Update the control plane Helm release: +9. Update the control plane Helm release: .. note:: This step will disable ``flytepropeller`` in the control plane cluster, leaving no possibility of running workflows there. @@ -431,7 +421,7 @@ on one or multiple clusters depending on the weight (e.g. ``label1`` on ``datapl --values values-controlplane.yaml \ --values values-override.yaml -11. Verify that all Pods in the ``flyte`` namespace are ``Running``: +10. Verify that all Pods in the ``flyte`` namespace are ``Running``: Example output: @@ -467,8 +457,8 @@ Kubernetes cluster. .. note:: - Change ``domain`` and ``project`` according to your environment. The ``value`` file has - to match with the entry on ``clusterLabelMap`` that's in your ``flyte-clusterresourcesync-config`` ConfigMap. + Change ``domain`` and ``project`` according to your environment. The ``value`` has + to match with the entry under ``labelClusterMap`` in the ``values-override.yaml`` file. 2. Repeat step 1 for every project-domain mapping you need to configure, creating a YAML file for each one. 
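The ``labelClusterMap`` weights configured earlier determine how executions that carry a given label are distributed across data plane clusters. As a rough, hypothetical sketch (this is not Flyte's actual scheduler code; the map shape and the weighted-random choice are assumptions for illustration only), the selection can be modeled as:

```python
import random

def pick_cluster(label_cluster_map, label, rng=random):
    """Pick a target cluster id for an execution label by weighted random choice."""
    entries = label_cluster_map[label]
    ids = [entry["id"] for entry in entries]
    weights = [entry["weight"] for entry in entries]
    return rng.choices(ids, weights=weights, k=1)[0]

# Mirrors the labelClusterMap example used in this guide
label_cluster_map = {
    "label1": [{"id": "dataplane_1", "weight": 1}],
    "label2": [{"id": "dataplane_2", "weight": 1}],
}

print(pick_cluster(label_cluster_map, "label1"))  # dataplane_1
```

With a single entry of weight 1 per label, the choice is deterministic; splitting a label across two entries with weights 0.5/0.5 would spread matching executions roughly evenly between the two clusters.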
@@ -522,3 +512,214 @@ Kubernetes cluster.
 Congratulations 🎉! With this, the execution of workflows belonging to a specific
 project-domain or a single specific workflow will be scheduled on the target label
 cluster.
+
+Day 2 Operations
+----------------
+
+Add another Kubernetes cluster
+******************************
+
+This section describes the steps needed to scale out your deployment by adding another Kubernetes cluster.
+The process can be repeated for additional clusters.
+
+.. tabbed:: AWS
+
+
+
+   1. Create the new cluster:
+
+      .. prompt:: bash $
+
+         eksctl create cluster --name flyte-dataplane-2 --region --version 1.25 --vpc-private-subnets , --without-nodegroup
+
+   .. note::
+
+      This is only one of multiple ways to provision an EKS cluster. Follow your organization's policies to complete this step.
+
+
+   2. Add a nodegroup to the cluster. Typically ``t3.xlarge`` instances provide enough resources to get started. Follow your organization's policies in this regard.
+
+   3. Create an OIDC Provider for the new cluster:
+
+      .. prompt:: bash $
+
+         eksctl utils associate-iam-oidc-provider --cluster flyte-dataplane-2 --region --approve
+
+   4. Take note of the OIDC Provider ID:
+
+      .. prompt:: bash $
+
+         aws eks describe-cluster --region --name flyte-dataplane-2 --query "cluster.identity.oidc.issuer" --output text
+
+   5. Go to the **IAM** section in the **AWS Management Console** and edit the **Trust Policy** of the ``flyte-dataplane-role``
+   6. Add a new ``Principal`` with the new cluster's OIDC Provider ID. Include the ``Action`` and ``Conditions`` section:
+
+      ..
code-block:: json + + { + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Principal": { + "Federated": "arn:aws:iam:::oidc-provider/oidc.eks..amazonaws.com/id/" + }, + "Action": "sts:AssumeRoleWithWebIdentity", + "Condition": { + "StringLike": { + "oidc.eks..amazonaws.com/id/:aud": "sts.amazonaws.com", + "oidc.eks..amazonaws.com/id/:sub": "system:serviceaccount:flyte:flytepropeller" + } + } + }, + { + "Effect": "Allow", + "Principal": { + "Federated": "arn:aws:iam:::oidc-provider/oidc.eks..amazonaws.com/id/" + }, + "Action": "sts:AssumeRoleWithWebIdentity", + "Condition": { + "StringLike": { + "oidc.eks..amazonaws.com/id/:aud": "sts.amazonaws.com", + "oidc.eks..amazonaws.com/id/:sub": "system:serviceaccount:flyte:flytepropeller" + } + } + } + ] + } + + 7. Repeat the previous step for the ``flyte-workers-role``. The result should look like the example: + + .. code-block:: json + + { + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Principal": { + "Federated": "arn:aws:iam:::oidc-provider/oidc.eks..amazonaws.com/id/" + }, + "Action": "sts:AssumeRoleWithWebIdentity", + "Condition": { + "StringLike": { + "oidc.eks..amazonaws.com/id/:aud": "sts.amazonaws.com", + "oidc.eks..amazonaws.com/id/:sub": "system:serviceaccount:*:default" + } + } + }, + { + "Effect": "Allow", + "Principal": { + "Federated": "arn:aws:iam:::oidc-provider/oidc.eks..amazonaws.com/id/" + }, + "Action": "sts:AssumeRoleWithWebIdentity", + "Condition": { + "StringLike": { + "oidc.eks..amazonaws.com/id/:aud": "sts.amazonaws.com", + "oidc.eks..amazonaws.com/id/:sub": "system:serviceaccount:*:default" + } + } + } + ] + } + + 8. Connect to your new EKS cluster and create the ``flyte`` namespace: + + .. prompt:: bash $ + + kubectl create ns flyte + + 9. Install the dataplane Helm chart following the steps in the **Dataplane deployment** section. See :ref:`section `. + 10. 
Follow steps 1-3 in the **Controlplane configuration** section (see :ref:`section `) to generate and populate a new section in your ``secrets.yaml`` file
+
+   Example:
+
+   .. code-block:: yaml
+
+      apiVersion: v1
+      kind: Secret
+      metadata:
+         name: cluster-credentials
+         namespace: flyte
+      type: Opaque
+      stringData:
+         dataplane_1_token: eyJhbGciOiJSUzI1NiIsImtpZCI6IlM0WlhfMm1Yb1U4Z1V4R0t6...
+         dataplane_1_cacert: |
+            -----BEGIN CERTIFICATE-----
+            MIIDB...
+            -----END CERTIFICATE-----
+         dataplane_2_token: eyJhbGciOiJSUzI1NiIsImtpZCI6IjNxZ0tZRXBnNU0zWk1oLUJrUlc...
+         dataplane_2_cacert: |
+            -----BEGIN CERTIFICATE-----
+            MIIDBT...
+            -----END CERTIFICATE-----
+
+   12. Connect to the controlplane cluster and update the ``cluster-credentials`` Secret:
+
+      .. prompt:: bash $
+
+         kubectl apply -f secrets.yaml
+
+   13. Go to your ``values-override.yaml`` file and add the information of the new cluster. Adding a new label is not strictly required;
+   nevertheless, the following example creates a new label to illustrate Flyte's capability to schedule workloads on different clusters
+   in response to user-defined mappings of ``project``, ``domain`` and ``label``:
+
+      .. code-block:: yaml
+
+         ... #all the above content remains the same
+         configmap:
+            clusters:
+              labelClusterMap:
+                label1:
+                - id: dataplane_1
+                  weight: 1
+                label2:
+                - id: dataplane_2
+                  weight: 1
+              clusterConfigs:
+              - name: "dataplane_1"
+                endpoint: https://.com:443
+                enabled: true
+                auth:
+                  type: "file_path"
+                  tokenPath: "/var/run/credentials/dataplane_1_token"
+                  certPath: "/var/run/credentials/dataplane_1_cacert"
+              - name: "dataplane_2"
+                endpoint: https://:443
+                enabled: true
+                auth:
+                  type: "file_path"
+                  tokenPath: "/var/run/credentials/dataplane_2_token"
+                  certPath: "/var/run/credentials/dataplane_2_cacert"
+
+   14. Update the Helm release in the controlplane cluster:
+
+      ..
prompt:: bash $
+
+      helm upgrade flyte-core-control flyteorg/flyte-core -n flyte --values values-controlplane.yaml --values values-eks.yaml --values values-override.yaml
+
+   15. Create a new execution cluster labels file with the following sample content:
+
+      .. code-block:: yaml
+
+         domain: production
+         project: team1
+         value: label2
+
+   16. Update the cluster execution labels for the project:
+
+      .. prompt:: bash $
+
+         flytectl update execution-cluster-label --attrFile ecl-production.yaml
+
+   17. Finally, submit a workflow execution that matches the label of the new cluster:
+
+      .. prompt:: bash $
+
+         pyflyte run --remote --project team1 --domain production example.py training_workflow \
+         --hyperparameters '{"C": 0.1}'
+
+   18. A succesful execution should be visible on the UI, confirming it ran in the new cluster:
+
+   .. image:: https://raw.githubusercontent.com/flyteorg/static-resources/main/common/multicluster-execution.png
\ No newline at end of file

From be2abfd1451a830e494ac6cf6b7b64834cb52f12 Mon Sep 17 00:00:00 2001
From: davidmirror-ops 
Date: Fri, 29 Sep 2023 10:57:50 -0500
Subject: [PATCH 05/24] Fix typos

Signed-off-by: davidmirror-ops 

---
 rsts/deployment/deployment/multicluster.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/rsts/deployment/deployment/multicluster.rst b/rsts/deployment/deployment/multicluster.rst
index c4ba11755e..40e6670f97 100644
--- a/rsts/deployment/deployment/multicluster.rst
+++ b/rsts/deployment/deployment/multicluster.rst
@@ -9,7 +9,7 @@ Multiple Kubernetes Cluster Deployment
 .. note::
 
    The multicluster deployment described in this section, assumes you have deployed
-   the ``flyte-core`` Helm chart, which runs the individual Flyte componentes separately.
+   the ``flyte-core`` Helm chart, which runs the individual Flyte components separately.
   This is needed because in a multicluster setup, the execution engine is deployed to multiple K8s clusters; it won't work with the ``flyte-binary`` Helm chart, since it deploys all Flyte services as one single binary.
@@ -502,7 +502,7 @@ Kubernetes cluster.
          example.training_workflow \
          --attrFile workflow-ecl.yaml
 
-   3. Execute a workflow indicationg project and domain:
+   3. Execute a workflow indicating project and domain:
 
    .. prompt:: bash $
 
      pyflyte run --remote --project team1 --domain development example.py training_workflow \
      --hyperparameters '{"C": 0.1}'
@@ -720,6 +720,6 @@ The process can be repeated for additional clusters.
 
      pyflyte run --remote --project team1 --domain production example.py training_workflow \
      --hyperparameters '{"C": 0.1}'
 
-   18. A succesful execution should be visible on the UI, confirming it ran in the new cluster:
+   18. A successful execution should be visible on the UI, confirming it ran in the new cluster:
 
    .. image:: https://raw.githubusercontent.com/flyteorg/static-resources/main/common/multicluster-execution.png
\ No newline at end of file

From 4c51dc2c2443c2186b3202780d2f1a75984996a9 Mon Sep 17 00:00:00 2001
From: davidmirror-ops 
Date: Fri, 29 Sep 2023 11:01:12 -0500
Subject: [PATCH 06/24] Fix JSON indentation in example

Signed-off-by: davidmirror-ops 

---
 rsts/deployment/deployment/multicluster.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/rsts/deployment/deployment/multicluster.rst b/rsts/deployment/deployment/multicluster.rst
index 40e6670f97..b27ef59a06 100644
--- a/rsts/deployment/deployment/multicluster.rst
+++ b/rsts/deployment/deployment/multicluster.rst
@@ -53,8 +53,8 @@ requests successfully, the following environment-specific requirements should be
         "s3:GetObject*",
         "s3:ListBucket",
         "s3:PutObject*"
-        ],
-        "Resource": [
+     ],
+     "Resource": [
         "arn:aws:s3:::*",
         "arn:aws:s3:::*/*"
         ],

From 70737b731691376a2dfb118715692ff42b70721f Mon Sep 17 00:00:00 2001
From: davidmirror-ops 
Date: Fri, 29 Sep 2023 11:06:08 -0500
Subject: [PATCH 07/24] Fix JSON indentation in example 2nd try

Signed-off-by: davidmirror-ops 

---
rsts/deployment/deployment/index.rst | 2 +- rsts/deployment/deployment/multicluster.rst | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/rsts/deployment/deployment/index.rst b/rsts/deployment/deployment/index.rst index e253ae480c..bec8cef163 100644 --- a/rsts/deployment/deployment/index.rst +++ b/rsts/deployment/deployment/index.rst @@ -85,7 +85,7 @@ There are three different paths for deploying a Flyte cluster: This option is appropriate if all your compute can `fit on one EKS cluster `__ . As of this writing, a single Flyte cluster can handle more than 13,000 nodes. - Regardless of using single or multiple Kubernetes clusters for Flyte, note that ``FlytePropeller`` -tha main data plane component- can be sharded as well, if scale demands require it. + Regardless of using single or multiple Kubernetes clusters for Flyte, note that ``FlytePropeller`` -the main data plane component- can be sharded as well, if scale demands require it. Helm ==== diff --git a/rsts/deployment/deployment/multicluster.rst b/rsts/deployment/deployment/multicluster.rst index b27ef59a06..5745e93d37 100644 --- a/rsts/deployment/deployment/multicluster.rst +++ b/rsts/deployment/deployment/multicluster.rst @@ -54,9 +54,9 @@ requests successfully, the following environment-specific requirements should be "s3:ListBucket", "s3:PutObject*" ], - "Resource": [ - "arn:aws:s3:::*", - "arn:aws:s3:::*/*" + "Resource": [ + "arn:aws:s3:::*", + "arn:aws:s3:::*/*" ], 2. 
At least three IAM Roles configured: one for the controlplane components, another for the dataplane From bb25ab6288f65bdabd98778f2fc3c97920cd2549 Mon Sep 17 00:00:00 2001 From: davidmirror-ops Date: Fri, 29 Sep 2023 11:10:41 -0500 Subject: [PATCH 08/24] Fix JSON missing blank line Signed-off-by: davidmirror-ops --- rsts/deployment/deployment/multicluster.rst | 3 +++ 1 file changed, 3 insertions(+) diff --git a/rsts/deployment/deployment/multicluster.rst b/rsts/deployment/deployment/multicluster.rst index 5745e93d37..1385c21f83 100644 --- a/rsts/deployment/deployment/multicluster.rst +++ b/rsts/deployment/deployment/multicluster.rst @@ -49,12 +49,15 @@ requests successfully, the following environment-specific requirements should be .. code-block:: json "Action": [ + "s3:DeleteObject*", "s3:GetObject*", "s3:ListBucket", "s3:PutObject*" ], + "Resource": [ + "arn:aws:s3:::*", "arn:aws:s3:::*/*" ], From cf77de93d64a36cd3039368f67f9770c6b7bb4c1 Mon Sep 17 00:00:00 2001 From: davidmirror-ops Date: Fri, 29 Sep 2023 12:53:45 -0500 Subject: [PATCH 09/24] Fix JSON missing blank line 3rd try Signed-off-by: davidmirror-ops --- rsts/deployment/deployment/multicluster.rst | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/rsts/deployment/deployment/multicluster.rst b/rsts/deployment/deployment/multicluster.rst index 1385c21f83..daebe7118c 100644 --- a/rsts/deployment/deployment/multicluster.rst +++ b/rsts/deployment/deployment/multicluster.rst @@ -54,12 +54,14 @@ requests successfully, the following environment-specific requirements should be "s3:GetObject*", "s3:ListBucket", "s3:PutObject*" + ], - + "Resource": [ "arn:aws:s3:::*", "arn:aws:s3:::*/*" + ], 2. 
At least three IAM Roles configured: one for the controlplane components, another for the dataplane From 56cfed8e589b925d19410fd08872944d15da9f3c Mon Sep 17 00:00:00 2001 From: davidmirror-ops Date: Fri, 29 Sep 2023 12:59:30 -0500 Subject: [PATCH 10/24] Fix JSON missing blank line 4th try Signed-off-by: davidmirror-ops --- rsts/deployment/deployment/multicluster.rst | 53 ++++++++++----------- 1 file changed, 26 insertions(+), 27 deletions(-) diff --git a/rsts/deployment/deployment/multicluster.rst b/rsts/deployment/deployment/multicluster.rst index daebe7118c..5b00cbf1cc 100644 --- a/rsts/deployment/deployment/multicluster.rst +++ b/rsts/deployment/deployment/multicluster.rst @@ -48,20 +48,19 @@ requests successfully, the following environment-specific requirements should be .. code-block:: json - "Action": [ + "Action": [ "s3:DeleteObject*", "s3:GetObject*", "s3:ListBucket", "s3:PutObject*" - ], - "Resource": [ + "Resource": [ "arn:aws:s3:::*", "arn:aws:s3:::*/*" - + ], 2. At least three IAM Roles configured: one for the controlplane components, another for the dataplane @@ -148,10 +147,10 @@ requests successfully, the following environment-specific requirements should be "oidc.eks..amazonaws.com/id/:aud": "sts.amazonaws.com", "oidc.eks..amazonaws.com/id/:sub": "system:serviceaccount:flyte:flytepropeller" } - } - } - ] - } + } + } + ] + } **Workers role** @@ -169,23 +168,23 @@ requests successfully, the following environment-specific requirements should be .. 
code-block:: json { - "Version": "2012-10-17", - "Statement": [ - { - "Effect": "Allow", - "Principal": { - "Federated": "arn:aws:iam:::oidc-provider/oidc.eks..amazonaws.com/id/" - }, - "Action": "sts:AssumeRoleWithWebIdentity", - "Condition": { - "StringLike": { - "oidc.eks..amazonaws.com/id/:sub": "system:serviceaccount:*:default", - "oidc.eks..amazonaws.com/id/:aud": "sts.amazonaws.com" - } - } - } - ] - } + "Version": "2012-10-17", + "Statement": [ + { + "Effect": "Allow", + "Principal": { + "Federated": "arn:aws:iam:::oidc-provider/oidc.eks..amazonaws.com/id/" + }, + "Action": "sts:AssumeRoleWithWebIdentity", + "Condition": { + "StringLike": { + "oidc.eks..amazonaws.com/id/:sub": "system:serviceaccount:*:default", + "oidc.eks..amazonaws.com/id/:aud": "sts.amazonaws.com" + } + } + } + ] + } .. _dataplane-deployment: @@ -626,8 +625,8 @@ The process can be repeated for additional clusters. } } } - ] - } + ] + } 8. Connect to your new EKS cluster and create the ``flyte`` namespace: From 828359b033e0e0acb7b3c98516e3ab862d17b8df Mon Sep 17 00:00:00 2001 From: davidmirror-ops Date: Fri, 29 Sep 2023 13:11:12 -0500 Subject: [PATCH 11/24] Fix JSON syntax Signed-off-by: davidmirror-ops --- rsts/deployment/deployment/multicluster.rst | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/rsts/deployment/deployment/multicluster.rst b/rsts/deployment/deployment/multicluster.rst index 5b00cbf1cc..c80de93289 100644 --- a/rsts/deployment/deployment/multicluster.rst +++ b/rsts/deployment/deployment/multicluster.rst @@ -48,20 +48,18 @@ requests successfully, the following environment-specific requirements should be .. code-block:: json + { "Action": [ - "s3:DeleteObject*", "s3:GetObject*", "s3:ListBucket", "s3:PutObject*" - ], - - "Resource": [ - - "arn:aws:s3:::*", - "arn:aws:s3:::*/*" - - ], + ], + "Resource": [ + "arn:aws:s3:::*", + "arn:aws:s3:::/*" + ] + } 2. 
At least three IAM Roles configured: one for the controlplane components, another for the dataplane and one more for the worker Pods that are bootstraped by Flyte to execute workflow tasks. From c7f7e3e6590ad00c3ed20c8c1aa725f69f822201 Mon Sep 17 00:00:00 2001 From: davidmirror-ops Date: Fri, 29 Sep 2023 13:38:39 -0500 Subject: [PATCH 12/24] Fix JSON syntax 6th try Signed-off-by: davidmirror-ops --- rsts/deployment/deployment/multicluster.rst | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/rsts/deployment/deployment/multicluster.rst b/rsts/deployment/deployment/multicluster.rst index c80de93289..ef2674573d 100644 --- a/rsts/deployment/deployment/multicluster.rst +++ b/rsts/deployment/deployment/multicluster.rst @@ -49,16 +49,22 @@ requests successfully, the following environment-specific requirements should be .. code-block:: json { + "Action": [ + "s3:DeleteObject*", "s3:GetObject*", "s3:ListBucket", "s3:PutObject*" - ], + + ], + "Resource": [ + "arn:aws:s3:::*", "arn:aws:s3:::/*" - ] + + ] } 2. At least three IAM Roles configured: one for the controlplane components, another for the dataplane From 18f0fc882dd0a04b9593d5725e40886974ea7e65 Mon Sep 17 00:00:00 2001 From: davidmirror-ops Date: Mon, 2 Oct 2023 16:47:14 -0500 Subject: [PATCH 13/24] Remove JSON block Signed-off-by: davidmirror-ops --- rsts/deployment/deployment/multicluster.rst | 18 ------------------ 1 file changed, 18 deletions(-) diff --git a/rsts/deployment/deployment/multicluster.rst b/rsts/deployment/deployment/multicluster.rst index ef2674573d..49c850e013 100644 --- a/rsts/deployment/deployment/multicluster.rst +++ b/rsts/deployment/deployment/multicluster.rst @@ -48,24 +48,6 @@ requests successfully, the following environment-specific requirements should be .. code-block:: json - { - - "Action": [ - - "s3:DeleteObject*", - "s3:GetObject*", - "s3:ListBucket", - "s3:PutObject*" - - ], - - "Resource": [ - - "arn:aws:s3:::*", - "arn:aws:s3:::/*" - - ] - } 2. 
At least three IAM Roles configured: one for the controlplane components, another for the dataplane and one more for the worker Pods that are bootstraped by Flyte to execute workflow tasks. From e5cea212ef8354732fa64fa73b899f8179df9685 Mon Sep 17 00:00:00 2001 From: davidmirror-ops Date: Tue, 3 Oct 2023 12:15:21 -0500 Subject: [PATCH 14/24] Fix error in line 57 Signed-off-by: davidmirror-ops --- rsts/deployment/deployment/cloud_production.rst | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/rsts/deployment/deployment/cloud_production.rst b/rsts/deployment/deployment/cloud_production.rst index 804dcbd726..1736f1eb4c 100644 --- a/rsts/deployment/deployment/cloud_production.rst +++ b/rsts/deployment/deployment/cloud_production.rst @@ -32,7 +32,7 @@ To turn on ingress, update your ``values.yaml`` file to include the following bl :language: yaml :lines: 127-135 - .. group-tab:: ``flyte-binary`` on EKS using ALB + .. group-tab:: ``flyte-binary``/ on EKS using ALB .. code-block:: yaml @@ -54,10 +54,10 @@ To turn on ingress, update your ``values.yaml`` file to include the following bl .. group-tab:: ``flyte-core`` on GCP using NGINX - .. literalinclude:: ../../../charts/flyte-core/values-gcp.yaml - :caption: charts/flyte-core/values-gcp.yaml - :language: yaml - :lines: 156-164 + .. 
literalinclude:: ../../../charts/flyte-core/values-gcp.yaml + :caption: charts/flyte-core/values-gcp.yaml + :language: yaml + :lines: 156-164 *************** From e9b685b73302c88790d313294c5a5dd2663a572e Mon Sep 17 00:00:00 2001 From: davidmirror-ops Date: Tue, 3 Oct 2023 12:16:52 -0500 Subject: [PATCH 15/24] Fix spelling Signed-off-by: davidmirror-ops --- rsts/deployment/deployment/multicluster.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rsts/deployment/deployment/multicluster.rst b/rsts/deployment/deployment/multicluster.rst index 49c850e013..98e397b0da 100644 --- a/rsts/deployment/deployment/multicluster.rst +++ b/rsts/deployment/deployment/multicluster.rst @@ -50,7 +50,7 @@ requests successfully, the following environment-specific requirements should be 2. At least three IAM Roles configured: one for the controlplane components, another for the dataplane - and one more for the worker Pods that are bootstraped by Flyte to execute workflow tasks. + and one more for the worker Pods that are bootstrapped by Flyte to execute workflow tasks. 3. An OIDC Provider associated with each of your EKS clusters. You can use the following command to create and connect the Provider: From b257695a081e1ab5f7943ffe9668f6bc0e2a1b81 Mon Sep 17 00:00:00 2001 From: davidmirror-ops Date: Tue, 3 Oct 2023 13:00:18 -0500 Subject: [PATCH 16/24] Apply feedback from review Signed-off-by: davidmirror-ops --- rsts/deployment/deployment/index.rst | 5 +- rsts/deployment/deployment/multicluster.rst | 72 ++++++++++----------- 2 files changed, 38 insertions(+), 39 deletions(-) diff --git a/rsts/deployment/deployment/index.rst b/rsts/deployment/deployment/index.rst index bec8cef163..67b1f9f1c4 100644 --- a/rsts/deployment/deployment/index.rst +++ b/rsts/deployment/deployment/index.rst @@ -85,7 +85,10 @@ There are three different paths for deploying a Flyte cluster: This option is appropriate if all your compute can `fit on one EKS cluster `__ . 
As of this writing, a single Flyte cluster can handle more than 13,000 nodes. - Regardless of using single or multiple Kubernetes clusters for Flyte, note that ``FlytePropeller`` -the main data plane component- can be sharded as well, if scale demands require it. + Regardless of using single or multiple Kubernetes clusters for Flyte, note that ``FlytePropeller`` -the main data plane component- can be sharded as well if scale demands require it. + See `Automatic scale-out https://docs.flyte.org/en/latest/deployment/configuration/performance.html#automatic-scale-out`__ to learn more about the sharding strategy. + + Helm ==== diff --git a/rsts/deployment/deployment/multicluster.rst b/rsts/deployment/deployment/multicluster.rst index 98e397b0da..7519709c5b 100644 --- a/rsts/deployment/deployment/multicluster.rst +++ b/rsts/deployment/deployment/multicluster.rst @@ -47,9 +47,20 @@ requests successfully, the following environment-specific requirements should be 1. An IAM Policy that defines the permissions needed for Flyte. A minimum set of permissions include: .. code-block:: json + + "Action": [ + "s3:DeleteObject*", + "s3:GetObject*", + "s3:ListBucket", + "s3:PutObject*" + ], + "Resource": [ + "arn:aws:s3:::*", + "arn:aws:s3:::*/*" + ], - 2. At least three IAM Roles configured: one for the controlplane components, another for the dataplane + 2. At least three IAM Roles configured: one for the control plane components, another for the data plane and one more for the worker Pods that are bootstrapped by Flyte to execute workflow tasks. 3. An OIDC Provider associated with each of your EKS clusters. You can use the following command to create and connect the Provider: @@ -58,7 +69,7 @@ requests successfully, the following environment-specific requirements should be eksctl utils associate-iam-oidc-provider --cluster --approve - 4. An IAM Trust Relationship that associates each EKS cluster type (controlplane or dataplane) with the Service Account(s) and namespaces + 4. 
An IAM Trust Relationship that associates each EKS cluster type (control plane or data plane) with the Service Account(s) and namespaces where the different elements of the system will run. Follow the steps in this section to complete the requirements indicated above: @@ -212,7 +223,7 @@ the first cluster only. .. note:: - Use the same ``values-eks.yaml`` or ``values-gcp.yaml`` file you used to deploy the controlplane. + Use the same ``values-eks.yaml`` or ``values-gcp.yaml`` file you used to deploy the control plane. .. tabbed:: AWS @@ -246,7 +257,7 @@ In order to verify requests, the Kubernetes API Server expects a `signed bearer attached to the Service Account. As of Kubernetes 1.24 and above, the bearer token has to be generated manually. -1. Use the following manifest to create a long-lived bearer token for the ``flyteadmin`` Service Account in your dataplane cluster: +1. Use the following manifest to create a long-lived bearer token for the ``flyteadmin`` Service Account in your data plane cluster: .. prompt:: bash @@ -278,14 +289,14 @@ attached to the Service Account. As of Kubernetes 1.24 and above, the bearer tok .. note:: The credentials have two parts (``CA cert`` and ``bearer token``). -3. Copy the bearer token of the first dataplane cluster's secret to your clipboard using the following command: +3. Copy the bearer token of the first data plane cluster's secret to your clipboard using the following command: .. prompt:: bash $ kubectl get secret -n flyte dataplane1-token \ -o jsonpath='{.data.token}' | base64 -D | pbcopy -4. Go to ``secrets.yaml`` and add a new entry under ``stringData`` with the dataplane cluster token: +4. Go to ``secrets.yaml`` and add a new entry under ``stringData`` with the data plane cluster token: .. code-block:: yaml :caption: secrets.yaml @@ -297,8 +308,7 @@ attached to the Service Account. 
As of Kubernetes 1.24 and above, the bearer tok namespace: flyte type: Opaque stringData: - dataplane_1_token: eyJhbGciOiJSUzI1NiIsImtpZCI6IlM0WlhfMm1Yb1U4Z1V4R0t6STZDdkhGTVVvVDBZcDAxbjdVbDc1Y1VxR28ifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJmbHl0ZSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkYXRhcGxhbmUxLXRva2VuIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImZseXRlYWRtaW4iLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiJkNTdhNjMwZi00ZTZmLTQzNTgtYjQwOS00M2UyMTlhYjg4NTEiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6Zmx5dGU6Zmx5dGVhZG1pbiJ9.Fbn5qJjWP1wyJ08PgZXnrrUdKEhLRYqUzG9Vff1maFO3yBKkv_EBuYc2hjGeW5_ORCrT9qKcFAd3AE_tM3P8AQ-dRoA6K-RcJ2qinxabWmk9RYbtKFr1zujswU6dm-iB7JkjY7yYyBRewbw_m4QRacgG8K11c8bYZ9SZoV86EqGmsNdeCPuv5GiPBiJ0p3hgta4kZ1knCNf8qLBUQVZ-9G5vabYM0lyD6dvGOqlOs1bMzgLeijvpQN471dTLmIZ71anOG2gkuJW_AusnWDF_0rJ3yfISf3dRkhXkLswyq-awgtKbz6ZYjPaJ1eA8dNvSlbDoNrMXOGNlx7p7KhOY-w - + dataplane_1_token: 5. Obtain the corresponding certificate: .. prompt:: bash $ @@ -318,29 +328,13 @@ attached to the Service Account. 
As of Kubernetes 1.24 and above, the bearer tok namespace: flyte type: Opaque stringData: - dataplane_1_token: eyJhbGciOiJSUzI1NiIsImtpZCI6IlM0WlhfMm1Yb1U4Z1V4R0t6STZDdkhGTVVvVDBZcDAxbjdVbDc1Y1VxR28ifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJmbHl0ZSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkYXRhcGxhbmUxLXRva2VuIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImZseXRlYWRtaW4iLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiJkNTdhNjMwZi00ZTZmLTQzNTgtYjQwOS00M2UyMTlhYjg4NTEiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6Zmx5dGU6Zmx5dGVhZG1pbiJ9.Fbn5qJjWP1wyJ08PgZXnrrUdKEhLRYqUzG9Vff1maFO3yBKkv_EBuYc2hjGeW5_ORCrT9qKcFAd3AE_tM3P8AQ-dRoA6K-RcJ2qinxabWmk9RYbtKFr1zujswU6dm-iB7JkjY7yYyBRewbw_m4QRacgG8K11c8bYZ9SZoV86EqGmsNdeCPuv5GiPBiJ0p3hgta4kZ1knCNf8qLBUQVZ-9G5vabYM0lyD6dvGOqlOs1bMzgLeijvpQN471dTLmIZ71anOG2gkuJW_AusnWDF_0rJ3yfISf3dRkhXkLswyq-awgtKbz6ZYjPaJ1eA8dNvSlbDoNrMXOGNlx7p7KhOY-w + dataplane_1_token: dataplane_1_cacert: | -----BEGIN CERTIFICATE----- - MIIDBTCCAe2gAwIBAgIIQREjtnmWbyYwDQYJKoZIhvcNAQELBQAwFTETMBEGA1UE - AxMKa3ViZXJuZXRlczAeFw0yMzA5MTIxNzIzMDhaFw0zMzA5MDkxNzIzMDhaMBUx - EzARBgNVBAMTCmt1YmVybmV0ZXMwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEK - AoIBAQDn53QComJ6lhauUATrnV7DtDDxreGQDxxDp8HrU0nwvzT5e4ewRJ+6+VKH - ru6iV8hRSH99XdsbRhb5+HrM9bxwDduTZ4wOsdmI1ghXvbBpOEHQTJFiSoWY82LS - eyMrlwmo8TU8NUXhN+iE+z+cW/QQUKPnNnDcZYWpWOZYjtdtSoYbvU98/cMrRaNg - IoDMiC6uWz3aNE9SSodE5IpTQ6VhhmZfU8eGO6+2Nl0l73uVSiKUyaJm/DdyUnp1 - iAx7qMPZw+Bfxa6P8PjrkFTpiccPFsy+9mnmoLfbA07QMx0txMFDb/YGOdBYox7n - V+yOst26TvfNnl4lW4o7cBzjwEuxAgMBAAGjWTBXMA4GA1UdDwEB/wQEAwICpDAP - BgNVHRMBAf8EBTADAQH/MB0GA1UdDgQWBBSPzcH1/5DDrurz+Tu8kWwUZoHlcTAV - BgNVHREEDjAMggprdWJlcm5ldGVzMA0GCSqGSIb3DQEBCwUAA4IBAQAXcAFgusLs - PfpqzeQcrmDYnywW067+VLwGn906lpceoJbjxL9NQsHSlluXzS8AqljabbweetKD - +eYfvSDa+yWHSA0ygS9ddCutMgNtsAm5H8LktKvnhERuZKBDUFYG2HFFlIh5mUak - 
5TkaYC3FzBsTUoHg+uBqOPSUKaQhzFsIj4a94oZfpGMF+2Yd7vjTeNjuXPbdpYVK - 2avEma8RucJIhIs5w8pgnclSpNXwyz69HrUJ+FxADot6+YHuirpL31XLFPL/jqX4 - Hde3eDWJs4p6Rr0bOGmolOznGUbLdlBsM1QsHfiipMe7XqrBheNWAQFU+rFeHr8L - tbjBbrxuMPKV + -----END CERTIFICATE----- -7. Connect to your controlplane cluster and create the ``cluster-credentials`` secret: +7. Connect to your control plane cluster and create the ``cluster-credentials`` secret: .. prompt:: bash $ @@ -386,8 +380,10 @@ attached to the Service Account. As of Kubernetes 1.24 and above, the bearer tok kubectl cluster-info In this configuration, ``label1`` and ``label2`` are just labels that we will use later in the process -to configure the necessary mappings so workflow executions matching those labels, are scheduled -on one or multiple clusters depending on the weight (e.g. ``label1`` on ``dataplane_1``) +to configure mappings that enable workflow executions matching those labels, to be scheduled +on one or multiple clusters depending on the weight (e.g. ``label1`` on ``dataplane_1``). The ``weight`` is the +priority of a specific cluster, relative to the other clusters under the ``labelClusterMap`` entry. The total sum of weights under a particular +label has to be 1. 9. Update the control plane Helm release: @@ -417,7 +413,7 @@ Example output: .. prompt:: bash $ - kubectl get pods -n flyte  ✔ ╱ base  ╱ fthw-controlplane ⎈ + kubectl get pods -n flyte NAME READY STATUS RESTARTS AGE datacatalog-86f6b9bf64-bp2cj 1/1 Running 0 23h datacatalog-86f6b9bf64-fjzcp 1/1 Running 0 23h @@ -620,8 +616,8 @@ The process can be repeated for additional clusters. kubectl create ns flyte - 9. Install the dataplane Helm chart following the steps in the **Dataplane deployment** section. See :ref:`section `. - 10. Follow steps 1-3 in the **Controlplane configuration** section (see :ref:`section `) to generate and populate a new section in your ``secrets.yaml`` file + 9. 
Install the data plane Helm chart following the steps in the **Data plane deployment** section. See :ref:`section `. + 10. Follow steps 1-3 in the **control plane configuration** section (see :ref:`section `) to generate and populate a new section in your ``secrets.yaml`` file Example: @@ -634,18 +630,18 @@ The process can be repeated for additional clusters. namespace: flyte type: Opaque stringData: - dataplane_1_token: eyJhbGciOiJSUzI1NiIsImtpZCI6IlM0WlhfMm1Yb1U4Z1V4R0t6... + dataplane_1_token: dataplane_1_cacert: | -----BEGIN CERTIFICATE----- - MIIDB... + -----END CERTIFICATE----- - dataplane_2_token: eyJhbGciOiJSUzI1NiIsImtpZCI6IjNxZ0tZRXBnNU0zWk1oLUJrUlc... + dataplane_2_token: dataplane_2_cacert: | -----BEGIN CERTIFICATE----- - MIIDBT... + -----END CERTIFICATE----- - 12. Connect to the controlplane cluster and update the ``cluster-credentials`` Secret: + 12. Connect to the control plane cluster and update the ``cluster-credentials`` Secret: .. prompt:: bash $ @@ -683,7 +679,7 @@ The process can be repeated for additional clusters. tokenPath: "/var/run/credentials/dataplane_2_token" certPath: "/var/run/credentials/dataplane_2_cacert" - 14. Update the Helm release in the controlplane cluster: + 14. Update the Helm release in the control plane cluster: .. prompt:: bash $ From ee5c001e2490ee7c39543da648b56089b35ee2b2 Mon Sep 17 00:00:00 2001 From: davidmirror-ops Date: Tue, 3 Oct 2023 13:11:04 -0500 Subject: [PATCH 17/24] Fix hyperlink Signed-off-by: davidmirror-ops --- rsts/deployment/deployment/index.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/rsts/deployment/deployment/index.rst b/rsts/deployment/deployment/index.rst index 67b1f9f1c4..2f7033eafd 100644 --- a/rsts/deployment/deployment/index.rst +++ b/rsts/deployment/deployment/index.rst @@ -86,7 +86,7 @@ There are three different paths for deploying a Flyte cluster: As of this writing, a single Flyte cluster can handle more than 13,000 nodes. 
    Regardless of using single or multiple Kubernetes clusters for Flyte, note that ``FlytePropeller`` -the main data plane component- can be sharded as well if scale demands require it.
-    See `Automatic scale-out https://docs.flyte.org/en/latest/deployment/configuration/performance.html#automatic-scale-out`__ to learn more about the sharding strategy.
+    See `Automatic scale-out <https://docs.flyte.org/en/latest/deployment/configuration/performance.html#automatic-scale-out>`__ to learn more about the sharding strategy.

From 7db1b7ffd0d471f3bc9e24fdb7ccbf4b5ae36e4a Mon Sep 17 00:00:00 2001
From: davidmirror-ops 
Date: Tue, 3 Oct 2023 13:45:40 -0500
Subject: [PATCH 18/24] Fix blank space

Signed-off-by: davidmirror-ops 
---
 rsts/deployment/deployment/multicluster.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/rsts/deployment/deployment/multicluster.rst b/rsts/deployment/deployment/multicluster.rst
index 7519709c5b..814063c31f 100644
--- a/rsts/deployment/deployment/multicluster.rst
+++ b/rsts/deployment/deployment/multicluster.rst
@@ -309,6 +309,7 @@ attached to the Service Account. As of Kubernetes 1.24 and above, the bearer tok
       type: Opaque
       stringData:
         dataplane_1_token: 
+
 5. Obtain the corresponding certificate:

 .. prompt:: bash $

From 0e2d871219d996cdab5080fdcb7b4059bdef7e3f Mon Sep 17 00:00:00 2001
From: davidmirror-ops 
Date: Tue, 3 Oct 2023 16:26:46 -0500
Subject: [PATCH 19/24] Incorporate review

Signed-off-by: davidmirror-ops 
---
 rsts/deployment/deployment/multicluster.rst | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/rsts/deployment/deployment/multicluster.rst b/rsts/deployment/deployment/multicluster.rst
index 814063c31f..dcda7b859d 100644
--- a/rsts/deployment/deployment/multicluster.rst
+++ b/rsts/deployment/deployment/multicluster.rst
@@ -249,9 +249,9 @@ Control Plane configuration
 *********************************

 For ``flyteadmin`` to access and create Kubernetes resources in one or more
 Flyte data plane clusters , it needs credentials to each cluster.
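An aside on the Secret that carries these credentials in the steps below: Kubernetes treats the ``data`` and ``stringData`` fields differently (``data`` holds base64-encoded values, ``stringData`` holds plain text and is encoded server-side), which is exactly the distinction the review feedback in this patch series revolves around. A minimal sketch of the round trip, using an illustrative value rather than a real bearer token:

```python
import base64

# Illustrative value only -- a real bearer token is issued by the
# Kubernetes API server for the flyteadmin ServiceAccount.
token_plain = "example-bearer-token"

# Under ``stringData:`` you paste the plain token; under ``data:``
# the value must already be base64-encoded.
token_b64 = base64.b64encode(token_plain.encode()).decode()

# Decoding the ``data`` form recovers the ``stringData`` form.
assert base64.b64decode(token_b64).decode() == token_plain
print(token_b64)
```

This is why the later patch drops the ``| base64 -D`` step from the ``kubectl get secret`` pipeline once the manifest switches from ``stringData`` to ``data``: the value read from ``.data.token`` is already in the encoding that ``data:`` expects.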
-Flyte makes use of Kubernetess Service Accounts to enable every data plane cluster to perform -authenticated requests to the Kubernetes API Server. -The default behaviour is that ``flyteadmin`` creates a `ServiceAccount `_ +Flyte makes use of Kubernetes Service Accounts to enable every control plane cluster to perform +authenticated requests to the data plane Kubernetes API Server. +The default behaviour is that the Helm chart creates a `ServiceAccount `_ in each data plane cluster. In order to verify requests, the Kubernetes API Server expects a `signed bearer token `__ attached to the Service Account. As of Kubernetes 1.24 and above, the bearer token has to be generated manually. @@ -309,7 +309,7 @@ attached to the Service Account. As of Kubernetes 1.24 and above, the bearer tok type: Opaque stringData: dataplane_1_token: - + 5. Obtain the corresponding certificate: .. prompt:: bash $ From 9199f4157838f85a64b23cb9e3f7ddfcb5397a94 Mon Sep 17 00:00:00 2001 From: davidmirror-ops Date: Wed, 4 Oct 2023 12:38:32 -0500 Subject: [PATCH 20/24] Incorporate 2nd round of review Signed-off-by: davidmirror-ops --- rsts/deployment/deployment/multicluster.rst | 33 ++++++++------------- 1 file changed, 12 insertions(+), 21 deletions(-) diff --git a/rsts/deployment/deployment/multicluster.rst b/rsts/deployment/deployment/multicluster.rst index dcda7b859d..6507752d28 100644 --- a/rsts/deployment/deployment/multicluster.rst +++ b/rsts/deployment/deployment/multicluster.rst @@ -248,7 +248,7 @@ Control Plane configuration ********************************* For ``flyteadmin`` to access and create Kubernetes resources in one or more -Flyte data plane clusters , it needs credentials to each cluster. +Flyte data plane clusters, it needs credentials to each cluster. Flyte makes use of Kubernetes Service Accounts to enable every control plane cluster to perform authenticated requests to the data plane Kubernetes API Server. 
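The bearer token attached to such a Service Account is a JWT whose claims identify the account it was issued for, which is useful for sanity-checking a token before wiring it into the control plane. A hedged sketch of inspecting those claims; the token below is assembled locally purely to demonstrate the decoding, while real tokens are issued and signed by the API server:

```python
import base64
import json

def jwt_claims(token: str) -> dict:
    """Decode a JWT payload without verifying the signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))

def b64url(obj: dict) -> str:
    """Serialize a dict as an unpadded base64url JWT segment."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

# Fake token built here for illustration only.
token = ".".join([
    b64url({"alg": "RS256"}),
    b64url({"iss": "kubernetes/serviceaccount",
            "sub": "system:serviceaccount:flyte:flyteadmin"}),
    "signature-placeholder",
])

print(jwt_claims(token)["sub"])  # system:serviceaccount:flyte:flyteadmin
```

Running the same decoding against a real token copied from the ``dataplane1-token`` Secret should show a ``sub`` of ``system:serviceaccount:flyte:flyteadmin``, confirming the token belongs to the expected Service Account.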
The default behaviour is that the Helm chart creates a `ServiceAccount `_ @@ -284,7 +284,7 @@ attached to the Service Account. As of Kubernetes 1.24 and above, the bearer tok name: cluster-credentials namespace: flyte type: Opaque - stringData: + data: .. note:: The credentials have two parts (``CA cert`` and ``bearer token``). @@ -294,7 +294,7 @@ attached to the Service Account. As of Kubernetes 1.24 and above, the bearer tok .. prompt:: bash $ kubectl get secret -n flyte dataplane1-token \ - -o jsonpath='{.data.token}' | base64 -D | pbcopy + -o jsonpath='{.data.token}' | pbcopy 4. Go to ``secrets.yaml`` and add a new entry under ``stringData`` with the data plane cluster token: @@ -307,17 +307,17 @@ attached to the Service Account. As of Kubernetes 1.24 and above, the bearer tok name: cluster-credentials namespace: flyte type: Opaque - stringData: - dataplane_1_token: + data: + dataplane_1_token: 5. Obtain the corresponding certificate: .. prompt:: bash $ kubectl get secret -n flyte dataplane1-token \ - -o jsonpath='{.data.ca\.crt}' | base64 -D | pbcopy + -o jsonpath='{.data.ca\.crt}' | pbcopy -6. Add another entry on your ``secrets.yaml`` file for the cert, making sure that indentation resembles the following example: +6. Add another entry on your ``secrets.yaml`` file for the certificate: .. code-block:: yaml :caption: secrets.yaml @@ -328,12 +328,9 @@ attached to the Service Account. As of Kubernetes 1.24 and above, the bearer tok name: cluster-credentials namespace: flyte type: Opaque - stringData: + data: dataplane_1_token: - dataplane_1_cacert: | - -----BEGIN CERTIFICATE----- - - -----END CERTIFICATE----- + dataplane_1_cacert: 7. Connect to your control plane cluster and create the ``cluster-credentials`` secret: @@ -630,17 +627,11 @@ The process can be repeated for additional clusters. 
name: cluster-credentials namespace: flyte type: Opaque - stringData: + data: dataplane_1_token: - dataplane_1_cacert: | - -----BEGIN CERTIFICATE----- - - -----END CERTIFICATE----- + dataplane_1_cacert: dataplane_2_token: - dataplane_2_cacert: | - -----BEGIN CERTIFICATE----- - - -----END CERTIFICATE----- + dataplane_2_cacert: 12. Connect to the control plane cluster and update the ``cluster-credentials`` Secret: From db07babe65e84c206a2b663c1fff4cc593992060 Mon Sep 17 00:00:00 2001 From: davidmirror-ops Date: Wed, 4 Oct 2023 17:07:06 -0500 Subject: [PATCH 21/24] Instructions using 2 IAM Roles Signed-off-by: davidmirror-ops --- rsts/deployment/deployment/multicluster.rst | 158 +++++++------------- 1 file changed, 53 insertions(+), 105 deletions(-) diff --git a/rsts/deployment/deployment/multicluster.rst b/rsts/deployment/deployment/multicluster.rst index 6507752d28..30dfc23be5 100644 --- a/rsts/deployment/deployment/multicluster.rst +++ b/rsts/deployment/deployment/multicluster.rst @@ -59,9 +59,12 @@ requests successfully, the following environment-specific requirements should be "arn:aws:s3:::*/*" ], + + 2. Two IAM Roles configured: one for the control plane components, and another for the data plane where the worker Pods and ``flytepropeller`` run. - 2. At least three IAM Roles configured: one for the control plane components, another for the data plane - and one more for the worker Pods that are bootstrapped by Flyte to execute workflow tasks. + .. note:: + + Using the guidance from this document, make sure to follow your organization's policies to configure IAM resources. 3. An OIDC Provider associated with each of your EKS clusters. You can use the following command to create and connect the Provider: @@ -85,8 +88,12 @@ requests successfully, the following environment-specific requirements should be 2. Go to the **IAM** section in your **AWS Management Console** and select the role that was just created 3. 
Go to the **Trust Relationships** tab and **Edit the Trust Policy** 4. Add the ``datacatalog`` Service Account to the ``sub`` section + + .. note:: + + When caching is enabled, the ``datacatalog`` service store hashes of workflow inputs alongside with outputs on blob storage. Learn more `here `__. - The end result should look similar to the following example: + Example configuration: .. code-block:: json @@ -120,69 +127,41 @@ requests successfully, the following environment-specific requirements should be eksctl create iamserviceaccount --cluster= --name=flytepropeller --role-only --role-name=flyte-dataplane-role --attach-policy-arn --approve --region --namespace flyte - 2. Verify the Trust Relationship configuration: + 2. Edit the **Trust Relationship** of the data plane role - .. prompt:: bash + .. note:: - aws iam get-role --role-name flyte-dataplane-role --query Role.AssumeRolePolicyDocument + By default, every Pod created for Task execution, uses the ``default`` Service Account on their respective namespace. In your cluster, you'll have as many + namespaces as ``project`` and ``domain`` combinations you may have. Hence, it might be useful to use a ``StringLike`` condition and to use a wildcard for the namespace name in the Trust Policy + + 3. Add the ``default`` Service Account: + - Example output: + Example configuration for one data plane cluster: .. 
code-block:: json { - "Version": "2012-10-17", - "Statement": [ + "Version": "2012-10-17", + "Statement": [ { "Effect": "Allow", "Principal": { - "Federated": "arn:aws:iam:::oidc-provider/oidc.eks..amazonaws.com/id/" + "Federated": "arn:aws:iam:::oidc-provider/oidc.eks..amazonaws.com/id/" }, "Action": "sts:AssumeRoleWithWebIdentity", "Condition": { - "StringEquals": { - "oidc.eks..amazonaws.com/id/:aud": "sts.amazonaws.com", - "oidc.eks..amazonaws.com/id/:sub": "system:serviceaccount:flyte:flytepropeller" - } - } - } - ] + "StringLike": { + "oidc.eks..amazonaws.com/id/.:aud": "sts.amazonaws.com", + "oidc.eks..amazonaws.com/id/.:sub": [ + "system:serviceaccount:flyte:flytepropeller", + "system:serviceaccount:*:default" + ] + } + } } - **Workers role** - - 1. Create role and initial Trust Relationship: - - .. prompt:: bash - - eksctl create iamserviceaccount --cluster= --name=default --role-only --role-name=flyte-workers-role --attach-policy-arn --approve --region --namespace flyte - - 2. Go to the **IAM** section in your **AWS Management Console** and select the role that was just created - 3. Go to the **Trust Relationships** tab and **Edit the Trust Policy** - 4. By default, every Pod created for Task execution, uses the ``default`` Service Account on their respective namespace. In your cluster, you'll have as many - namespaces as ``project`` and ``domain`` combinations you may have. Hence, it might be useful to use a ``StringLike`` condition and to set a wildcard for the namespace in the Trust Policy: - - .. code-block:: json - - { - "Version": "2012-10-17", - "Statement": [ - { - "Effect": "Allow", - "Principal": { - "Federated": "arn:aws:iam:::oidc-provider/oidc.eks..amazonaws.com/id/" - }, - "Action": "sts:AssumeRoleWithWebIdentity", - "Condition": { - "StringLike": { - "oidc.eks..amazonaws.com/id/:sub": "system:serviceaccount:*:default", - "oidc.eks..amazonaws.com/id/:aud": "sts.amazonaws.com" - } - } - } - ] - } - + .. 
_dataplane-deployment: Data Plane Deployment @@ -383,7 +362,9 @@ on one or multiple clusters depending on the weight (e.g. ``label1`` on ``datapl priority of a specific cluster, relative to the other clusters under the ``labelClusterMap`` entry. The total sum of weights under a particular label has to be 1. -9. Update the control plane Helm release: +9. Add the ``flyte-dataplane-role`` IAM Role as the ``defaultIamRole`` in your ``values-eks.yaml`` file. `See section here `__ + +10. Update the control plane Helm release: .. note:: This step will disable ``flytepropeller`` in the control plane cluster, leaving no possibility of running workflows there. @@ -405,7 +386,7 @@ label has to be 1. --values values-controlplane.yaml \ --values values-override.yaml -10. Verify that all Pods in the ``flyte`` namespace are ``Running``: +11. Verify that all Pods in the ``flyte`` namespace are ``Running``: Example output: @@ -552,7 +533,11 @@ The process can be repeated for additional clusters. "Condition": { "StringLike": { "oidc.eks..amazonaws.com/id/:aud": "sts.amazonaws.com", - "oidc.eks..amazonaws.com/id/:sub": "system:serviceaccount:flyte:flytepropeller" + + "oidc.eks..amazonaws.com/id/:sub": [ + "system:serviceaccount:flyte:flytepropeller", + "system:serviceaccount:*:default" + ] } } }, @@ -565,57 +550,20 @@ The process can be repeated for additional clusters. "Condition": { "StringLike": { "oidc.eks..amazonaws.com/id/:aud": "sts.amazonaws.com", - "oidc.eks..amazonaws.com/id/:sub": "system:serviceaccount:flyte:flytepropeller" - } + "oidc.eks..amazonaws.com/id/:sub": [ + "system:serviceaccount:flyte:flytepropeller", + "system:serviceaccount:*:default" + ] + } } } ] } - 7. Repeat the previous step for the ``flyte-workers-role``. The result should look like the example: - - .. 
code-block:: json - - { - "Version": "2012-10-17", - "Statement": [ - { - "Effect": "Allow", - "Principal": { - "Federated": "arn:aws:iam:::oidc-provider/oidc.eks..amazonaws.com/id/" - }, - "Action": "sts:AssumeRoleWithWebIdentity", - "Condition": { - "StringLike": { - "oidc.eks..amazonaws.com/id/:aud": "sts.amazonaws.com", - "oidc.eks..amazonaws.com/id/:sub": "system:serviceaccount:*:default" - } - } - }, - { - "Effect": "Allow", - "Principal": { - "Federated": "arn:aws:iam:::oidc-provider/oidc.eks..amazonaws.com/id/" - }, - "Action": "sts:AssumeRoleWithWebIdentity", - "Condition": { - "StringLike": { - "oidc.eks..amazonaws.com/id/:aud": "sts.amazonaws.com", - "oidc.eks..amazonaws.com/id/:sub": "system:serviceaccount:*:default" - } - } - } - ] - } - - 8. Connect to your new EKS cluster and create the ``flyte`` namespace: - - .. prompt:: bash $ - kubectl create ns flyte - 9. Install the data plane Helm chart following the steps in the **Data plane deployment** section. See :ref:`section `. - 10. Follow steps 1-3 in the **control plane configuration** section (see :ref:`section `) to generate and populate a new section in your ``secrets.yaml`` file + 7. Install the data plane Helm chart following the steps in the **Data plane deployment** section. See :ref:`section `. + 8. Follow steps 1-3 in the **control plane configuration** section (see :ref:`section `) to generate and populate a new section in your ``secrets.yaml`` file Example: @@ -633,13 +581,13 @@ The process can be repeated for additional clusters. dataplane_2_token: dataplane_2_cacert: - 12. Connect to the control plane cluster and update the ``cluster-credentials`` Secret: + 9. Connect to the control plane cluster and update the ``cluster-credentials`` Secret: .. prompt:: bash $ kubect apply -f secrets.yaml - 13. Go to your ``values-override.yaml`` file and add the information of the new cluster. Adding a new label is not entirely needed. + 10. 
Go to your ``values-override.yaml`` file and add the information of the new cluster. Adding a new label is not entirely needed. Nevertheless, in the following example a new label is created to illustrate Flyte's capability to schedule workloads on different clusters in response to user-defined mappings of ``project``, ``domain`` and ``label``:abbr: @@ -671,13 +619,13 @@ The process can be repeated for additional clusters. tokenPath: "/var/run/credentials/dataplane_2_token" certPath: "/var/run/credentials/dataplane_2_cacert" - 14. Update the Helm release in the control plane cluster: + 11. Update the Helm release in the control plane cluster: .. prompt:: bash $ helm upgrade flyte-core-control flyteorg/flyte-core -n flyte --values values-controlplane.yaml --values values-eks.yaml --values values-override.yaml - 15. Create a new execution cluster labels file with the following sample content: + 12. Create a new execution cluster labels file with the following sample content: .. code-block:: yaml @@ -685,19 +633,19 @@ The process can be repeated for additional clusters. project: team1 value: label2 - 16. Update the cluster execution labels for the project: + 13. Update the cluster execution labels for the project: .. prompt:: bash $ flytectl update execution-cluster-label --attrFile ecl-production.yaml - 17. Finally, submit a workflow execution that matches the label of the new cluster: + 14. Finally, submit a workflow execution that matches the label of the new cluster: .. prompt:: bash $ pyflyte run --remote --project team1 --domain production example.py training_workflow \  ✔ ╱ base  --hyperparameters '{"C": 0.1}' - 18. A successful execution should be visible on the UI, confirming it ran in the new cluster: + 15. A successful execution should be visible on the UI, confirming it ran in the new cluster: .. 
image:: https://raw.githubusercontent.com/flyteorg/static-resources/main/common/multicluster-execution.png \ No newline at end of file From 83d8c35caef41f220061b838f169ea8583304177 Mon Sep 17 00:00:00 2001 From: davidmirror-ops Date: Thu, 5 Oct 2023 16:00:59 -0500 Subject: [PATCH 22/24] Incorporate 3rd round of feedback Signed-off-by: davidmirror-ops --- rsts/deployment/deployment/multicluster.rst | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/rsts/deployment/deployment/multicluster.rst b/rsts/deployment/deployment/multicluster.rst index 30dfc23be5..ab369eba4f 100644 --- a/rsts/deployment/deployment/multicluster.rst +++ b/rsts/deployment/deployment/multicluster.rst @@ -161,6 +161,10 @@ requests successfully, the following environment-specific requirements should be } } + .. note:: + + To further refine the Trust Relationship, consider using a ``StringEquals`` condition and adding the ``default`` Service Account only for the ``project``-``domain`` + namespaces where Flyte tasks will run, instead of using a wildcard. .. _dataplane-deployment: @@ -192,11 +196,15 @@ the first cluster only. admin: endpoint: :443 #indicate the URL you're using to connect to Flyte insecure: false #enables secure communication over SSL. Requires a signed certificate + catalog: + catalog-cache: + endpoint: :443 + insecure: false .. note:: This step is needed so the ``flytepropeller`` instance in the data plane cluster is able to send notifications - back to the ``flyteadmin`` service in the control plane. + back to the ``flyteadmin`` service in the control plane. The ``catalog`` service runs in the control plane and is used when caching is enabled. 3. 
Install Flyte data plane Helm chart: From e784efd9e692228c8fa35f8765a6c6697b093de5 Mon Sep 17 00:00:00 2001 From: davidmirror-ops Date: Thu, 5 Oct 2023 17:18:55 -0500 Subject: [PATCH 23/24] Add instructions to enable controlplane wf execution Signed-off-by: davidmirror-ops --- rsts/deployment/deployment/multicluster.rst | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/rsts/deployment/deployment/multicluster.rst b/rsts/deployment/deployment/multicluster.rst index ab369eba4f..e8c6b84f13 100644 --- a/rsts/deployment/deployment/multicluster.rst +++ b/rsts/deployment/deployment/multicluster.rst @@ -375,7 +375,9 @@ label has to be 1. 10. Update the control plane Helm release: .. note:: - This step will disable ``flytepropeller`` in the control plane cluster, leaving no possibility of running workflows there. + This step will disable ``flytepropeller`` in the control plane cluster, leaving no possibility of running workflows there. If you require + the control plane to run workflows, edit the ``values-controlplane.yaml`` file and set ``flytepropeller.enabled`` to ``true``. Then, perform the ``helm upgrade`` operation and complete the steps in :ref:`this section ` to configure it + as a dataplane cluster. .. tabbed:: AWS @@ -651,7 +653,7 @@ The process can be repeated for additional clusters. .. prompt:: bash $ - pyflyte run --remote --project team1 --domain production example.py training_workflow \  ✔ ╱ base  + pyflyte run --remote --project team1 --domain production example.py training_workflow \ --hyperparameters '{"C": 0.1}' 15. 
A successful execution should be visible on the UI, confirming it ran in the new cluster:

From fb0bb64697a8a0b701fb1ca7df516a53047e3816 Mon Sep 17 00:00:00 2001
From: davidmirror-ops 
Date: Thu, 5 Oct 2023 17:29:42 -0500
Subject: [PATCH 24/24] Incorporate 4th round of reviews

Signed-off-by: davidmirror-ops 
---
 charts/flyte-binary/eks-production.yaml | 2 +-
 rsts/deployment/deployment/index.rst    | 4 ++--
 rsts/deployment/deployment/sandbox.rst  | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/charts/flyte-binary/eks-production.yaml b/charts/flyte-binary/eks-production.yaml
index bda8356c98..2db827b804 100644
--- a/charts/flyte-binary/eks-production.yaml
+++ b/charts/flyte-binary/eks-production.yaml
@@ -132,7 +132,7 @@ ingress:
     nginx.ingress.kubernetes.io/app-root: /console
   grpcAnnotations:
     nginx.ingress.kubernetes.io/backend-protocol: GRPC
-  host: development.uniondemo.run # change for the URL you'll use to connect to Flyte
+  host: # change for the URL you'll use to connect to Flyte
 rbac:
   extraRules:
   - apiGroups:
diff --git a/rsts/deployment/deployment/index.rst b/rsts/deployment/deployment/index.rst
index 2f7033eafd..eb06d0a6c0 100644
--- a/rsts/deployment/deployment/index.rst
+++ b/rsts/deployment/deployment/index.rst
@@ -85,8 +85,8 @@ There are three different paths for deploying a Flyte cluster:
     This option is appropriate if all your compute can `fit on one EKS cluster `__ .
     As of this writing, a single Flyte cluster can handle more than 13,000 nodes.

-    Regardless of using single or multiple Kubernetes clusters for Flyte, note that ``FlytePropeller`` -the main data plane component- can be sharded as well if scale demands require it.
-    See `Automatic scale-out <https://docs.flyte.org/en/latest/deployment/configuration/performance.html#automatic-scale-out>`__ to learn more about the sharding strategy.
+    Regardless of using single or multiple Kubernetes clusters for Flyte, note that ``FlytePropeller`` -the main data plane component- can be scaled out as well by using ``sharding`` if scale demands require it.
+    See `Automatic scale-out <https://docs.flyte.org/en/latest/deployment/configuration/performance.html#automatic-scale-out>`__ to learn more about the sharding mechanism.


diff --git a/rsts/deployment/deployment/sandbox.rst b/rsts/deployment/deployment/sandbox.rst
index 98d1f48582..5c40eea5eb 100644
--- a/rsts/deployment/deployment/sandbox.rst
+++ b/rsts/deployment/deployment/sandbox.rst
@@ -41,7 +41,7 @@ Requirements
 - Install `docker `__ or any other OCI-compatible tool, like Podman or LXD.
 - Install `flytectl `__, the official CLI for Flyte.
 
-While Flyte can run any OCI-compatible task image using the default Kubernetes container runtime (cri-o), the Flyte
+While Flyte can run any OCI-compatible task image using the default Kubernetes container runtime (``containerd``), the Flyte
 core maintainers typically use Docker. Note that the ``flytectl demo`` command does rely on Docker APIs, but as this
 demo environment is just one self-contained image, you can also run the image directly using another run time.
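As a closing illustration of the ``labelClusterMap`` behavior described in the multicluster patches above: the weights under one label must sum to 1, and they act as relative priorities for routing matching executions across data plane clusters, conceptually a weighted draw. A simplified sketch with hypothetical cluster names; this is not Flyte's actual scheduler code:

```python
import random

# Hypothetical mapping: executions labeled "label1" split 70/30
# between two data plane clusters. Weights under a label sum to 1.
label_cluster_map = {"label1": [("dataplane_1", 0.7), ("dataplane_2", 0.3)]}

def pick_cluster(label: str, rng: random.Random) -> str:
    """Route one execution matching `label` via a weighted draw."""
    entries = label_cluster_map[label]
    weights = [w for _, w in entries]
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights under a label must sum to 1")
    return rng.choices([name for name, _ in entries], weights=weights)[0]

rng = random.Random(42)
draws = [pick_cluster("label1", rng) for _ in range(1000)]
print(draws.count("dataplane_1"))  # roughly 700 of the 1000 draws
```

Setting a weight of 1 on a single cluster pins the label to that cluster, which is the configuration the ``values-override.yaml`` examples in the patches above use.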