Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Select cluster execution when triggering an execution #4956

Closed
wants to merge 4 commits into from
Closed
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
100 changes: 100 additions & 0 deletions rfc/core language/4956-cluster-pools.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,100 @@
# [RFC] Select cluster execution when triggering an execution

**Authors:**

- @RRap0so
- @andresgomezfrr

## 1 Executive Summary

When triggering a Flyte LaunchPlan, Workflow or Task, Flyte permits the user to define a ClusterPool from a predefined list of pools that are set in the FlyteAdmin configuration. For example with `flytectl create execution
--clusterPool string`

This provides greater flexibility compared to the existing feature where an [Execution Cluster Label](https://docs.flyte.org/en/latest/deployment/deployment/multicluster.html#configure-execution-cluster-labels) can be set at the Project/Domain level.

## 2 Motivation

The primary motivation for this proposal is to select the clusters where an execution is to be performed dynamically at execution time, without the constant need to overwrite the Execution Cluster Label.

Additionally, we would like to have the possibility of different executions targeting different clusters, even if they are published under the same Project/Domain. This is not entirely feasible using the current Execution Cluster Label.


## 3 Proposed Implementation

We plan to use the `cluster_assignement` field (refer to [proto](https://github.com/flyteorg/flyte/blob/master/flyteidl/protos/flyteidl/admin/execution.proto#L312)) in each execution, this is a ClusterAssignment object with a `cluster_pool_name` which would be the field that contains a string with the corresponding ClusterPool ID.
RRap0so marked this conversation as resolved.
Show resolved Hide resolved

In the FlyteAdmin configuration we have a list of `clusterpool_config` with all the different available pools and corresponding clusters.

**Note**: Today there are configurations to assign domains to specific pools, see [PR](https://github.com/flyteorg/flyte/pull/4208) and `cluster_pool` [interface](https://github.com/flyteorg/flyte/blob/962d45827646f106265630a0fb1d9ae74968bc32/flyteadmin/pkg/runtime/interfaces/cluster_pools.go). We propose another configuration to assign Clusters to a clusterPoolName by having another map under cluster object with weights and cluster assignments.

Proposed configuration:
```
clusters:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As said in the contributors' sync:

Love the proposal, my only wish would be to figure out whether there is a way to simplify this config here by not having to configure a cluster twice, once for the existing project/domain mechanism, once for the new mechanism. Would be nice if the cluster configuration could be shared.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fg91 I think it should be possible by expanding the ExecutionParameters struct. By adding the ExecutionClusterLabel to this struct and making sure we pass it when executing it should make things simpler and avoid having double configs.

This will mean a bit of re-writing, since it's late I'll also cross-check my thinking and get back with some certainties next week.

poolNameClusterMap:
pool_1:
- id: flyte-1-k8s
weight: 1
- id: flyte-2-k8s
weight: 2
pool_2:
- id: flyte-3-k8s
weight: 6
- id : flyte-4-k8s
weight: 99
labelClusterMap:
production:
- id: flyte-1-k8s
weight: 1
clusterConfigs:
- name: flyte-1-k8s
endpoint: 127.0.0.1
enabled: true
auth:
type: file_path
tokenPath: /etc/…
certPath: /etc/…
- name: flyte-2-k8s
endpoint: 127.0.0.2
enabled: true
...
```

At the start of FlyteAdmin it will create a new RandomClusterSelector with a `poolWeightedRandomMap`.


### When an execution is launched

Flyteadmin runs the [Execute](https://github.com/flyteorg/flyte/blob/master/flyteadmin/pkg/workflowengine/impl/k8s_executor.go#L36) method and builds the workflow. Now we need to know which Kubernetes Cluster to hit. We will have the pool information available in the ExecutionData that is already passed into the Execute method.

**If no clusterPool is defined**: No change in current behavior.

**If clusterPool is defined**: Then we will add the poolName into the [ExecutionTargetSpec](https://github.com/flyteorg/flyte/blob/master/flyteadmin/pkg/executioncluster/execution_target.go#L13-L20) (we will need to modify the struct). This is passed to the getTarget where we check if it has a poolName defined and get the value from the poolWeightedRandomMap. There are still open questions regarding the priority of labels / pool names.


## 4 Metrics & Dashboards

If implemented correctly, we should be able to capture where executions are being triggered. This would be extra useful for our use case to make sure we’re routing executions to the correct pool of clusters

## 5 Drawbacks

Other than the usual drawbacks of implementing a new feature (the introduction of new bugs/problems) we don’t see any potential drawbacks with this implementation. The fields are already exposed but they are only missing the implementation.

We don’t expect any additional overhead for execution or impact in terms of API for flyteadmin.

## 6 Alternatives

As mentioned above, the only alternative we have implemented today is using the [Execution Cluster Labels](https://docs.flyte.org/en/latest/deployment/deployment/multicluster.html#configure-execution-cluster-labels). Even though we could make this work for our use case it’s not ideal and can bring further problems when we want to route executions to different clusters from the same Project/Domain.
RRap0so marked this conversation as resolved.
Show resolved Hide resolved

## 7 Potential Impact and Dependencies

We do not see any.


## 8 Unresolved questions

What takes priority Labels or Pools?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would vote for labels to take priority as those seem to be more execution specific. I might be missing something though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes the label usage seems more related to "I need this X type of cluster for this <PROJECT/Domain>".

So to be sure, we first check if there's a label set, then under the clusters in that label if there's a pool. If not then default using the weight. (Similar to what is done today.)

Since we have two different ways to route traffic we need to decide what takes priority.

## 9 Conclusion

With this we want to bring to the finish line an idea that we believe was going to be done but was not yet implemented. We will reuse as many already existing objects as possible and will be able to smoothly select clusters at a moment's notice which for our use case is extremely important to ensure the timely and reliable execution of LaunchPlans/Workflows/Tasks.
Loading