
✨ Adding the cluster-stack-operator design document #50

Closed

Conversation

batistein

@batistein batistein commented Aug 17, 2023

We're introducing a design for an operator aimed at enhancing cluster stack usage in Kubernetes. Building on Cluster-API advancements, this operator will streamline cluster operations and elevate user experience.

  • Purpose: Facilitate streamlined use of cluster stacks in Kubernetes.
  • Integration with Cluster-API: Builds upon Cluster-API advancements for effective cluster operations.
  • Component Bundling: Combines fundamental cluster elements, configuration settings, and node images.
  • Release Monitoring: Tracks and integrates new releases automatically.
  • Management Simplification: Reduces complexity in cluster stack management tasks.
  • User Narratives: Provides scenarios of anticipated user-system interactions.
  • History and Evolution: Discusses cloud orchestration and cluster stack integration.
  • Risk Analysis: Addresses potential challenges, especially for providers integrating the operator.
  • Repository Structure: Detailed breakdown for better understanding and development.
  • Tool Integration: Uses tools like kubebuilder for CRD validation.
  • Cluster Objects Exploration: Deep dive into essential cluster objects and their roles.
  • Controller Reconcile Loops: Focuses on precision in release monitoring, especially with auto-subscribe features.


@garloff added the labels "documentation" (Improvements or additions to documentation) and "Container" (Issues or pull requests relevant for Team 2: Container Infra and Tooling) on Aug 18, 2023
@jschoone
Contributor

Hi @SovereignCloudStack/dnation, @mxmxchere, @DEiselt, @NotTheEvilOne.
With this draft we have the first design document for the Cluster Stack Operator which will be a fundamental part of the future reference implementation.
Could you have a look at it?
Feel free to add comments/questions directly to the document here.
I'd like to set up a separate meeting to discuss this.

There is also a draft for an ADR on testing the cluster stacks which can be treated in the same way.

Of course, if you have the capacity I would love it if everyone were there, but since these are two walls of text I don't expect all of you to read them; at least one person per tender would be great for now.

@batistein force-pushed the syself/cluster-stack-operator branch from 14a8723 to 16b794a on August 19, 2023 08:55

However, there is a key distinction between our operator and the Cluster-API. Not all providers necessitate provider-specific work. Recognizing this, we've imbued our system with flexibility. There isn’t a compulsory need to construct a provider-specific operator. Instead, users have the autonomy to determine whether they wish to initiate such an operator based on their unique requirements. This approach affirms our commitment to offering a user-centric tool, tailored to meet diverse needs with precision.

## User stories


I am missing distinct user stories for the CSP offering the managed-k8s service and the user consuming the managed-k8s service.


There is User Story 1, where the SCS-user is running the management cluster and consuming its self-hosted service. From my point of view it is crucial that we add and focus on a user story where offering and consuming of managed-k8s are done by distinct players (CSP and SCS-user).

Contributor

We've had that discussion before indeed, and identified that both are scenarios we need to have in mind.
Situations, where someone external to the SCS-User manages the cluster (this might then be the CSP that also runs the infrastructure or a third-party which may be paid for this or a different department in the SCS-user's company). And situations where they are the same person. Even if it's the same person, that person has two different roles, so I think that the distinct look is the more general one.

2. **Simplifying User Interactions:**
- Users need only interact with a singular object to specify all their requirements.
- This object also serves as the primary communication channel, updating users about the status of their cluster stack's release.
- The net result is a user-centric design where they engage with a minimal number of objects, while still benefiting from the operator's advanced features. The operator's intricate details remain in the background, ensuring users face minimal cognitive load.


Very good, this is a clear improvement.

Regarding the missing user-story:

  • From my point of view the SCS-user should only interact with exactly one object (per cluster)
  • The CSP should manage the "provider-specific" object. It will feel very strange for a cloud-xyz customer to create cloud-xyz-specific settings first before they can consume clusters.

Member

This can be done in case the CSP runs the management cluster, of course. This design document is general and is independent of who runs the management cluster.
That the "user" has to do it only means that it is not done automatically and someone has to do it manually, whoever it is.

Contributor

But let's keep things that providers need to configure (so it works on their infra) and things that consumers want to configure (so the cluster fits their needs) well separated please.

Member

What do you mean @garloff ? This operator could be used by anyone having access to cluster stacks. If it is a public cluster stack, then anyone can use it - no matter if they are CSPs or not.
This design works for both cases.
And if you are not a CSP, you need to take care of the provider-specific things as well.

Which might be, btw, something like the node images that you want to have. This is user-specific and the CSP can potentially leave that to the users (if they want).

- controlplaneamd64
```

### Webhooks
@mxmxchere, Aug 21, 2023

Very general, and a waste of time for me as a reviewer to have that paragraph here.


1. **Purpose**: They give a chronological account of what's happening in the system. This can be crucial for debugging purposes, for understanding the lifecycle of resources, or for monitoring.
2. **Scope**: Events can be associated with different types of objects. For instance, you might see events related to a Pod's lifecycle or events showing reasons why a service endpoint was not created.
3. **Viewing Events**: The most common way to view events is by using the `kubectl` command-line tool:
@mxmxchere, Aug 21, 2023

general-purpose information

This is very unrelated and just taking up the reviewers' time.


### Repo DevOps: Tools for Efficient Development

#### kubebuilder


Yes, kubebuilder might minimize boilerplate code, but this whole paragraph could have been left out...


Moving forward, the controller assumes an active role. Its first task is to enumerate all the `ClusterStackReleases`. This list is curated based on a specific criterion: every release must possess an owner reference, or, more aptly put, a controller reference set to the `ClusterStack`. This ensures that there's a structured hierarchy and organized linkage between different releases and their associated cluster stacks.

The controller's role doesn't end with listing. It delves deeper, embarking on an analytical journey to ascertain the relevance and importance of each `ClusterStackRelease`. Several factors influence the controller's judgment:
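The listing-and-filtering step described above can be sketched as follows. This is a minimal Python sketch (the operator itself is written in Go against the Kubernetes API); the class and field names are simplified stand-ins for the real CRD types:

```python
from dataclasses import dataclass, field

@dataclass
class OwnerReference:
    kind: str
    name: str
    controller: bool = False  # True when this is the controller reference

@dataclass
class ClusterStackRelease:
    name: str
    owner_references: list = field(default_factory=list)

def releases_owned_by(releases, cluster_stack_name):
    """Keep only releases whose controller reference points to the given ClusterStack."""
    return [
        r for r in releases
        if any(
            ref.kind == "ClusterStack"
            and ref.name == cluster_stack_name
            and ref.controller
            for ref in r.owner_references
        )
    ]
```

In the real operator this corresponds to listing `ClusterStackRelease` objects and checking their `metadata.ownerReferences` for the controlling `ClusterStack`.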


Seriously? Your controllers are embarking on analytical journeys? Great.

1. **Templating with Helm:** Our first step involves utilizing the helm binary to craft a template from the helm chart. Helm, as you might be aware, serves as a potent package manager for Kubernetes, and its capability to template charts is paramount for our objective.
2. **Applying the Templated Objects:** Post templating, the objects are then methodically applied to the Kubernetes cluster using the client-go library, a standardized Go client for Kubernetes. As each object gets applied, we maintain a meticulous record in the resource status. This record provides clarity on the operational efficacy. If an object application is successful, its status is marked as synced. Conversely, if an object encounters a problem during its application, its status reads not synced. In such cases, we also provide insights into potential reasons for the failure, facilitating easier troubleshooting.

Our approach resonates with methodologies adopted by renowned projects such as Argo-CD. We harness the power of both the helm binary and the Kubernetes Go client, ensuring a blend of robustness and flexibility.
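The two-step flow above (template with the helm binary, then apply each object and record a synced / not-synced status) can be sketched like this. A hedged Python sketch: `apply_fn` stands in for the client-go apply call, and splitting rendered output on `---` separators is a simplification of real multi-document YAML handling:

```python
def split_manifests(rendered: str):
    """Split `helm template` output into individual YAML documents
    (simplified: real YAML document separators have more edge cases)."""
    return [doc.strip() for doc in rendered.split("\n---\n") if doc.strip()]

def apply_all(rendered: str, apply_fn):
    """Apply each templated object and record a per-object sync status,
    including a reason when the apply fails."""
    status = []
    for doc in split_manifests(rendered):
        entry = {"object": doc.splitlines()[0]}
        try:
            apply_fn(doc)  # stand-in for the client-go apply call
            entry["synced"] = True
        except Exception as err:
            entry["synced"] = False
            entry["reason"] = str(err)
        status.append(entry)
    return status
```

The per-object `synced` / `reason` records mirror the resource status bookkeeping described in step 2.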
@mxmxchere, Aug 21, 2023

Is there a special reason why helm is used in its compiled form, especially if the next processing step is again using a Go library? You could also use helm as a library.


A user engaging with the `ClusterStack` can command several operations, such as:

- **Subscription to Git Repository:** Users have the autonomy to choose if they wish to auto-subscribe to updates of specific Git repositories. Upon this selection, the controller undertakes the responsibility of periodically interfacing with Git. Its aim? To vigilantly ascertain if fresh releases, pertinent to the said `ClusterStack`, have been dispatched. If it discerns any, an automatic download is initiated.
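The auto-subscribe behaviour described in this bullet boils down to a periodic reconcile tick: compare the newest release reported by the Git hoster with the last one seen, and download on a mismatch. A minimal Python sketch with hypothetical `fetch_latest` and `download` callbacks (the real controller is written in Go and requeues this check on a fixed interval):

```python
def reconcile_tick(state, fetch_latest, download):
    """One reconcile pass: if the Git hoster reports a release we have not
    seen yet, download it and remember it. Returns True if work was done."""
    latest = fetch_latest()
    if latest != state.get("latest"):
        download(latest)
        state["latest"] = latest
        return True
    return False
```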


Regarding the missing user-story:

This makes it very easy for SCS-users to create broken clusters that the CSP might have to fix. From my point of view it is better if the CSP manages a set of supported versions or stacks that the SCS-user can choose from. This preserves the whole benefit of the cluster stack being tested and the components being tweaked for each other.

Member

I think there is a misunderstanding: The cluster stacks are all tested before they are released. This feature just makes sure that all users automatically get the latest released and tested cluster stacks.
In case the management cluster is run by a CSP, it is less work for the CSP to update all the setups for the customers, as it is done automatically.

@mxmxchere left a comment

All in all this was very hard (and long) to read. As this is a technical document all marketing and bling-bling language could have been left out. Also some paragraphs about webhooks, events, kube-builder and declarative system management could just have been left out.

What is currently missing is a user story for this being used in a role-separated setup (with CSP and SCS-user having interests and responsibilities for different resources). Regarding this setup we should also think about scaling: how many resources should one controller manage? Is there a mechanism planned for horizontal scaling (deploying multiple controllers and assigning resources to different controllers)?

Apart from that very good work and a great enhancement for the current state. This proposal solves two problems very well:

  1. currently the user has to deploy multiple objects to get a running cluster
  2. the user has to install addons/extensions to get a working cluster

@mxmxchere

A real benefit to enhance my understanding of the new approach would be a picture showing the various objects with their contents (both cluster-apis as well as the new ones) and how they interact with each other. I think in most cases it is from top to bottom (cluster -> creating controlplane -> creating openstackcontrolplane and so forth). Also are there ideas on how to integrate with the planned central-api (SovereignCloudStack/issues#364)? This would help to start thinking in the two roles "CSP" and "SCS-User"

@jschoone
Contributor

A real benefit to enhance my understanding of the new approach would be a picture showing the various objects with their contents (both cluster-apis as well as the new ones) and how they interact with each other. I think in most cases it is from top to bottom (cluster -> creating controlplane -> creating openstackcontrolplane and so forth). Also are there ideas on how to integrate with the planned central-api (SovereignCloudStack/issues#364)? This would help to start thinking in the two roles "CSP" and "SCS-User"

👍 for the picture! We already have an issue for that:
SovereignCloudStack/issues#333

@fdobrovolny (Contributor) left a comment

Great work, guys.

I agree with @mxmxchere that this was not a light read, and I had to google some words, which is unnecessarily flowery.

Apart from that, I left individual comments, some of which are my confusion, edits to the formatting, dislike of a specific sentence, or possible improvements I see.

I would advocate adding a preface. I feel that this document walks through the whole architecture, and each point is addressed in every detail. I would like to see a preface where the whole architecture is summarized with all its components in a few paragraphs, so that when I'm reading the text I can start connecting the individual parts into context.

Also, maybe connected to this, some kind of glossary would be nice. In a few words, summarize all the components in a list I can return to while reading.

The task issues/#333 should be incorporated here as soon as possible.

I feel that the first reading is not enough. I will try to re-read this by the end of the week. Hopefully, I can get more profound insight.


In order to solve some issues that go beyond core CAPI components, we bundle application configuration, node images, and the `ClusterClasses` to formulate what we term as "cluster stacks".

While the Cluster-API is adept at managing the lifecycle of clusters, there are aspects it doesn't cover. For instance, it doesn't oversee the lifecycle of the Cluster Stack components. For example, when a new Syself Cluster Stack is introduced, the corresponding `ClusterClass` object should be downloaded and implemented, ensuring it's readily accessible to users. Additionally, there are certain tasks that are provider-specific, such as building node images, that the Cluster-API doesn't cater to.
Contributor

For example, when a new Syself Cluster Stack is introduced, the corresponding ClusterClass object should be downloaded and implemented, ensuring it's readily accessible to users.

I don't like the word implemented here as it, at least for me, has a connotation of manual implementation; I think in this context it was meant more as installed or enabled?

Contributor

deployed? applied?


The Cluster Stacks contain the configurations of many different applications, e.g. Cilium. Additionally, they contain a specific node image. This node image can be updated with a new version of a Cluster Stack, but does not have to.

If there is a new version of Cilium, this should be made available for users of any Cluster Stack that contains Cilium. Assuming that there is a `ClusterStackXYZ` with Cilium. After changing the respective versions in the configuration, a new release of `ClusterStackXYZ` will be published. Users that have clusters that point to the `ClusterClass` object of `ClusterStackXYZ` want to update to the latest release. This has to be done by first applying some yaml files in the management cluster (e.g. the `ClusterClass` object). Then, the node image has to be built if it doesn’t exist already.
Contributor

This has to be done by first applying some yaml files in the management cluster (e.g. the ClusterClass object).

I would probably be more specific and say "by first applying manifests in the management cluster". "Some yaml files" seems misleading to me.


Introducing the cluster-stack-operator: a transformative tool designed to alleviate users from the painstaking manual tasks traditionally associated with the usage of cluster stacks. The primary objective of this operator is to streamline and automate the following pivotal processes:

1. **Proactive Release Monitoring:** The operator will be vigilant, regularly scanning for the newest releases of cluster stacks, ensuring users are always updated with the latest enhancements and features.
Contributor

Maybe this is addressed further down, but from the explanation given here, this does not seem to me like "Proactive Release Monitoring" but rather like automatic updates. I think it should be rewritten as follows:

ensuring users have always the option to update to the latest enhancements and features.


1. **Proactive Release Monitoring:** The operator will be vigilant, regularly scanning for the newest releases of cluster stacks, ensuring users are always updated with the latest enhancements and features.

2. **Streamlined Release Preparation:** Once a user expresses interest in a particular cluster stack release, the operator jumps into action, preparing the release for smooth deployment and use.
Contributor

How does the user express interest?
How is the release prepared?

I think this should be called differently. Maybe preinitialization? Preprovisioning? Fetching release resources? Preparing everything needed for release deployment?


3. **Efficient Clean-Up:** Over time, some cluster stack releases may become obsolete or redundant. The operator will diligently purge unused and outdated releases, ensuring the system remains clutter-free and optimized.

A noteworthy feature of our operator is its modular architecture. Given that some aspects of its operation, such as building node images, are provider-specific, we've strategically adopted the structural framework utilized by the Cluster-API. This means that provider-specific tasks are delegated to a distinct operator. This specialized operator then synergizes with the core operator to prepare cluster stack releases for deployment.
Contributor

to a distinct operator. This specialized operator then

I would probably use the same word like:

to a distinct operator. This distinct operator then

Lastly, the controller diligently ensures the correct application of objects from the cluster-class helm chart in the cluster. In scenarios where this has been overlooked, the controller undertakes the following steps:

1. **Templating with Helm:** Our first step involves utilizing the helm binary to craft a template from the helm chart. Helm, as you might be aware, serves as a potent package manager for Kubernetes, and its capability to template charts is paramount for our objective.
2. **Applying the Templated Objects:** Post templating, the objects are then methodically applied to the Kubernetes cluster using the client-go library, a standardized Go client for Kubernetes. As each object gets applied, we maintain a meticulous record in the resource status. This record provides clarity on the operational efficacy. If an object application is successful, its status is marked as synced. Conversely, if an object encounters a problem during its application, its status reads not synced. In such cases, we also provide insights into potential reasons for the failure, facilitating easier troubleshooting.
Contributor

Is there a standardized library or a standard library?

- **Alignment Scenario:** If these versions resonate perfectly, it implies a state of consistency from release v1 to v2, rendering any further actions redundant.
- **Discrepancy Scenario:** However, any disparity between the two versions signals a fresh iteration of the cluster addons. This necessitates re-application.

2. **Addon Application Mechanism:** Drawing on the dual-step approach previously elucidated, the addon application process introduces an intermediate step, drawing inspiration from the cluster-api-addon-provider-helm. This step is dedicated to infusing the helm with intricate details about the `Cluster` and the `ProviderCluster`. This can be further explored through this GitHub link.
Contributor

Is here supposed to be a link?

2. **Addon Application Mechanism:** Drawing on the dual-step approach previously elucidated, the addon application process introduces an intermediate step, drawing inspiration from the cluster-api-addon-provider-helm. This step is dedicated to infusing the helm with intricate details about the `Cluster` and the `ProviderCluster`. This can be further explored through this GitHub link.
The inclusion of this step is not merely a formality but a necessity. Conventional templating mechanisms fall short when it comes to injecting values that materialize only at runtime. Yet, these values are paramount for the resources we wish to apply as cluster addons.

3. **Addressing Container Restarts:** The controller's dependency on release assets introduces a peculiar challenge when the container undergoes a restart. In the absence of an external volume, any information retained in memory or within the container is wiped out. The implication? A re-fetch from GitHub becomes indispensable.
Contributor

Does this imply that the container can have an external volume and therefore caches the information locally?

Once all node images reach a state of preparedness, the controller makes its proclamation by setting `status.ready: true`. This is the green signal for the clusterstackrelease-controller to swing into action and proceed with its tasks.

4. **Safeguarding Node Images with a Dedicated Controller:**
The `ProviderNodeImageRelease` object doesn't operate in isolation. It has its own guardian angel, a dedicated controller, ensuring that a specific node image is primed for the user. This is no simple task. The path to readiness often involves time-consuming operations, such as the meticulous uploading or intricate building of the node image.
Contributor

I'm missing a chapter on the ProviderNodeImageRelease controller and how it builds and handles images.

The `ProviderNodeImageRelease` object doesn't operate in isolation. It has its own guardian angel, a dedicated controller, ensuring that a specific node image is primed for the user. This is no simple task. The path to readiness often involves time-consuming operations, such as the meticulous uploading or intricate building of the node image.

5. **Tackling the Challenge of Long-Running Operations:**
These lengthy operations come with their own set of hurdles. Consider scenarios where the operator's container is abruptly halted by Kubernetes or manually terminated. The solution? A robust map securely stored in memory, vigilantly monitoring all concurrent operations (thankfully, the power of goroutines ensures parallel operations). Handling shutdowns with finesse, especially those triggered by SIGTERM, becomes paramount. The key lies in gracefully stopping all extended operations, potentially leveraging the canceling contexts in Go.
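The in-memory map of concurrent operations plus graceful cancellation can be sketched as follows. This Python sketch uses `threading.Event` as a stand-in for Go's canceling contexts and threads as a stand-in for goroutines; in the real operator a SIGTERM handler would invoke `shutdown()`:

```python
import threading

class OperationTracker:
    """Registry of long-running operations held in memory, supporting
    cooperative cancellation (mirroring Go's canceling contexts)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cancels = {}  # operation name -> cancellation signal

    def start(self, name, work):
        """Run `work(cancel_event)` in the background and track it."""
        cancel = threading.Event()
        with self._lock:
            self._cancels[name] = cancel
        thread = threading.Thread(target=work, args=(cancel,), daemon=True)
        thread.start()
        return thread

    def shutdown(self):
        """Signal every tracked operation to stop (e.g. on SIGTERM)."""
        with self._lock:
            for cancel in self._cancels.values():
                cancel.set()
```

Cancellation here is cooperative: each operation must periodically check (or wait on) its cancel signal, exactly as Go code must check `ctx.Done()`.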
Contributor

How does the operator handle abrupt stop cases without the possibility of executing a shutdown? Such as a power outage of a given node or a system crash?


As a side note: for another controller I recently faced this exact issue again. There's no reliable way to execute shutdown actions within K8s, even for pods, under certain scenarios. Therefore the mentioned finesse must be applied while handling the data, not at shutdown time.

Member

agreed, the controller cannot do anything when it is killed. We then have to either handle some form of cleanup outside of the operator or when the operator re-starts.

Contributor

Ideally, there should not be any long-running operations in controllers at all, right?

If an operation really is long-running, tracking progress using the usual (mostly stateless) reconciliation loops (setting status to track preliminary states) should be the K8s-native way to do it, right? Maybe even splitting some things out into their own resource definitions with their own reconciliation loops?

That may indeed prove hard to do if some long-running synchronous call has to be executed - but if that is not something that is already foreseeable, IMHO this probably is not something to worry about beforehand.

Member

Agreed that there shouldn't be long-running operations at all. But here we cannot avoid it, as building or uploading images might be long-running. In this case we use goroutines that work in parallel and that don't block the reconciliation loop.

The only other alternative is starting separate containers via Kubernetes jobs, but that would be quite messy as well.

Contributor

Ah, ok. It's about image operations. Missed the context.

Still, a ProviderClusterStackRelease creating an ImageBuild (or something), which in turn creates a Job, does not seem too messy.

Author

Creating a Kubernetes job might appear simple initially, but it can lead to more overhead than anticipated. Here's why:

At first glance, creating a job in Kubernetes seems efficient because it manages the lifecycle of your task. But, there are challenges:

  1. Monitoring: If the job or the pod fails, how would you know? For reliability, you'd need a monitoring solution to check the job's health.
  2. Error Recovery: If a job gets stuck, you would need logic to restart it.
  3. Data Cleanup: If your system identifies images by their names, you might end up with unused or "orphaned" images. A garbage collector would be needed to clean them up.
  4. State Information: You would also need a way to regularly check the status or progress of the pod. This could involve accessing the pod directly or having the job send status updates.
  5. User Visibility: To make the system user-friendly, you'd need a way to show users the progress or status of their tasks.

Considering these complexities, it might be simpler to directly run a pod and manage its lifecycle with a controller. However, this introduces a testing challenge. Unit tests may not be sufficient, and tools like envtest can't handle this scenario. Therefore, you'd rely on end-to-end (e2e) tests. These tests are slower, tougher to maintain, and can be time-consuming when identifying issues.

You might then wonder: "Why run this as a separate pod at all?" Is it to estimate resource needs or to integrate more smoothly with other systems?

An alternative is using Go routines. Here are their benefits:

  1. Easy Testing: You can test the logic easily, and native tools work out-of-the-box.
  2. Simplified Management: The lifecycle is managed directly in the code. There's no need to interact with the Kubernetes API or set up complex RBAC.
  3. Real-Time Information: Getting updates about the ongoing task is straightforward.
  4. Resource Management: Go routines can be queued, helping predict resource usage. Since in our use case one controller won't spawn hundreds of goroutines simultaneously, it's efficient and avoids unnecessary overhead.

Conclusion: If you're aiming for a quick fire and forget method, a Kubernetes job is suitable. However, for building robust systems, alternative methods like Go routines are more straightforward and reliable. They offer precise control over long-running tasks, ensure easier maintainability, and reduce the intricacies of testing. This leads to better code quality in less time. Of course, using an approach with creating pods is entirely feasible. If we were operating on a larger scale, this would indeed be the preferred solution. However, it comes with all the previously mentioned challenges, adding unnecessary overhead for our current needs. As the saying goes, usually the simplest solution is the best! ;)
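Point 4 above (queueing goroutines so resource usage stays predictable) corresponds to a bounded worker pool. A small Python sketch of the idea, with threads standing in for goroutines and the callables standing in for hypothetical image-build tasks:

```python
import queue
import threading

def run_bounded(tasks, max_workers=2):
    """Run callables with a fixed number of worker threads so that
    concurrency (and thus resource usage) stays predictable."""
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)

    results = []
    lock = threading.Lock()

    def worker():
        # Each worker drains the shared queue until it is empty.
        while True:
            try:
                task = pending.get_nowait()
            except queue.Empty:
                return
            outcome = task()
            with lock:
                results.append(outcome)

    workers = [threading.Thread(target=worker) for _ in range(max_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```

However many tasks are queued, at most `max_workers` run concurrently, which is the property the comment relies on for predicting resource usage.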

Contributor

But, there are challenges

As always :D

> As the saying goes, usually the simplest solution is the best! ;)

I can also agree with this statement, while I still think that it actually favours a Job approach. Mirroring your points:

> Monitoring: If the job or the pod fails, how would you know? For reliability, you'd need a monitoring solution to check the job's health.

A monitoring solution is not strictly required, as JobStatus v1 includes conditions (Failed/Complete).

> Error Recovery: If a job gets stuck, you would need logic to restart it.

For clean exits, this is actually builtin (and the primary purpose of a Job compared to a one-off Pod): "A Job creates one or more Pods and will continue to retry execution of the Pods until a specified number of them successfully terminate." (https://kubernetes.io/docs/concepts/workloads/controllers/job/ CC BY 4.0).
For cases where the process does not exit with a non-zero code on its own (i.e. it is "stuck"), I guess using livenessProbes would be the way to go, forcing a restart with a non-zero exit.

> Data Cleanup: If your system identifies images by their names, you might end up with unused or "orphaned" images. A garbage collector would be needed to clean them up.

Yes, cleaning up is indeed required (even though some cleanup will also be required when implementing in-controller-process jobs, if leakage is to be prevented). If it is not done in the controller itself, the builtin TTL for Jobs can be used.
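To make these builtin mechanisms concrete, a Job spec combining them might look roughly like this (all names, images, and values below are illustrative, not taken from the design document):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: build-node-image        # illustrative name
spec:
  backoffLimit: 4               # builtin retry on non-zero exit
  ttlSecondsAfterFinished: 3600 # builtin cleanup after completion
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: image-builder
          image: registry.example.com/image-builder:latest  # placeholder
          livenessProbe:        # force a restart if the build hangs
            exec:
              command: ["cat", "/tmp/healthy"]
            initialDelaySeconds: 60
            periodSeconds: 30
```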

> State Information: You would also need a way to regularly check the status or progress of the pod. This could involve accessing the pod directly or having the job send status updates.

Querying JobStatus v1 should be sufficient, I guess.
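That check can be sketched with simplified stand-ins for the batch/v1 types (a real controller would import `k8s.io/api/batch/v1` and read `job.Status.Conditions` instead of these local types):

```go
package main

import "fmt"

// JobCondition is a simplified mirror of the batch/v1 JobCondition shape.
type JobCondition struct {
	Type   string // "Complete" or "Failed"
	Status string // "True", "False", "Unknown"
}

// jobFinished reports whether a Job has terminally succeeded or failed,
// based solely on its status conditions.
func jobFinished(conds []JobCondition) (finished bool, succeeded bool) {
	for _, c := range conds {
		if c.Status != "True" {
			continue
		}
		switch c.Type {
		case "Complete":
			return true, true
		case "Failed":
			return true, false
		}
	}
	return false, false
}

func main() {
	conds := []JobCondition{{Type: "Failed", Status: "True"}}
	fin, ok := jobFinished(conds)
	fmt.Println(fin, ok) // prints "true false"
}
```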

> User Visibility: To make the system user-friendly, you'd need a way to show users the progress or status of their tasks.

That indeed is not something that is already builtin into Kubernetes, but I am also unsure how something like progress reporting (or progress bars) would be implemented when using in-controller-process jobs. Would the controller update the status of Cluster Stack Releases, notifying about stages of image building and pushing?
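One plausible answer is for the controller to upsert stage conditions on the release status; sketched below with a simplified stand-in for `metav1.Condition` (a real controller would use `meta.SetStatusCondition` from k8s.io/apimachinery, and the condition types and reasons here are hypothetical):

```go
package main

import "fmt"

// Condition is a simplified mirror of metav1.Condition.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// setCondition upserts a condition by type, which is how a controller
// could surface image-build stages on a ClusterStackRelease status.
func setCondition(conds []Condition, c Condition) []Condition {
	for i := range conds {
		if conds[i].Type == c.Type {
			conds[i] = c
			return conds
		}
	}
	return append(conds, c)
}

func main() {
	var status []Condition
	status = setCondition(status, Condition{Type: "ImageReady", Status: "False", Reason: "Building", Message: "building node image"})
	status = setCondition(status, Condition{Type: "ImageReady", Status: "True", Reason: "Pushed", Message: "image pushed to registry"})
	fmt.Println(len(status), status[0].Reason) // prints "1 Pushed"
}
```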


Regarding the use of Go routines for implementing in-controller-process jobs: I like Go routines very much as well and have a fair share of experience using them, but here are a few disadvantages to this approach:

  1. One would still have to implement a non-trivial lifecycle system - which is easier to do in Go than in most other languages, indeed. But it would still be a non-trivial problem implemented in a stateful process that may be terminated during e.g. node reboots. So, if any bug appears...
  2. ...identifying the problem is much more complicated for anyone who does not have a Go debugger (or at least a dev environment) at hand in e.g. production. When almost all state is kept inside K8s API objects, problems are found much more easily - by anyone familiar with K8s, but not necessarily with Go, Go routines, and the to-be-built lifecycle system on top.

So, at least this point:

> Simplified Management: The lifecycle is managed directly in the code. There's no need to interact with the Kubernetes API or set up complex RBAC.

...has a large flip side. I think that interacting with the K8s API is actually the most natural fit for any controller/operator, and that RBAC is actually not too complex most of the time.

Privilege escalation via Job/Pod creation could be a real tradeoff, though.

@joshmue joshmue Sep 18, 2023


> Privilege escalation via Job/Pod creation could be a real tradeoff, though.

I guess that this strongly relates also to @fdobrovolny's remark about the image building process being relatively unspecified.

Building images inside of a container is, in my experience, usually a rather complicated task when it comes to security and isolation. The controller may need more privileges on the host system, or may create e.g. dedicated one-off kaniko Pods. That would still be subject to research/decision making, I guess?

(EDIT: Of course, defining a threat model might also be the most natural first step here)

@jschoone jschoone added this to the R5 (v6.0.0) milestone Aug 23, 2023
@jschoone jschoone added Sprint Izmir Sprint Izmir (2023, cwk 32+33) Sprint Jena Sprint Jena (2023, cwk 34+35) labels Aug 23, 2023

There are three components of a cluster stack:

1. **Cluster Addons:** Cluster addons are a set of essential extensions and tools, including but not limited to the Cluster Network Interface (CNI), Container Storage Interface (CSI), and Cloud Controller Manager (CCM). These addons play a pivotal role in enhancing the functionality and manageability of each workload cluster initiated by the user. It's paramount to ensure that these addons are applied to every single workload cluster upon initiation, thus enabling seamless communication and operational consistency.


As a more fundamental question: Maybe it makes sense to take one step back and review whether at least part of the requirements should be implemented with tools already introduced to SCS, like flux2 [1]? This would split this document into two parts, and with flux, maintenance would be more focused on setting up manifests.

It seems to me that partly the intended controller might reinvent the wheel.

[1] SovereignCloudStack/k8s-cluster-api-provider#123


We take an operator approach, and it is not clear to me how this approach would integrate with flux.
The described ClusterAddons and their controller would make it possible to install the cluster addons without any manual work or configuration.

Additionally, we have some values from Cluster API objects that we inject into the yaml config of the objects we apply. I wouldn't know how to do that with external software either.

Can you maybe elaborate on your idea to integrate flux?


Just as a very brief workflow idea:

  • flux must be reconciled (installed) from within the operator, e.g. using a release delivered Helm chart from the operators source.
  • Additional addons can be taken from the ClusterStackRelease resource (if I understood its purpose correctly), which lists components in a format similar to the GitRepository and Kustomization example [1] - e.g. the repositories as a list of maps, each including a sub-list of components applied as Kustomizations. This is intended to replace a static Helm artifact linked to the ClusterStackRelease.
  • The operator only generates GitRepository and Kustomization specs leaving the rest to flux to be applied. Validation of installation can be checked requesting the status of Kustomization.

This should move the burden of deploying and upgrading deployments to flux, maintaining the flexibility of the operator approach (including default and provider-customized addons) without the burden of actually tracking the current state of applied deployments.

[1] https://fluxcd.io/flux/components/kustomize/kustomization/#example
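As a rough illustration of that workflow idea, the specs generated by the operator might look similar to the upstream flux example [1]; the repository URL, paths, and names below are placeholders:

```yaml
# GitRepository pointing at a (hypothetical) cluster-addons repository
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: cluster-addons
  namespace: flux-system
spec:
  interval: 10m
  url: https://github.com/example/cluster-addons   # placeholder URL
  ref:
    branch: main
---
# Kustomization applying one component from that repository
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cni
  namespace: flux-system
spec:
  interval: 10m
  prune: true
  path: ./components/cni        # illustrative path
  sourceRef:
    kind: GitRepository
    name: cluster-addons
```

Installation status could then be validated by querying the Kustomization's status, as suggested above.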

garloff pushed a commit that referenced this pull request Aug 28, 2023

The conditions, on the other hand, should always show the current information that is available to describe the state of a resource.

### Controllers that react immediately on specific etcd events
@joshmue joshmue Sep 18, 2023


etcd events -> API server updates?

(etcd should be abstracted away by the API server, events may be ambiguous with Kubernetes Event Resources)

@jschoone jschoone closed this Feb 20, 2024