Support stretching vcluster to multiple host clusters #441
Comments
@olljanat thanks for creating this issue! This is also something we have thought about and it's definitely possible, although it would require quite a bit of rewriting of vcluster's code (but doable). One of the biggest difficulties right now is that vcluster would require a global network across the multiple host clusters; this could, however, be achieved through Submariner. That would be more or less a hard requirement for running vclusters across multiple host clusters, as otherwise networking would not work as expected. Besides that, persistent storage could get problematic, as storage would only be available in certain host clusters.

While this is a super exciting feature from a technical point of view, I'm still not sure how useful it would actually be in reality. vcluster would still require a super-host cluster, as Submariner requires one too, which essentially becomes the single point of failure again. That said, it's definitely interesting in terms of workload distribution, especially with our new feature where you can run a scheduler inside the vcluster, which you could use to schedule workloads across different clusters automatically through regular Kubernetes affinities and topologies. So it's definitely worth investigating further.
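To make that last point concrete: once nodes from the different host clusters are visible inside the vcluster, a workload could be pinned to one of them with a plain node affinity. A minimal sketch, assuming the synced nodes carry a hypothetical cluster label such as `topology.example.com/host-cluster` (not something vcluster sets today):

```yaml
# Hypothetical sketch: pin a pod to nodes synced from one particular host
# cluster. The label key "topology.example.com/host-cluster" is an assumption.
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.example.com/host-cluster
            operator: In
            values:
            - cluster-a
  containers:
  - name: app
    image: nginx:1.25
```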
Alternatively you can also use Calico and do BGP peering with top-of-rack switches. Together with a configuration where you also advertise Kubernetes service IP addresses and disable outgoing NAT on the IP pools, that makes all pod and service IPs reachable from outside the Kubernetes cluster as well (the configuration we are using). Nowadays that setup is most commonly used on-prem, but it looks like Azure, AWS and GCP support BGP peering with other devices/processes, so it should be doable there too.
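For reference, the Calico side of such a setup roughly comes down to resources like the following sketch; the AS numbers, peer IP and CIDRs are placeholders, not values from this thread:

```yaml
# Sketch of the Calico resources involved: advertise service IPs over BGP,
# peer with a top-of-rack switch, and disable outgoing NAT so pod IPs stay
# routable outside the cluster. All addresses and AS numbers are placeholders.
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  nodeToNodeMeshEnabled: true
  asNumber: 64512
  serviceClusterIPs:
  - cidr: 10.96.0.0/12
---
apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: tor-switch
spec:
  peerIP: 192.0.2.1
  asNumber: 64513
---
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 10.244.0.0/16
  natOutgoing: false
  ipipMode: Never
```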
Sure, but that is actually the only way to make sure that the storage system is not a single point of failure. What you want to do instead is let the application cluster (e.g. etcd, Redis, RabbitMQ, etc.) make sure that there are actually multiple copies of the data, written to those different storage systems.
A super-host cluster shouldn't be needed as long as there is an odd number of syncer processes and each of them runs on a different host cluster. Then the vcluster can keep working as long as more than half of them are running (e.g. 2/3 or 3/5) and they can see each other. Also, if for some reason the connectivity between all host clusters goes down, that shouldn't prevent any existing pods from running there. The only consequence is that the vcluster must stay in read-only mode until connectivity between the syncer processes works again.
FYI: to simplify developing this, and later verifying it in e2e tests (if it gets implemented), I created scripts which can be used to spin up three kind+calico clusters locally and set up BGP peering between them. The scripts can be found at https://github.com/olljanat/vcluster/tree/d7344790bb85d6d8b0bf86f2b4fc119804376499/hack/multi-cluster; just copy them locally and run them. EDIT: I also tried to enable service IP advertisement in a later version of that code. However, it looks like it runs into problems if the same service CIDR is used on all host clusters, so to get that working we need to use a different one on each cluster and handle that situation somehow.
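As a rough idea of what such a local setup involves (not a copy of the linked scripts), each of the three kind clusters is created with the default CNI disabled so Calico can be installed, and with its own pod CIDR, for example:

```yaml
# Sketch of a per-cluster kind config; the CIDRs are placeholders and the
# linked scripts may use different values. Each of the three clusters gets
# its own podSubnet so the routes advertised over BGP don't overlap.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true      # install Calico instead of kindnet
  podSubnet: 10.10.0.0/16
  serviceSubnet: 10.11.0.0/16  # see the note above about service CIDR overlap
nodes:
- role: control-plane
- role: worker
```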
Another possible idea for this problem would be to use vcluster alongside liqo, which essentially allows using virtual nodes to schedule workloads to other clusters. Combined with vcluster this would mean we could schedule vcluster workloads, as well as parts of the vcluster itself, onto those nodes and essentially enable multi-cluster functionality. We need some more investigation around this topic, but this wouldn't require any changes in vcluster itself to enable multi-cluster capabilities.
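If I read the liqo model correctly, a pod that is allowed to land on one of those virtual nodes just needs to tolerate the taint liqo puts on them; the taint key and node label in this sketch are taken from my reading of the liqo docs and should be treated as assumptions:

```yaml
# Rough sketch: a pod that is allowed to be offloaded to a liqo virtual node.
# The taint key and the node label below are assumptions based on the liqo
# documentation, not verified against a running setup.
apiVersion: v1
kind: Pod
metadata:
  name: offloaded-app
spec:
  tolerations:
  - key: virtual-node.liqo.io/not-allowed
    operator: Exists
    effect: NoExecute
  nodeSelector:
    liqo.io/type: virtual-node
  containers:
  - name: app
    image: nginx:1.25
```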
Yes, from a quick look I really like the liqo architecture, especially the part where it allows mixing clusters with different CNIs and on-prem with managed clusters. Definitely worth investigating more.
I think we need to wait for the 0.5 release mentioned at https://github.com/liqotech/liqo#roadmap, as vcluster will need this feature which is targeted for it:
Other things which I'm not so sure about are the facts that:
Hi! Liqo maintainer here :-) I have to admit we haven't managed to investigate the combination of vcluster and liqo yet (we hope to be able to give it a try soon), but we feel it might definitely be interesting.
A first implementation of the feature required to support offloaded applications that need to interact with the home API server is already merged into master, and will be included in the next release, which is planned in one to two weeks (although it won't include the other items of the roadmap). There are still some limitations (mainly, it does not support the TokenRequest API), but it should work in most situations.
It is unclear to me whether this is a strong requirement to make things work, or whether it would just enable different optimizations. Nonetheless, we see the reasons behind such a proposal, and we are discussing whether we could/should add support for it alongside the current approach.
Yes, the EnforceSameName strategy ensures that remote namespaces have the same name as the corresponding one in the local cluster, which is typically a requirement to make cross-namespace DNS resolution work out of the box. All other resources are replicated with the same name in the remote namespace, and should cause no concerns.
Is your feature request related to a problem?
From a fault tolerance and disaster recovery point of view it would be better to have three different host clusters in different datacenters than one stretched cluster. Then, if there is a network/power/etc. failure in one datacenter, or if one host cluster upgrade goes seriously wrong, the others would keep running.
However, if each of those host clusters ran a separate vcluster instance, that would have to be handled in the application CI/CD pipelines, and whoever needs to troubleshoot the environment would have to connect to all of those vclusters.
Related to #193, as this would make it possible to move workloads between host clusters online.
Which solution do you suggest?
For now I would like to understand whether it is even theoretically possible to stretch a vcluster across multiple host clusters. How big would the required changes to vcluster be? What would the cons of that kind of solution be?
As far as I understand, etcd should work fine as long as suitable values are provided for the settings in vcluster/charts/k8s/templates/etcd-statefulset.yaml (lines 88 to 90 in 3241bc7). Most probably the best way would be to give the --initial-cluster-state=new value to one etcd instance and --initial-cluster-state=existing to the others. Also, the service CIDR would most probably need to be the same on all host clusters.
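In container-args form that bootstrap split would look roughly like the sketch below; the member names, peer URLs, image and cluster token are placeholders, not values from the actual chart:

```yaml
# Sketch only: the etcd bootstrap flags that would differ per host cluster.
# The member in the first host cluster starts with --initial-cluster-state=new,
# the members in the other clusters join with --initial-cluster-state=existing.
containers:
- name: etcd
  image: registry.k8s.io/etcd:3.5.9-0   # placeholder image/version
  command:
  - etcd
  - --name=etcd-cluster-a
  - --initial-advertise-peer-urls=https://etcd-cluster-a:2380
  - --listen-peer-urls=https://0.0.0.0:2380
  - --initial-cluster=etcd-cluster-a=https://etcd-cluster-a:2380,etcd-cluster-b=https://etcd-cluster-b:2380,etcd-cluster-c=https://etcd-cluster-c:2380
  - --initial-cluster-token=vcluster-stretched
  - --initial-cluster-state=new
```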
Based on vcluster/charts/k8s/templates/syncer-deployment.yaml (lines 101 to 102 in 3241bc7), what is missing, as far as I understand, is that the syncer should at least know the API server endpoints of all the host clusters, nodes from each host cluster should be synced into the vcluster, and the scheduling sync should be done based on which node the pod gets "allocated" to inside the vcluster.
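Purely as an illustration of the first point, one way the syncer could be told about all the host cluster API servers is a kubeconfig with one context per cluster; everything in this sketch (names, endpoints, and the idea of feeding it to the syncer) is an assumption rather than an existing vcluster feature:

```yaml
# Illustration only: a kubeconfig listing all three host cluster API servers.
# Cluster names, server URLs and the credentials are placeholders.
apiVersion: v1
kind: Config
clusters:
- name: host-a
  cluster: { server: https://host-a.example.com:6443 }
- name: host-b
  cluster: { server: https://host-b.example.com:6443 }
- name: host-c
  cluster: { server: https://host-c.example.com:6443 }
contexts:
- name: host-a
  context: { cluster: host-a, user: syncer }
- name: host-b
  context: { cluster: host-b, user: syncer }
- name: host-c
  context: { cluster: host-c, user: syncer }
users:
- name: syncer
  user: { token: "<placeholder>" }
current-context: host-a
```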
Which alternative solutions exist?
No response
Additional context
No response