Create hosted cluster deploy - destroy fixtures #9957

Merged
10 changes: 10 additions & 0 deletions conf/deployment/fusion_hci_pc/hypershift_client_bm_2w.yaml
@@ -0,0 +1,10 @@
ENV_DATA:
platform: 'hci_baremetal'
Comment from Contributor:

We should refrain from using the 'HCI' term, since we are not actually installing on Fusion HCI in particular, and once we do, this might cause confusion.
I understand this already exists in our repo, so we can address it in a separate PR.

Reply from Contributor (Author):

We have a separate ticket for this, where we introduce another platform type; @ebondare is assigned to it. We need to change it in one PR across all the modules. If it were done only here, it would not be in use.

cluster_type: 'hci_client'
cluster_namespace: "openshift-storage-client"
worker_replicas: 2
mon_type: 'hostpath'
osd_type: 'ssd'
REPORTING:
ocs_must_gather_image: "quay.io/ocs-dev/ocs-must-gather"
ocs_must_gather_latest_tag: 'latest'
10 changes: 10 additions & 0 deletions conf/deployment/fusion_hci_pc/hypershift_client_bm_3w.yaml
@@ -0,0 +1,10 @@
ENV_DATA:
platform: 'hci_baremetal'
cluster_type: 'hci_client'
cluster_namespace: "openshift-storage-client"
worker_replicas: 3
mon_type: 'hostpath'
osd_type: 'ssd'
REPORTING:
ocs_must_gather_image: "quay.io/ocs-dev/ocs-must-gather"
ocs_must_gather_latest_tag: 'latest'
36 changes: 34 additions & 2 deletions ocs_ci/deployment/helpers/hypershift_base.py
@@ -1,6 +1,9 @@
import logging
import os
import random
import re
import shutil
import string
import tempfile
import time
from datetime import datetime
@@ -14,7 +17,7 @@
from ocs_ci.ocs.resources.pod import wait_for_pods_to_be_in_statuses_concurrently
from ocs_ci.ocs.version import get_ocp_version
from ocs_ci.utility.retry import retry
from ocs_ci.utility.utils import exec_cmd, TimeoutSampler
from ocs_ci.utility.utils import exec_cmd, TimeoutSampler, get_latest_release_version

"""
This module contains the base class for HyperShift hosted cluster management.
@@ -62,6 +65,35 @@ def wrapper(self, *args, **kwargs):
return wrapper


def get_random_hosted_cluster_name():
"""
Get a random cluster name
Returns:
str: random cluster name
"""
# getting the cluster name from the env data, for instance "ibm_cloud_baremetal3; mandatory conf field"
bm_name = config.ENV_DATA.get("baremetal", {}).get("env_name")
ocp_version = get_latest_release_version()
hcp_version = "".join([c for c in ocp_version if c.isdigit()][:3])
match = re.search(r"\d+$", bm_name)
if match:
random_letters = "".join(
random.choice(string.ascii_lowercase) for _ in range(3)
)
(dahorak marked this conversation as resolved.)
cluster_name = (
"hcp"
+ hcp_version
+ "-bm"
+ bm_name[match.start() :]
+ "-"
+ random_letters
)
Comment from Contributor (on lines +84 to +91):

Can we save this as a constant that will get the random letters?

Reply from Contributor (Author):

Tbh it's not very clear to me. These are random letters appended to the name to prevent creating two clusters with the same name.

  • Why get_random_hosted_cluster_name is not a fixture - it is used only to dynamically add hosted clusters, too specific a task
  • Why we do not have a function making this name with random letters in some common utility module - the name is very specific, so it makes no sense to reuse it for other resources

The function produces a name similar to hcp416-bm1-wtp

else:
raise ValueError("Cluster name not found in the env data")
return cluster_name
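The naming scheme discussed above can be reproduced as a standalone sketch (the env name "ibm_cloud_baremetal3" and OCP version "4.16.9" are hypothetical example inputs, not values from this PR):

```python
import random
import re
import string


def sketch_hosted_cluster_name(bm_name, ocp_version):
    # First three digits of the release, e.g. "4.16.9" -> "416"
    hcp_version = "".join(c for c in ocp_version if c.isdigit())[:3]
    # Trailing digits of the baremetal env name, e.g. "ibm_cloud_baremetal3" -> "3"
    match = re.search(r"\d+$", bm_name)
    if not match:
        raise ValueError("Cluster name not found in the env data")
    # Three random lowercase letters guard against two clusters sharing a name
    random_letters = "".join(
        random.choice(string.ascii_lowercase) for _ in range(3)
    )
    return f"hcp{hcp_version}-bm{bm_name[match.start():]}-{random_letters}"


name = sketch_hosted_cluster_name("ibm_cloud_baremetal3", "4.16.9")
print(name)  # e.g. "hcp416-bm3-wtp"
```

The random suffix makes the name unique per invocation, which is why the author argues it belongs here rather than in a shared utility module.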


def get_binary_hcp_version():
"""
Get hcp version output. Handles hcp 4.16 and 4.17 cmd differences
Expand Down Expand Up @@ -582,7 +614,7 @@ def destroy_kubevirt_cluster(self, name):
Args:
name (str): Name of the cluster
"""
destroy_timeout_min = 10
destroy_timeout_min = 15
logger.info(
f"Destroying HyperShift hosted cluster {name}. Timeout: {destroy_timeout_min} min"
)
175 changes: 109 additions & 66 deletions ocs_ci/deployment/hosted_cluster.py
@@ -51,12 +51,9 @@ class HostedClients(HyperShiftBase):

def __init__(self):
HyperShiftBase.__init__(self)
if not config.ENV_DATA.get("clusters"):
raise ValueError(
"No 'clusters': '{<cluster names>: <cluster paths>}' set to ENV_DATA"
)
self.kubeconfig_paths = []

def do_deploy(self):
def do_deploy(self, cluster_names=None):
"""
Deploy multiple hosted OCP clusters on Provider platform and setup ODF client on them
Perform the 7 stages of deployment:
@@ -73,11 +70,26 @@ def do_deploy(self):
solution: disable MCE and install upstream Hypershift on the cluster
! Important !
due to n-1 logic we are assuming that desired CNV version <= OCP version
due to n-1 logic we are assuming that desired CNV version <= OCP version of managing/Provider cluster
Args:
cluster_names (list): cluster names to deploy, if None, all clusters from ENV_DATA will be deployed
Returns:
list: the list of HostedODF objects for all hosted OCP clusters deployed by the method successfully
"""

# stage 1 deploy multiple hosted OCP clusters
cluster_names = self.deploy_hosted_ocp_clusters()
# If all desired clusters were already deployed and self.deploy_hosted_ocp_clusters() returns None instead of
# the list, in this case we assume the stage of Hosted OCP clusters creation is done, and we
# proceed to ODF installation and storage client setup.
# If specific cluster names were provided, we will deploy only those.
if not cluster_names:
cluster_names = self.deploy_hosted_ocp_clusters() or list(
config.ENV_DATA.get("clusters").keys()
)
if cluster_names:
cluster_names = self.deploy_hosted_ocp_clusters(cluster_names)

# stage 2 verify OCP clusters are ready
logger.info(
@@ -91,11 +103,6 @@
logger.info("Download kubeconfig for all clusters")
kubeconfig_paths = self.download_hosted_clusters_kubeconfig_files()

# if all desired clusters were already deployed and step 1 returns None instead of the list,
# we proceed to ODF installation and storage client setup
if not cluster_names:
cluster_names = list(config.ENV_DATA.get("clusters").keys())

# stage 4 deploy ODF on all hosted clusters if not already deployed
for cluster_name in cluster_names:

@@ -112,51 +119,39 @@
# stage 5 verify ODF client is installed on all hosted clusters
odf_installed = []
for cluster_name in cluster_names:

if not self.config_has_hosted_odf_image(cluster_name):
if self.config_has_hosted_odf_image(cluster_name):
logger.info(
f"Hosted ODF image not set for cluster '{cluster_name}', skipping ODF validation"
f"Validate ODF client operator installed on hosted OCP cluster '{cluster_name}'"
)
continue

logger.info(
f"Validate ODF client operator installed on hosted OCP cluster '{cluster_name}'"
)
hosted_odf = HostedODF(cluster_name)

if not hosted_odf.odf_client_installed():
# delete catalogsources help to finish install cluster if nodes have not enough mem
# see oc describe pod ocs-client-operator-controller-manager-<suffix> -n openshift-storage-client
# when the problem was hit
hosted_odf.exec_oc_cmd(
"delete catalogsource --all -n openshift-marketplace"
)
logger.info("wait 30 sec and create catalogsource again")
time.sleep(30)
hosted_odf.create_catalog_source()
odf_installed.append(hosted_odf.odf_client_installed())
hosted_odf = HostedODF(cluster_name)
if not hosted_odf.odf_client_installed():
hosted_odf.exec_oc_cmd(
"delete catalogsource --all -n openshift-marketplace"
)
logger.info("wait 30 sec and create catalogsource again")
time.sleep(30)
Comment from Contributor (on lines +131 to +132):

Can we, instead of a sleep, sample the existence of the catalogsource to validate that it has been deleted?

Reply from Contributor (Author):

  • I strongly agree that static sleeps are bad, but I've made an exception here.
  • This 30-second timeout was added as a workaround for a very rare issue; we will not hit it 99% of the time.
  • We don't wait only for the catalog source to be deleted, but also for leftovers associated with it to be cleaned up (I don't know the full list exactly: images, pods, secrets...).
  • We remove not just one catalogsource but all non-default catalogsources from the namespace.
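For reference, the reviewer's alternative, polling until the resources are actually gone instead of a fixed sleep, could look roughly like the generic sketch below. `list_resources` here stands in for something like an `oc get catalogsource` call and is purely hypothetical; ocs-ci's own `TimeoutSampler` would be the idiomatic choice in-repo:

```python
import time


def wait_for_deletion(list_resources, timeout=60, interval=5):
    """Poll until list_resources() returns an empty list, or raise TimeoutError.

    list_resources: callable returning the remaining resources (e.g. names of
    non-default catalog sources still present in the namespace).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not list_resources():
            return True
        time.sleep(interval)
    raise TimeoutError(f"resources still present after {timeout} seconds")


# Simulated: the catalog sources disappear on the third poll.
remaining = [["redhat-operators"], ["redhat-operators"], []]
assert wait_for_deletion(lambda: remaining.pop(0), timeout=10, interval=0)
```

Note the author's caveat still applies: polling only for the CatalogSource object would not cover associated leftovers (pods, secrets), which is part of why the static sleep was kept.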

hosted_odf.create_catalog_source()
odf_installed.append(hosted_odf.odf_client_installed())

# stage 6 setup storage client on all hosted clusters
client_setup = []
client_setup_res = []
hosted_odf_clusters_installed = []
for cluster_name in cluster_names:

if (
not config.ENV_DATA.get("clusters")
.get(cluster_name)
.get("setup_storage_client", False)
):
if self.storage_installation_requested(cluster_name):
logger.info(
f"Storage client setup not set for cluster '{cluster_name}', skipping storage client setup"
f"Setting up Storage client on hosted OCP cluster '{cluster_name}'"
)
continue

logger.info(
f"Setting up Storage client on hosted OCP cluster '{cluster_name}'"
)
hosted_odf = HostedODF(cluster_name)
client_setup.append(hosted_odf.setup_storage_client())

# stage 7 verify all hosted clusters are ready and print kubeconfig paths
hosted_odf = HostedODF(cluster_name)
client_installed = hosted_odf.setup_storage_client()
client_setup_res.append(client_installed)
if client_installed:
hosted_odf_clusters_installed.append(hosted_odf)
else:
logger.info(
f"Storage client installation not requested for cluster '{cluster_name}', "
"skipping storage client setup"
)
# stage 7 verify all hosted clusters are ready and print kubeconfig paths on Agent
logger.info(
"kubeconfig files for all hosted OCP clusters:\n"
+ "\n".join(
@@ -172,9 +167,11 @@
odf_installed
), "ODF client was not deployed on all hosted OCP clusters"
assert all(
client_setup
client_setup_res
), "Storage client was not set up on all hosted ODF clusters"

return hosted_odf_clusters_installed

def config_has_hosted_odf_image(self, cluster_name):
"""
Check if the config has hosted ODF image set for the cluster
@@ -199,23 +196,56 @@

return regestry_exists and version_exists

def deploy_hosted_ocp_clusters(
self,
):
def storage_installation_requested(self, cluster_name):
"""
Check if the storage client installation was requested in the config
Args:
cluster_name (str): Name of the cluster
Returns:
bool: True if the storage client installation was requested, False otherwise
"""
return (
config.ENV_DATA.get("clusters", {})
.get(cluster_name, {})
.get("setup_storage_client", False)
)
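
The chained `.get()` lookup above degrades gracefully when keys are missing; a minimal illustration with a toy config dict (not the real `config.ENV_DATA`):

```python
def storage_installation_requested(env_data, cluster_name):
    # Each .get() falls back to an empty dict / False, so a missing
    # "clusters" section or an unknown cluster name yields False
    # instead of raising a KeyError.
    return (
        env_data.get("clusters", {})
        .get(cluster_name, {})
        .get("setup_storage_client", False)
    )


env = {"clusters": {"hcp416-bm3-aaa": {"setup_storage_client": True}}}
print(storage_installation_requested(env, "hcp416-bm3-aaa"))  # True
print(storage_installation_requested(env, "missing"))  # False
print(storage_installation_requested({}, "hcp416-bm3-aaa"))  # False
```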

def deploy_hosted_ocp_clusters(self, cluster_names_list=None):
"""
Deploy multiple hosted OCP clusters on Provider platform
Args:
cluster_names_list (list): List of cluster names to deploy. If not provided, all clusters
in config.ENV_DATA["clusters"] will be deployed (optional argument)
Returns:
list: the list of cluster names for all hosted OCP clusters deployed by the func successfully
list: The list of cluster names for all hosted OCP clusters deployed by the func successfully
"""

cluster_names_desired = list(config.ENV_DATA["clusters"].keys())
# Get the list of cluster names to deploy
if cluster_names_list:
cluster_names_desired = [
name
for name in cluster_names_list
if name in config.ENV_DATA["clusters"].keys()
]
else:
cluster_names_desired = list(config.ENV_DATA["clusters"].keys())
number_of_clusters_to_deploy = len(cluster_names_desired)
logger.info(f"Deploying '{number_of_clusters_to_deploy}' number of clusters")
deployment_mode = (
"only specified clusters"
if cluster_names_list
else "clusters from deployment configuration"
)
logger.info(
f"Deploying '{number_of_clusters_to_deploy}' number of {deployment_mode}"
)

cluster_names = []

for index, cluster_name in enumerate(config.ENV_DATA["clusters"].keys()):
for index, cluster_name in enumerate(cluster_names_desired):
logger.info(f"Creating hosted OCP cluster: {cluster_name}")
hosted_ocp_cluster = HypershiftHostedOCP(cluster_name)
# we need to ensure that all dependencies are installed so for the first cluster we will install all,
@@ -282,22 +312,37 @@ def download_hosted_clusters_kubeconfig_files(self):
if not (self.hcp_binary_exists() and self.hypershift_binary_exists()):
self.download_hcp_binary_with_podman()

kubeconfig_paths = []
for name in config.ENV_DATA.get("clusters").keys():
path = config.ENV_DATA.get("clusters").get(name).get("hosted_cluster_path")
kubeconfig_paths.append(self.download_hosted_cluster_kubeconfig(name, path))
self.kubeconfig_paths.append(
self.download_hosted_cluster_kubeconfig(name, path)
)

return self.kubeconfig_paths

return kubeconfig_paths
def get_kubeconfig_path(self, cluster_name):
"""
Get the kubeconfig path for the cluster
Args:
cluster_name (str): Name of the cluster
Returns:
str: Path to the kubeconfig file
"""
if not self.kubeconfig_paths:
self.download_hosted_clusters_kubeconfig_files()
for kubeconfig_path in self.kubeconfig_paths:
if cluster_name in kubeconfig_path:
return kubeconfig_path
return
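
`get_kubeconfig_path` above matches by substring; a toy illustration of that lookup (the paths are made up):

```python
def find_kubeconfig(kubeconfig_paths, cluster_name):
    # Return the first downloaded kubeconfig whose path contains the
    # cluster name, mirroring the substring match used above. Returns
    # None when nothing matches, like the bare `return` in the method.
    for path in kubeconfig_paths:
        if cluster_name in path:
            return path
    return None


paths = [
    "/tmp/clusters/hcp416-bm3-aaa/kubeconfig",
    "/tmp/clusters/hcp416-bm3-bbb/kubeconfig",
]
print(find_kubeconfig(paths, "hcp416-bm3-bbb"))
# /tmp/clusters/hcp416-bm3-bbb/kubeconfig
```

A substring match relies on the random-suffix naming scheme keeping cluster names from being prefixes of one another.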

def deploy_multiple_odf_clients(self):
"""
Deploy multiple ODF clients on hosted OCP clusters. Method tries to deploy ODF client on all hosted OCP clusters
If ODF was already deployed on some of the clusters, it will be skipped for those clusters.
Returns:
list: the list of kubeconfig paths for all hosted OCP clusters
"""
kubeconfig_paths = self.update_hcp_binary()
self.update_hcp_binary()

hosted_cluster_names = get_hosted_cluster_names()

@@ -306,8 +351,6 @@ def deploy_multiple_odf_clients(self):
hosted_odf = HostedODF(cluster_name)
hosted_odf.do_deploy()

return kubeconfig_paths


class HypershiftHostedOCP(HyperShiftBase, MetalLBInstaller, CNVInstaller, Deployment):
def __init__(self, name):