
Welcome to ClusterSlice

ClusterSlice Documentation


Simple Example

ClusterSlice can transform test-bed resources from bare-metal or hypervisor setups into fully-operational Kubernetes slices. For instance, ClusterSlice test-bed users may execute the following command:

user@boss:~/clusterslice/examples$ kubectl apply -f slicerequest.yaml

After a while, they will have their own test-bed slice, similar to those in SLICES (i.e., Fed4FIRE+) or EdgeNet test-beds.

An example ClusterSlice request file, such as slicerequest.yaml, is provided below:

apiVersion: "swn.uom.gr/v1"
kind: SliceRequest
metadata:
  name: clusterslice
  namespace: swn
spec:
  name: clusterslice
  usernamespace: swn
  credentials:
    username: clusterslice
    password: sha-512-encoded-password
  infrastructure:
    masters:
      count: 1
      osimage: "ubuntu-22-clean"
      mastertype: "vm"
    workers:
      count: 5
      osimage: "ubuntu-22-clean"
      workertype: "vm"
  kubernetes:
    kubernetestype: "vanilla"
    version: "1.1"
    networkfabric: "flannel"
  applications:
    - name: argo
      version: "v3.4.4"
      parameters: "{'workflow': 'daemon-nginx.yaml'}"
      scope: cluster
    - name: helm
      scope: cluster
    - name: metrics-server
      version: "latest"
      scope: cluster
    - name: dashboard
      version: "v2.7.0"
      scope: cluster
    - name: kubeview
      scope: cluster
    - name: docker
      scope: all

In this custom resource, a vanilla Kubernetes cluster of version 1.1 is requested, with Flannel as the networking fabric. The test-bed allocates 1 master and 5 worker VMs running Ubuntu 22.04. After allocation, an Argo workflow named daemon-nginx.yaml is deployed, along with other applications.

To check the status of the request, the following command can be used:

user@boss:~/clusterslice/examples$ kubectl get slicerequests

Once the request is accepted, the corresponding slice can be viewed:

user@boss:~/clusterslice/examples$ kubectl get slices

The slice shows the number of master and worker nodes that have been successfully allocated, as well as the status and the most recent output message of the slice.
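For a more detailed view of an individual request, the standard kubectl describe command can also be used; the object name and namespace below are those of the example manifest above:

user@boss:~/clusterslice/examples$ kubectl describe slicerequests clusterslice -n swn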

For detailed resource status, including nodes and controllers (global test-bed resources belong to the swn namespace):

user@boss:~/clusterslice/examples$ kubectl get computeresources -n swn

Currently, the following status states are supported for computeresources:

  • free: the resource is free to be used.
  • reserved: the resource is reserved for a slice.
  • creating_vm: a VM is being created.
  • booting: the resource is currently booting.
  • os_ready: the OS is installed.
  • os_configured: the OS is configured.
  • os_completed: a non-Kubernetes node has been allocated.
  • kubernetes_base: the Kubernetes base tools are installed on the resource.
  • wait_for_plugin: waiting for the network fabric deployment to complete.
  • kubernetes_master: the resource is a Kubernetes master node.
  • join_worker: the worker node is joining the cluster.
  • kubernetes_worker: the resource is a Kubernetes worker node.
  • allocated: the resource is fully allocated and its applications have been installed.
  • failed: the node deployment process has failed.

This is the typical flow of the node state machine during a new slice deployment.
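To follow these transitions while a slice is being deployed, the standard kubectl watch flag can be used. The second command is only a sketch: it assumes the state is exposed under .status.status, which may differ in the actual CRD.

user@boss:~/clusterslice/examples$ kubectl get computeresources -n swn -w
user@boss:~/clusterslice/examples$ kubectl get computeresources -n swn -o custom-columns=NAME:.metadata.name,STATUS:.status.status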

More details can be found in the YAML file defining the computeresources custom resource, i.e., computeresources-crd.yaml.

The user can execute the following command to remove an existing slice, e.g., for a slice request described in file slicerequest.yaml:

user@boss:~/clusterslice/examples$ kubectl delete -f slicerequest.yaml

The removal of the slice request object triggers a cascading deletion of all other related resources, including the associated VMs.
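Alternatively, using the object name and namespace of the example manifest above, the request can be deleted by name with standard kubectl syntax:

user@boss:~/clusterslice/examples$ kubectl delete slicerequests clusterslice -n swn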

Declarative Definition of Experimentation Slices

Here, we provide an exhaustive list of parameters for declaring an experimentation slice request using the SliceRequest Kubernetes custom resource. Further details about the SliceRequest object can be found in the slicerequest-crd.yaml file.

Experimentation slice requests can be defined directly by users or through a higher-level abstraction called the "multicluster slice request" custom resource, which facilitates multi-domain and multi-cluster experimentation.

The slice request objects can have the following status states:

  • defined: The slice request has been defined.
  • accepted: The slice request has been accepted and a slice object has been created.
  • failed: The slice request has failed.
  • deploying_nodes: The slice request involves deploying physical nodes, as in open test-bed deployments.
  • declined: The slice request has been declined.

The SliceRequest specification consists of four parts: global properties, infrastructure configuration demands, Kubernetes configuration, and applications configuration.

Global Properties

The available global properties, along with descriptions, parameter types/options, and implementation status, are listed in the table below:

| Parameters | Type / Options | Description | Implementation Status |
|---|---|---|---|
| name | string | Slice name | Available |
| duration | string | Slice duration | In Progress |
| usernamespace | string | User namespace | Available |
| deploymentstrategy | firstone, balanced | VM deployment strategy over multiple servers | More options to follow |
| deploymentdomain | string | Deployment domain selection in multi-domain setups | Available |
| credentials.username | string | Slice admin username | Available |
| credentials.password | SHA-512 encoded string | Encoded slice admin password | Available |
| updating | boolean | Whether nodes update their packages after deployment | In Progress |
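As an illustration, the global part of a SliceRequest spec could look as follows; all values are placeholders (including the deploymentdomain name), not defaults:

spec:
  name: myslice
  usernamespace: swn
  deploymentstrategy: balanced
  deploymentdomain: domain-a
  credentials:
    username: experimenter
    password: sha-512-encoded-password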

To create the encoded password, you can use the following command (requires the mkpasswd tool):

user@boss:~/clusterslice/examples$ mkpasswd --method=sha-512

On Ubuntu distributions, the mkpasswd tool is provided by the whois package.

Infrastructure Configuration

The infrastructure configuration is defined under the "infrastructure" field of the SliceRequest custom resource. It allows specifying requirements for three types of nodes: regular non-Kubernetes nodes, master nodes, and worker nodes.

The corresponding parameters are listed below:

| Parameters | Type / Options | Description | Implementation Status |
|---|---|---|---|
| nodes.count | integer | Number of non-Kubernetes nodes | Available |
| nodes.osimage | string | OS image of non-Kubernetes nodes | More images to follow |
| nodes.osaccount | string | Initial admin account for non-Kubernetes nodes | Available |
| nodes.nodetype | vm, node, or testbed node type | Type of non-Kubernetes nodes | Available |
| masters.count | integer | Number of master nodes | Available |
| masters.osimage | string | OS image of master nodes | More images to follow |
| masters.osaccount | string | Initial admin account for master OS | Available |
| masters.mastertype | vm, node, or testbed node type | Type of master nodes | Available |
| workers.count | integer | Number of worker nodes | Available |
| workers.osimage | string | OS image for worker nodes | More images to follow |
| workers.osaccount | string | Initial admin account for worker OS | Available |
| workers.workertype | vm, node, or testbed node type | Type of worker nodes | Available |

The requested nodes can be implemented over heterogeneous virtual and physical resources, including VirtualBox, XCP-NG cloud systems, and the CloudLab open test-bed.
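As an illustration, the snippet below requests one physical master node and two worker VMs; the image name follows the earlier example, while the osaccount value is a placeholder:

  infrastructure:
    masters:
      count: 1
      osimage: "ubuntu-22-clean"
      osaccount: "clusterslice"
      mastertype: "node"
    workers:
      count: 2
      osimage: "ubuntu-22-clean"
      osaccount: "clusterslice"
      workertype: "vm"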

Kubernetes Configuration

When a SliceRequest specifies Kubernetes master and worker nodes, a Kubernetes configuration can be defined under the "kubernetes" field. The configuration parameters are as follows:

| Parameters | Type / Options | Description | Implementation Status |
|---|---|---|---|
| kubernetestype | vanilla, k3s, k0s, microk8s | Type of Kubernetes to install | More options to follow |
| version | string | Version of Kubernetes to install | In Progress |
| networkfabric | flannel, multus, calico, cilium, kuberouter, weavenet, kubeovn, antrea | Intra-cluster network plugin to install | More options to follow |
| containerdversion | string | Version of containerd to install | Available for k8s |
| cnitoolsversion | string | Version of cni-tools to install | Available for k8s |
| networkcidr | string, default: "10.244.0.0/16" | Kubernetes network CIDR | Available for k8s, k3s |
| servicecidr | string, default: "10.96.0.0/12" | Kubernetes service CIDR | Available for k8s, k3s |
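As an illustration, a kubernetes section requesting a k3s cluster with Calico (a combination listed as supported below) could look as follows; the CIDR values shown are simply the defaults:

  kubernetes:
    kubernetestype: "k3s"
    networkfabric: "calico"
    networkcidr: "10.244.0.0/16"
    servicecidr: "10.96.0.0/12"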

Currently, the following plugins are supported for each considered Kubernetes flavor:

  • vanilla: Flannel, Multus, Calico, WeaveNet, Cilium, Kube-Router, Kube-OVN, Antrea
  • k3s: Flannel, Calico, Cilium
  • k0s: Kube-Router, Calico
  • microk8s: Calico, Flannel, Kube-OVN

Inter-cluster network plugins (e.g., Submariner) are currently specified as Kubernetes extensions, but they will be integrated as part of CRD fields in the "multicluster slice request" custom resource.

Application Modules Configuration

A SliceRequest custom resource also supports a declarative approach to deploying applications, Kubernetes extensions, and modular OS configurations. In the documentation and source code, we use the term applications to refer to all three. The "applications" field includes the basic properties for these deployments:

| Parameters | Type / Options | Description | Implementation Status |
|---|---|---|---|
| name | string | Name of the application to install | More applications to follow |
| version | string | Version of the application to install | In Progress |
| parameters | JSON string | List of app configuration parameters to pass | Supported |
| scope | cluster, masters, workers, all | Scope of the application deployment | Supported |
| sharefile | string | Node may share a file after app deployment | Supported |
| waitforfile | string | Node may wait for a file before app deployment | Supported |

Each application field has a "deployed" status boolean variable indicating its deployment status.

Currently supported applications include:

  • argo: Argo Workflows
  • dashboard: Kubernetes Dashboard GUI
  • docker: Docker Engine
  • helm: Helm charts support
  • kubeview: Kubeview GUI
  • metrics-server: Metrics Server Kubernetes add-on
  • ocm: OCM multi-cluster management software
    • ocm-hub-init
    • ocm-hub-approve
    • ocm-managed
  • liqo: Liqo multi-cluster management software
    • liqo-master
    • liqo-peer
  • karmada: Karmada multi-cluster management software
    • karmada-init
    • karmada-join
  • submariner: Submariner inter-domain network plugin
    • submariner-broker
    • submariner
  • teaching-examples: Examples for teaching purposes
  • benchmarks: Benchmarking for Kubernetes, CNI plugins, and machine learning deployments
  • updates: Requests a node to update packages to the latest version

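As an illustration of the sharefile and waitforfile fields, the hypothetical snippet below makes a worker-scoped deployment wait until a master-scoped one has shared a file; the chosen applications and file path are placeholders rather than a recommended configuration:

  applications:
    - name: docker
      scope: masters
      sharefile: "/tmp/docker-on-masters-done"
    - name: benchmarks
      scope: workers
      waitforfile: "/tmp/docker-on-masters-done"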

Experimentation Automation

We are currently implementing experimentation automation features that leverage ClusterSlice's unique design characteristics, such as declarative experiment definitions, scalability, reliability, and reproducibility. These features are initially built as ClusterSlice applications but are planned to be implemented as CRDs. Currently supported features include utilizing Argo workflows with measurement sidecar containers, container-based experiment output generation (e.g., PDF with configuration and gnuplot figures), and a time-series generator for machine learning workflows.

Design and Implementation Details

ClusterSlice embraces a unique philosophy in automating experimentation, characterized by:

  • Design empowered by the Kubernetes paradigm, incorporating heavy utilization of Custom Resource Definitions (CRDs) and Kubernetes operators. This approach inherits the reliability, scalability, and resource optimization capabilities of Kubernetes.
  • Introduction of innovative automation abstractions, encompassing both node-level and cluster-level automation, i.e., Resource Managers and Slice Operators, respectively.
  • Avoidance of API- or technology-specific software, relying instead on tools like Bash, command-line utilities, and SSH through Ansible. This approach preserves the simplicity, portability, and composability inherent to the Unix philosophy.
  • Abstractions of infrastructure managers that interface with diverse test-bed facilities (e.g., CloudLab) and cloud/virtualization systems (e.g., VirtualBox or XCP-NG). This maximizes compatibility across heterogeneous systems.

Basic ClusterSlice Architecture

In the image that follows, we give a high-level overview of the ClusterSlice architecture. We begin with an explanation of the core ClusterSlice components. Subsequently, we give an example of a slice deployment workflow.

The ClusterSlice framework relies on the subsequent design abstractions based on Custom Resources (CRs):

  • ComputeResource CRs: These CRs represent entities that host ClusterSlice nodes (such as Kubernetes nodes) and node controllers. They can be configured either by administrators or through dynamic population (e.g., after allocating a new physical server via CloudLab).
  • Resource Managers: Each Resource Manager mirrors the active configuration and state of a ClusterSlice node. These managers employ Ansible playbook templates to achieve the desired configuration and state. They embody a concept akin to a digital twin of a node. Resource Managers are tailored to specific technologies, yet they interact with the upper part of the ClusterSlice architecture in an abstracted manner.
  • Slice CRs and Operator: The Slice resource represents an active experimentation slice that enhances the targeted configuration and state with the currently active configuration and state. The Slice Operator is tasked with aligning the active state and configuration with the intended state at the level of the experimentation slice. This task is accomplished by managing Resource Managers and overseeing abstracted control of clusters and cluster-level applications.
  • SliceRequest CRs and Operator: SliceRequests embody the declarative specifications of an experimentation slice, outlining the intended configuration and state. The SliceRequest Operator implements these objectives by managing Slice objects and conducting a dynamic resource discovery process. This process encompasses the specified ComputeResources as well as the dynamic allocation of physical nodes via abstracted test-bed control features.

The Resource Manager container implementation, as well as all operators, resides in the clusterslice/controllers directory, while the CRDs are located in the clusterslice/crds directory.

(Figure: ClusterSlice Architecture)

We now outline the basic experimentation slice deployment workflow. As a first step, an experimenter defines a SliceRequest object manifest in the form of a YAML file. Example definitions have been uploaded in the clusterslice/examples directory. Additionally, we have plans to incorporate a web-based GUI for experiment specification.

To transform the manifest into a Kubernetes object, the user should execute a kubectl apply -f command. The SliceRequest Operator is then notified about the new object and proceeds to parse the object descriptor. After defining various configuration parameters, the operator generates a new Slice object. In practical terms, the operator implements a resource discovery process that matches ComputeResource objects with the specified requirements, such as the need for 1 master and 5 worker nodes. As a result, a set of ComputeResource objects are marked as reserved, and their details are stored in a fresh Slice object manifest. If this process involves a demand for open test-bed nodes, these nodes are allocated through the appropriate test-bed interfaces, and their details are populated as ComputeResources and within the new Slice manifest. In scenarios where there is a demand for new Virtual Machines (VMs), a VM embedding process might also occur. This happens when two or more cloud servers are assigned to the reserved ComputeResource objects. In other words, VMs are allocated to specific cloud servers. The VM placement algorithm can be configured within the SliceRequest manifest.

The responsibility now shifts to the Slice Operator, which extracts the configuration parameters of cloud servers and nodes (such as hostnames and IP addresses) from the Slice object. It proceeds to create a dedicated Resource Manager for each node, incorporating the appropriate high-level configuration details like hostnames, IPs, designated cloud servers, intended application modules for deployment, and more. In essence, the Resource Managers are in charge of tasks such as VM allocation, Kubernetes deployment and configuration, setting up Kubernetes nodes, and installing the predefined application modules. The Slice Operator manages horizontal node control processes, like the establishment of the Kubernetes cluster via the communication with the appropriate Resource Managers.

The Slice object maintains a status state along with corresponding output text that conveys its condition, including the ability to report failure messages. The Resource Managers are responsible for maintaining comprehensive log information regarding the deployment process, including specific status states for the resources (i.e., ComputeResources) they are responsible for, like "deploy_VM", "booting", "install_os" and more.

After a while, the new experimentation slice is up and running. The user can access it with the admin user credentials specified in the SliceRequest object.

Multi-Clustering and Multi-Domain Capabilities

The expansion of ClusterSlice to support multi-clustering and multi-domain functionality introduces the following additional design abstractions:

  • MultiClusterSliceRequest CR and Operator: This higher-level abstraction is responsible for creating and overseeing multiple SliceRequest objects, which in turn allocate the clusters that constitute the multi-cluster slice. It also manages the deployment of multi-cluster management software, such as Liqo, Karmada or OCM.
  • Infrastructure Managers: The role of the Infrastructure Managers is to establish communication with technology-specific cloud managers or test-bed controllers within specific domains. These managers function as drivers for heterogeneous resource management systems, handling both physical and virtual resources.

Example Infrastructure Manager implementations as well as the MultiClusterSliceRequest Operator can be found in the clusterslice/controllers directory. The MultiClusterSliceRequest CRD is placed in the clusterslice/crds/multiclusterslicerequest-crd.yaml file.

ClusterSlice Security

In the picture that follows, we detail the security settings of an example ClusterSlice user called Sofia. A regular user is able to apply, delete and monitor SliceRequest objects and to monitor Slice and ComputeResource objects. The relevant security manifests can be found in the clusterslice/security directory.

(Figure: ClusterSlice Security)
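As an illustration, the kind of namespaced RBAC Role backing such a user could look like the sketch below. This is not the actual manifest from the clusterslice/security directory; the resource and API group names are taken from the examples above, and the exact rules may differ:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: clusterslice-experimenter
  namespace: swn
rules:
  - apiGroups: ["swn.uom.gr"]
    resources: ["slicerequests"]
    verbs: ["create", "delete", "get", "list", "watch"]
  - apiGroups: ["swn.uom.gr"]
    resources: ["slices", "computeresources"]
    verbs: ["get", "list", "watch"]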

Open-Source Projects Acknowledgement

We would like to thank the maintainers of the following open-source projects, which we have either been inspired by or utilized in the implementation of ClusterSlice: