Group 4 added translation docs #41



Compared with the container storage architecture in Moby, the main differences in PouchContainer are:
* PouchContainer drops the concepts of GraphDriver and Layer; the new storage architecture introduces Snapshotter and Snapshot, thereby embracing the architectural design of the CNCF project Containerd. A Snapshotter can be understood as a storage driver, such as overlay, devicemapper, or btrfs. A Snapshot is an image snapshot and comes in two kinds: read-only, i.e. the read-only data of each layer of the container image; and read/write, i.e. the container's writable layer, in which all of the container's incremental data is stored;
* In Containerd, both container and image metadata are stored in boltdb. The benefit is that on each service restart, only boltdb needs to be initialized, instead of reading host file directories to rebuild container and image data.

## Upgrade Requirements
## Upgrade Implementation Details
### Upgrade API Definition

First, let us explain the `upgrade` API entry-level definition, which specifies which container parameters an upgrade may modify. As the definition of `ContainerUpgradeConfig` below shows, a container upgrade can operate on both `ContainerConfig` and `HostConfig`. Looking at the definitions of these two parameters under the `apis/types` directory of the PouchContainer GitHub repository, one finds that `upgrade` can in fact modify __all__ of the old container's related configuration.
```go
// ContainerUpgradeConfig ContainerUpgradeConfig is used for API "POST /containers/upgrade".
// It wraps all kinds of config used in container upgrade.
// It can be used to encode client params in client and unmarshal request body in daemon side.
//
// swagger:model ContainerUpgradeConfig

type ContainerUpgradeConfig struct {
	ContainerConfig

	// host config
	HostConfig *HostConfig `json:"HostConfig,omitempty"`
}
```

# Summary

In enterprise production environments, the container `upgrade` operation is as high-frequency as scaling out and scaling in. However, neither the current Moby community nor the Containerd community provides an API that matches this operation. PouchContainer is the first to implement it, solving a pain point of updating and releasing stateful services with container technology in enterprise environments. PouchContainer is also trying to keep in close contact with its downstream dependencies such as Containerd, so the `upgrade` feature will later be contributed back to the Containerd community to enrich Containerd's functionality.
27 changes: 27 additions & 0 deletions blog-en/Design and Implementation of PouchContainer CRI.md
### 2. Overview of CRI design



![cri-1.png | left | 827x299](https://cdn.yuque.com/lark/0/2018/png/103564/1527478355304-a20865ae-81b8-4f13-910d-39c9db4c72e2.png "")


As shown in the figure above, Kubelet is the node agent of the Kubernetes cluster; it supervises the state of containers to ensure they all run as expected. To achieve this, Kubelet calls the corresponding CRI interfaces to synchronize container state.

CRI shim works as an interface conversion layer: it converts CRI interface calls to the corresponding interfaces of the underlying container runtime, invokes them, and returns the results. For some container runtimes, the CRI shim runs as an independent process. For example, when Docker is the container runtime of Kubernetes, Kubelet starts a dockershim process, which is Docker's CRI shim. For PouchContainer, the CRI shim is embedded in Pouchd, and we call it CRI Manager. We will discuss this in more detail in the next section when we describe the architecture of PouchContainer.

CRI is essentially a set of gRPC interfaces. Kubelet embeds a gRPC client, while the CRI shim embeds a gRPC server. Every time Kubelet calls a CRI interface, the call is converted into a gRPC request and sent by the gRPC client to the gRPC server in the CRI shim. The server processes the request by calling the underlying container runtime and returns the result, which completes the whole process of a CRI interface call.

The gRPC interfaces defined by CRI fall into two categories, ImageService and RuntimeService: ImageService is responsible for managing container images, while RuntimeService is responsible for managing the container life-cycle and interacting with containers (exec/attach/port-forward).

### 3. CRI Manager Architectural design



![yzz's pic.jpg | left | 827x512](https://cdn.yuque.com/lark/0/2018/jpeg/95844/1527582870490-a9b9591d-d529-4b7d-bc5f-69514ef115e7.jpeg "")


In the overall structure of PouchContainer, CRI Manager implements all CRI interfaces and serves as PouchContainer's CRI shim. When Kubelet calls a CRI interface, the request goes through Kubelet's gRPC client and is eventually sent to the gRPC server, as shown in the graph. The server parses the request and dispatches it to the appropriate method of CRI Manager for handling.

We will now illustrate the functionality of each module of CRI Manager with an example. When a request to create a new Pod arrives, CRI Manager converts the CRI-formatted configuration obtained from the request into a format that meets the requirements of PouchContainer, then calls Image Manager and Container Manager to pull the necessary images and create the new container, respectively. CRI Manager also calls CNI Manager to configure the Pod's network with CNI plugins. Finally, Stream Server handles interactive requests such as exec/attach/port-forward.

Note that CNI Manager and Stream Server are submodules of CRI Manager, while CRI Manager, Container Manager, and Image Manager are at the same level, all contained in a single binary, Pouchd. Therefore, they interact through direct function calls; unlike the interaction between Docker shim and Docker, no remote-call overhead is incurred between these three modules. We will go deep inside CRI Manager and explore the implementation of its core functionality in the following sections.
# Background

A considerable portion of the containers used inside Alibaba are rich containers. Among these rich containers, which are operated and maintained like traditional virtual machines, some also keep state. For such corporations, updating and upgrading stateful services is a very frequent operation. For a container technology that delivers by images, updating a service takes two steps: deleting the container running the old image and creating a new container. However, a stateful upgrade must guarantee that the new container inherits all the resources of the old one, such as network, storage, and other information. Here are two examples of rich containers used in business that illustrate the necessary conditions for deploying rich-container services:

* Case One: Some database services need to download remote data to the local machine as the database's initial data. Database initialization can take a long time, so for any future service upgrades the new container has to inherit the storage data of the old container to keep deployment time acceptable;
* Case Two: Some middleware services follow a service-registry pattern, meaning every scaled-out container has to register its IP with the service registry, otherwise the container is unavailable. It is therefore crucial that the new container inherits the IP of the old one when the container is upgraded to a new release, otherwise the freshly deployed service becomes unavailable.

Nowadays, corporations use Moby as their container engine, but Moby's API has no interface that directly supports container upgrading. Simulating an upgrade by combining existing APIs, such as deleting the old container and creating a new one, inevitably increases the number of API requests, and with every extra request the risk of failure grows.

Based on the above background, PouchContainer provides an `upgrade` interface at the container-engine level to achieve in-place update. Moving the upgrade function into the engine makes manipulating container-related resources more convenient, and reducing the number of API requests makes it more efficient.

# Upgrade Implementation Details

## Introduction of Container’s Underlying Storage

PouchContainer integrates with Containerd v1.0.3 at its base layer. Compared with Moby, PouchContainer is quite different in its container storage architecture. Therefore, before explaining how PouchContainer implements in-place update, it is necessary to introduce the storage architecture of PouchContainer:


![image.png | center | 600x336.3525091799266](https://cdn.yuque.com/lark/0/2018/png/95961/1527735535637-5afc58e6-31ef-400c-984c-a9d7158fd40d.png "")


Compared with the container storage architecture of Moby, the main differences in PouchContainer are:
* PouchContainer removes the concepts of GraphDriver and Layer and introduces Snapshotter and Snapshot in the new storage architecture, thereby more closely embracing the architectural design of the CNCF project containerd. A Snapshotter can be understood as a storage driver, such as overlay, devicemapper, or btrfs. A Snapshot comes in two kinds: one is read-only, i.e. the read-only data of each layer of the container image; the other is read/write, i.e. the container's writable layer, in which all of the container's incremental data is stored.
* Metadata of both containers and images in Containerd is stored in boltdb. The benefit of this design is that on each service restart, only boltdb needs to be initialized, instead of reading host file directories to rebuild container and image data.

## Upgrade Requirements

At the early design stage of any system or feature, it is necessary to investigate what pain points it solves for users. After researching how developers inside Alibaba actually use in-place container update, three requirements emerged for the `upgrade` design:
* Data consistency
* Flexibility
* Robustness

Data consistency means that some data must remain consistent after `upgrade`:
* Network: the container's network configuration should remain unchanged after the upgrade;
* Storage: the new container needs to inherit all volumes of the old one;
* Config: the new container needs to inherit some of the old one's configuration, such as Env, Labels, etc.

Flexibility means that developers are allowed to add new configuration when performing an `upgrade`:
* The CPU, memory, and other settings of the new container may be modified;
* A new `Entrypoint` may be specified for the new image, or the old container's `Entrypoint` may be inherited;
* New volumes may be added to the container. The new image may carry new volume information, which needs to be parsed.

Robustness means handling the exceptions that may occur while performing an in-place update, and supporting roll-back to the old container when the upgrade fails.

## Upgrade Implementation Details
### Upgrade API Definition

To define which container parameters `upgrade` may modify, we first explain the API's entry-level definition. As the definition of `ContainerUpgradeConfig` shows, an upgrade can operate on both `ContainerConfig` and `HostConfig`. In fact, according to the definitions of these two parameters under `apis/types` in the PouchContainer GitHub project, `upgrade` can modify __all__ configurations of the old container.
```go
// ContainerUpgradeConfig ContainerUpgradeConfig is used for API "POST /containers/upgrade".
// It wraps all kinds of config used in container upgrade.
// It can be used to encode client params in client and unmarshal request body in daemon side.
//
// swagger:model ContainerUpgradeConfig

type ContainerUpgradeConfig struct {
ContainerConfig

// host config
HostConfig *HostConfig `json:"HostConfig,omitempty"`
}
```

### Upgrade Instructions

Container `upgrade` is, in effect, deleting the old container and creating a new one from a new image while retaining the network configuration and the original volume configuration. Here are the `upgrade` steps in more detail:
* First, back up all operational data of the original container so that a roll-back is possible after a failure.
* Update the container's configuration by merging the new configuration parameters from the request, so the new configuration takes effect.
* Treat the image's `Entrypoint` specially: if the new parameters specify an `Entrypoint`, use it. Otherwise, look up the old container's `Entrypoint`; if it was set through configuration parameters rather than by the old image, the new container inherits it. If neither is the case, use the new image's `Entrypoint` for the new container. The reason for this logic is to keep the input parameters consistent.
* Check the container's state. If it is Running, stop the container first; then create a new Snapshot, based on the new image, as the read/write layer of the new container.
* After the new Snapshot is created, check the old container's state again. If it was Running, start the new container; otherwise do nothing.
* Finally, clean up after the upgrade: delete the old Snapshot and persist the latest configuration to disk.

### Upgrade Roll-back

`upgrade` may fail at several points. The current policy is to roll back when a failure occurs and restore the container to its original state. Let us define the __upgrade failure situations__ more precisely:
* Creating resources for the new container fails: if allocating resources such as the Snapshot or Volumes for the new container fails, a roll-back is executed;
* Starting the new container fails: if the Containerd API call that creates the new container returns an error, a roll-back is executed. If the API returns success but the application inside the container errors out and the container exits, no roll-back happens. Here is an example of the roll-back logic:
```go
defer func() {
	if !needRollback {
		return
	}

	// rollback to old container.
	c.meta = &backupContainerMeta

	// create a new containerd container.
	if err := mgr.createContainerdContainer(ctx, c); err != nil {
		logrus.Errorf("failed to rollback upgrade action: %s", err.Error())
		if err := mgr.markStoppedAndRelease(c, nil); err != nil {
			logrus.Errorf("failed to mark container %s stop status: %s", c.ID(), err.Error())
		}
	}
}()
```

If an exception is raised during the upgrade, newly created resources such as the Snapshot are garbage collected, so the roll-back phase only needs to revert the configuration to its old state and start a new container with that configuration.

### Upgrade Demonstration

* Use the `ubuntu` image to create a new container:
```bash
$ pouch run --name test -d -t registry.hub.docker.com/library/ubuntu:14.04 top
43b75002b9a20264907441e0fe7d66030fb9acedaa9aa0fef839ccab1f9b7a8f

$ pouch ps
Name ID Status Created Image Runtime
test 43b750 Up 3 seconds 3 seconds ago registry.hub.docker.com/library/ubuntu:14.04 runc
```

* Upgrade the image of the test container to `busybox`:
```bash
$ pouch upgrade --name test registry.hub.docker.com/library/busybox:latest top
test
$ pouch ps
Name ID Status Created Image Runtime
test 43b750 Up 3 seconds 34 seconds ago registry.hub.docker.com/library/busybox:latest runc
```

As the demo shows, the container's image is replaced directly with the new image through the `upgrade` interface, while all other configurations remain unchanged.

# Summary

In enterprise production environments, `upgrade` is as frequent a container operation as scaling out and scaling in. However, neither the Moby community nor the Containerd community provides a corresponding upgrade API. PouchContainer is the first to implement it, solving a pain point of updating and releasing stateful services with container technology in enterprise environments. PouchContainer is also keeping in close contact with downstream dependencies such as Containerd, and will contribute the `upgrade` feature back to the Containerd community in order to enrich Containerd's functionality.