This post summarizes my experience deploying lxcfs on Kubernetes.
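lxcfs is an open-source FUSE (Filesystem in Userspace) implementation originally built for LXC containers, and it also works with Docker containers. When an application inside a container reads memory or CPU information, lxcfs redirects the read to virtual data computed from the container's own cgroup settings.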
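runc does not fully isolate the files under /proc and /sys, so commands such as top and free inside a container report the host's figures. This is unfriendly to anyone used to VMs or physical machines, and it largely defeats the purpose of these commands. lxcfs isolates the container's /proc and /sys files from the host so that top and similar commands show the container's real data.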
Note: lxcfs is a userspace virtual filesystem built on FUSE (Filesystem in Userspace) and backed by cgroups.
Reading the lxcfs filesystem from inside a container
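When lxcfs starts, it is given a path to use as its mount point (for example /var/lib/lxcfs). From then on, every read of a file under this path (cgroup, proc, sys) goes through the VFS to the kernel FUSE module, which calls back into the file operations implemented by lxcfs. So when a container reads data from the lxcfs filesystem, glibc issues a system call into the VFS, the VFS hands it to the FUSE kernel module, and FUSE invokes the corresponding callback in the lxcfs process. lxcfs determines the calling container's cgroup, reads that cgroup's files on the host, and computes the container's actual memory, CPU and other figures. lxcfs mounts the host's cgroup hierarchies at /run/lxcfs/controllers in its runtime environment; this mount is not visible from the host itself, because lxcfs calls unshare to isolate it in a separate mount namespace. All cgroup information used by lxcfs is read from /run/lxcfs/controllers.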
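As a rough illustration of the last step, here is a minimal bash sketch of the kind of calculation lxcfs performs when a container reads MemTotal from /proc/meminfo: the reported total is the smaller of the container's memory limit and the host's memory. It assumes cgroup v1 (memory.limit_in_bytes) and is a simplification; lxcfs itself is written in C and computes many more fields. The function name mem_total_kb is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper, a simplified sketch of how lxcfs derives MemTotal:
# the container sees min(cgroup memory limit, host memory), in kB.
# Assumes cgroup v1, where "no limit" is reported as a very large number.
mem_total_kb() {
    local limit_bytes=$1   # memory.limit_in_bytes of the container's cgroup
    local host_kb=$2       # host MemTotal in kB (from the host's /proc/meminfo)
    local limit_kb=$(( limit_bytes / 1024 ))
    if (( limit_kb < host_kb )); then
        echo "$limit_kb"
    else
        echo "$host_kb"
    fi
}

# Example on a node (cgroup v1; the cgroup path is a placeholder):
#   limit=$(cat /sys/fs/cgroup/memory/<pod cgroup path>/memory.limit_in_bytes)
#   host=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
#   mem_total_kb "$limit" "$host"
```

With the 256Mi limit used by the test deployment later in this post, this gives 262144 kB, which is the 256M total that free reports inside the pod.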
Many articles already explain how lxcfs works, so I won't go into more detail here.
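Deploying lxcfs on Kubernetes involves two main steps:
- Run the lxcfs DaemonSet so that every node runs the lxcfs service
- Run lxcfs-admission-webhook so that business pods in selected namespaces get the lxcfs mount directories injected automatically, without configuring the lxcfs mounts in each pod by hand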
What is an AdmissionWebhook
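To understand AdmissionWebhook, you first need to know about admission controllers in Kubernetes. According to the official documentation, an admission controller is a gateway that intercepts (authenticated) requests to the API server and can modify the request object or reject the request.
In short, you can think of it as an interceptor, similar to middleware in a web framework.
Kubernetes ships with many built-in admission controllers. The admission controller plugins supported by your kube-apiserver can be listed from its help output: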
kube-apiserver --help |grep enable-admission-plugins
MutatingAdmissionWebhook, ValidatingAdmissionWebhook
MutatingAdmissionWebhook: an admission webhook that modifies (mutates) requests
ValidatingAdmissionWebhook: an admission webhook that validates requests
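A mutating webhook answers the API server with an AdmissionReview response whose patch is a base64-encoded JSON Patch (patchType: JSONPatch). The bash sketch below only builds such a patch, to show the shape of what an lxcfs injector adds to a pod. It is an illustration, not the lxcfs-admission-webhook code: the volume names (lxcfs-proc-*), targeting only container index 0, and the assumption that the pod already has volumes and volumeMounts arrays are all hypothetical simplifications.

```shell
#!/usr/bin/env bash
# Sketch only: build an RFC 6902 JSON Patch that adds one hostPath volume per
# lxcfs file and mounts it over the matching /proc file in the first container.
# Volume names are hypothetical; assumes the pod already has /spec/volumes and
# /spec/containers/0/volumeMounts arrays (an "add" to ".../-" needs the array).
LXCFS_DIR=/var/lib/lxcfs
LXCFS_FILES="cpuinfo diskstats loadavg meminfo stat swaps uptime"

build_lxcfs_patch() {
    local -a ops=()
    local f name
    for f in $LXCFS_FILES; do
        name="lxcfs-proc-$f"
        ops+=("{\"op\":\"add\",\"path\":\"/spec/volumes/-\",\"value\":{\"name\":\"$name\",\"hostPath\":{\"path\":\"$LXCFS_DIR/proc/$f\"}}}")
        ops+=("{\"op\":\"add\",\"path\":\"/spec/containers/0/volumeMounts/-\",\"value\":{\"name\":\"$name\",\"mountPath\":\"/proc/$f\"}}")
    done
    # Join the operations with commas into a single JSON array.
    local IFS=,
    echo "[${ops[*]}]"
}

# In an AdmissionReview response the patch travels base64-encoded, e.g.:
#   build_lxcfs_patch | base64 -w0
```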
Installing lxcfs in Kubernetes
Alibaba Cloud provides a Kubernetes solution that installs and enables lxcfs automatically through YAML manifests: https://github.com/denverdino/lxcfs-admission-webhook
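Note: deploying this lxcfs-admission-webhook project on Kubernetes as-is runs into problems, which are also discussed in the project's issues:
- Restarting the lxcfs DaemonSet pod fails with an error
- After deploying lxcfs and starting business pods, restarting lxcfs leaves the business pods unable to read CPU and memory information; for example, free no longer works correctly
Therefore the project cannot be deployed to production directly and needs some improvements.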
For the full configuration, see my lxcfs-admission-webhook project:
# Kubernetes 1.9.0 or above with the admissionregistration.k8s.io/v1beta1 API enabled. Verify that by the following command:
# kubectl api-versions | grep admissionregistration.k8s.io/v1beta1
# The result should be:
# admissionregistration.k8s.io/v1beta1
# In addition, the MutatingAdmissionWebhook and ValidatingAdmissionWebhook admission controllers should be added and listed in the correct order in the admission-control flag of kube-apiserver.
# MutatingAdmissionWebhook ValidatingAdmissionWebhook
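The checks in this section grep only for admissionregistration.k8s.io/v1beta1. The v1 version of this API has been GA since Kubernetes 1.16, and v1beta1 was removed in 1.22, so on newer clusters that grep returns nothing. A small bash helper that reads `kubectl api-versions` output and reports which version is served might look like this; the function name pick_admission_api is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl api-versions` output on stdin and print the
# preferred admissionregistration API version (v1 first, then v1beta1).
# Returns non-zero if neither version is served.
pick_admission_api() {
    local versions
    versions=$(cat)
    if grep -qxF 'admissionregistration.k8s.io/v1' <<<"$versions"; then
        echo v1
    elif grep -qxF 'admissionregistration.k8s.io/v1beta1' <<<"$versions"; then
        echo v1beta1
    else
        echo "admissionregistration.k8s.io is not served" >&2
        return 1
    fi
}

# Usage against a live cluster:
#   kubectl api-versions | pick_admission_api
```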
Prepare the lxcfs image and the lxcfs-admission-webhook image in advance.
## Deploying lxcfs-admission-webhook on Kubernetes 1.20: check the admission API version
kubectl api-versions | grep admissionregistration.k8s.io/v1beta1
### Deploy
#### Deploy lxcfs to worker nodes
kubectl apply -f deployment/lxcfs-daemonset.yaml
#### Install injector with lxcfs-admission-webhook
deployment/install.sh
### install.sh installs the webhook; the webhook Deployment can also be reapplied with:
### kubectl delete -f deployment/deployment.yaml
### kubectl apply -f deployment/deployment.yaml
Test
Enable the namespace for injection
kubectl label namespace default lxcfs-admission-webhook=enabled
Note: All newly created pods in the namespace will be injected with LXCFS
### Switch the webhook to a test namespace
#### Remove the lxcfs-admission-webhook label previously set on namespace default
# kubectl label namespace default lxcfs-admission-webhook-
## Enable lxcfs-admission-webhook on a custom namespace
kubectl create namespace test-lxcfs
kubectl label namespace test-lxcfs lxcfs-admission-webhook=enabled
#### In the business pod manifest web.yaml, set the namespace to test-lxcfs
### Deploy the test deployment
kubectl apply -f deployment/web.yaml
kubectl delete -f deployment/web.yaml
kubectl apply -f deployment/web.yaml
kubectl delete -f deployment/web111.yaml
kubectl apply -f deployment/web111.yaml
### namespace test-lxcfs web pod
kubectl delete -f deployment/web2.yaml
kubectl apply -f deployment/web2.yaml
### Inspect the resource inside container
kubectl get pod
Resource view inside the container
# CPU count (the minimum is 1)
#
# cat /proc/cpuinfo | grep processor
processor : 0
#
# Memory size
# free -hm
total used free shared buffers cached
Mem: 256M 3.0M 252M 0B 0B 344K
-/+ buffers/cache: 2.7M 253M
Swap: 0B 0B 0B
#
# Shared memory
# ipcs -m
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
#
#
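The "minimum is 1" rule in the CPU output above can be reproduced with a small bash sketch. It assumes the CPU view is derived from the container's CFS quota: quota divided by period, rounded up, capped at the host's CPU count, and never below 1; a quota of -1 means no limit. This reflects my understanding of lxcfs's behaviour rather than its exact code, and the function name visible_cpus is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of CPUs a container would see, assuming the view
# is ceil(cfs_quota_us / cfs_period_us), capped at the host CPU count, minimum 1.
# A quota of -1 (cgroup v1 "no limit") means all host CPUs are visible.
visible_cpus() {
    local quota=$1 period=$2 host_cpus=$3 n
    if (( quota <= 0 || period <= 0 )); then
        n=$host_cpus                              # no CFS limit
    else
        n=$(( (quota + period - 1) / period ))   # ceil(quota / period)
    fi
    if (( n > host_cpus )); then n=$host_cpus; fi
    if (( n < 1 )); then n=1; fi
    echo "$n"
}
```

For example, `visible_cpus 50000 100000 8` prints 1: a 500m limit under the default 100000-microsecond period rounds up to one CPU, matching the single processor 0 shown above.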
Miscellaneous commands (notes only, not meant to be run in order)
# Build a custom image
cd /home/wangb/lxcfs-test/custom-lxcfs-image
docker build -t lxcfs:dev2 .
docker images |grep lxcfs
docker run -dit --name lxcfsdev lxcfs:dev /bin/sh
cidname=lxcfsdev
file_source=container_remount_lxcfs.sh
target_path=/container_remount_lxcfs.sh
docker cp ${file_source} ${cidname}:${target_path}
docker exec -it lxcfsdev /bin/sh
cidname=lxcfsdev
midname=lxcfs:dev
docker commit -m 'make image' ${cidname} ${midname}
# Unmount with fusermount
nsenter -m/proc/1/ns/mnt /home/wangb/lxcfs-test/fusermount -u /var/lib/lxc/lxcfs
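Before calling fusermount -u, as in the command above and in start.sh, it can help to check whether the path really is an lxcfs FUSE mount. A minimal bash sketch that scans a mount table in /proc/mounts format for the same "lxcfs /var/lib/lxc/lxcfs fuse.lxcfs" entry that start.sh removes from /etc/mtab; the function name is hypothetical, and the table file is a parameter so it can point at /proc/1/mounts (the host's view when hostPID is enabled) or at a test file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if mount point $1 appears with filesystem type
# fuse.lxcfs in the mount table $2 (default /proc/mounts).
is_lxcfs_mounted() {
    local mp=$1 table=${2:-/proc/mounts}
    # Field 2 is the mount point, field 3 the filesystem type.
    awk -v mp="$mp" '$2 == mp && $3 == "fuse.lxcfs" { found = 1 } END { exit !found }' "$table"
}

# Usage on a node:
#   if is_lxcfs_mounted /var/lib/lxc/lxcfs /proc/1/mounts; then
#       nsenter -m/proc/1/ns/mnt fusermount -u /var/lib/lxc/lxcfs
#   fi
```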
For details, see my lxcfs-admission-webhook project: http://bingerambo.com
Startup script of the lxcfs image (start.sh):
#!/bin/bash
# Cleanup
nsenter -m/proc/1/ns/mnt fusermount -u /var/lib/lxc/lxcfs 2> /dev/null || true
nsenter -m/proc/1/ns/mnt [ -L /etc/mtab ] || \
sed -i "/^lxcfs \/var\/lib\/lxc\/lxcfs fuse.lxcfs/d" /etc/mtab
# Prepare
mkdir -p /usr/local/lib/lxcfs /var/lib/lxc/lxcfs
# Update lxcfs
cp -f /lxcfs/lxcfs /usr/local/bin/lxcfs
cp -f /lxcfs/liblxcfs.so /usr/local/lib/lxcfs/liblxcfs.so
# Mount
exec nsenter -m/proc/1/ns/mnt /usr/local/bin/lxcfs /var/lib/lxc/lxcfs/
Dockerfile of the lxcfs image:
FROM centos:7 as build
RUN yum -y update
RUN yum -y install fuse-devel pam-devel wget gcc automake autoconf libtool make
ENV LXCFS_VERSION 3.1.2
RUN wget https://linuxcontainers.org/downloads/lxcfs/lxcfs-$LXCFS_VERSION.tar.gz && \
mkdir /lxcfs && tar xzvf lxcfs-$LXCFS_VERSION.tar.gz -C /lxcfs --strip-components=1 && \
cd /lxcfs && ./configure && make
FROM centos:7
RUN yum -y install fuse fuse-libs fuse-devel
STOPSIGNAL SIGINT
COPY --from=build /lxcfs/lxcfs /usr/local/bin/lxcfs
COPY --from=build /lxcfs/.libs/liblxcfs.so /usr/local/lib/lxcfs/liblxcfs.so
COPY --from=build /lxcfs/lxcfs /lxcfs/lxcfs
COPY --from=build /lxcfs/.libs/liblxcfs.so /lxcfs/liblxcfs.so
COPY --from=build /usr/lib64/libfuse.so.2.9.2 /usr/lib64/libfuse.so.2.9.2
COPY --from=build /usr/lib64/libulockmgr.so.1.0.1 /usr/lib64/libulockmgr.so.1.0.1
RUN ln -s /usr/lib64/libfuse.so.2.9.2 /usr/lib64/libfuse.so.2 && \
ln -s /usr/lib64/libulockmgr.so.1.0.1 /usr/lib64/libulockmgr.so.1
COPY start.sh /
RUN chmod 750 /start.sh
CMD ["/start.sh"]
The lxcfs DaemonSet (deployment/lxcfs-daemonset.yaml). Here I added postStart and preStop lifecycle hooks:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: lxcfs
  namespace: kube-system
  labels:
    app: lxcfs
spec:
  selector:
    matchLabels:
      app: lxcfs
  template:
    metadata:
      labels:
        app: lxcfs
    spec:
      hostPID: true
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: lxcfs
        # image: registry.cn-hangzhou.aliyuncs.com/denverdino/lxcfs:3.1.2
        image: lxcfs:dev
        # imagePullPolicy: Always
        imagePullPolicy: IfNotPresent
        # command: [ "/bin/sh","-c","chmod u+x /start.sh;/start.sh" ]
        command: [ "/bin/sh","-c","/start.sh" ]
        securityContext:
          privileged: true
        # custom config
        lifecycle:
          postStart:
            exec:
              command: ["/bin/sh", "-c", "chmod u+x /tmp/container_remount_lxcfs.sh;/tmp/container_remount_lxcfs.sh 2> /dev/null || true"]
          preStop:
            exec:
              # command: ["/bin/sh", "-c", "nsenter -m/proc/1/ns/mnt fusermount -u /var/lib/lxc/lxcfs 2> /dev/null || true"]
              command: ["/bin/sh", "-c", "nsenter -m/proc/1/ns/mnt fusermount -u /var/lib/lxc/lxcfs 2> /dev/null || true"]
        volumeMounts:
        - name: cgroup
          mountPath: /sys/fs/cgroup
        - name: lxcfs
          # mountPath: /var/lib/lxcfs
          mountPath: /var/lib/lxc/lxcfs
          mountPropagation: Bidirectional
        - name: usr-local
          mountPath: /usr/local
        # custom config
        - name: docker-bin
          mountPath: /usr/bin/docker
        - name: docker-sock
          mountPath: /var/run/docker.sock
        - name: libltdl-so
          mountPath: /usr/lib64/libltdl.so.7
        # configmap
        - name: lxcfs-remount
          mountPath: /tmp/container_remount_lxcfs.sh
      volumes:
      - name: cgroup
        hostPath:
          path: /sys/fs/cgroup
      - name: usr-local
        hostPath:
          path: /usr/local
      - name: lxcfs
        hostPath:
          # path: /var/lib/lxcfs
          path: /var/lib/lxc/lxcfs
          type: DirectoryOrCreate
      # - name: lxcfs
      #   emptyDir: {}
      # custom config
      - name: docker-bin
        hostPath:
          path: /usr/bin/docker
      - name: docker-sock
        hostPath:
          path: /var/run/docker.sock
      - name: libltdl-so
        hostPath:
          path: /usr/lib64/libltdl.so.7
      - name: lxcfs-remount
        hostPath:
          path: /tmp/container_remount_lxcfs.sh
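The remount script (container_remount_lxcfs.sh) that fixes the broken resource view in business pods after lxcfs restarts. I added a step that first checks the lxcfs state: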
#! /bin/bash
## This script is meant for a containerized lxcfs deployment
PATH=$PATH:/bin
LXCFS="/var/lib/lxc/lxcfs"
LXCFS_ROOT_PATH="/var/lib/lxc"
## 1. check lxcfs process state
# lxcfs should expose these files under ${LXCFS}/proc:
# cpuinfo
# diskstats
# loadavg
# meminfo
# stat
# swaps
# uptime
proc_num=$(ls "${LXCFS}/proc/" | wc -l)
# while [[ ${proc_num} != 7 ]]; do
while [[ ${proc_num} == 0 ]]; do
    # lxcfs is not serving ${LXCFS}/proc yet: wait and check again
    echo "check lxcfs process state failed: ${LXCFS}/proc is empty, retrying..."
    echo $(ls "${LXCFS}/proc/")
    sleep 2s
    proc_num=$(ls "${LXCFS}/proc/" | wc -l)
done
## 2. find containers to remount
# PATH=$PATH:/bin
# LXCFS="/var/lib/lxc/lxcfs"
# LXCFS_ROOT_PATH="/var/lib/lxc"
# skip pause and calico containers and the "CONTAINER ID" header line
containers=$(docker ps | grep -v pause | grep -v calico | awk '{print $1}' | grep -v CONTAINE)
#-v /var/lib/lxc/lxcfs/proc/cpuinfo:/proc/cpuinfo:rw
#-v /var/lib/lxc/lxcfs/proc/diskstats:/proc/diskstats:rw
#-v /var/lib/lxc/lxcfs/proc/meminfo:/proc/meminfo:rw
#-v /var/lib/lxc/lxcfs/proc/stat:/proc/stat:rw
#-v /var/lib/lxc/lxcfs/proc/swaps:/proc/swaps:rw
#-v /var/lib/lxc/lxcfs/proc/uptime:/proc/uptime:rw
#-v /var/lib/lxc/lxcfs/proc/loadavg:/proc/loadavg:rw
#-v /var/lib/lxc/lxcfs/sys/devices/system/cpu/online:/sys/devices/system/cpu/online:rw
for container in $containers; do
    # only containers that mount the host's /var/lib/lxc at /var/lib/lxc are remounted
    mountpoint=$(docker inspect --format '{{ range .Mounts }}{{ if eq .Destination "/var/lib/lxc" }}{{ .Source }}{{ end }}{{ end }}' $container)
    if [ "$mountpoint" = "$LXCFS_ROOT_PATH" ]; then
        echo "remount $container"
        PID=$(docker inspect --format '{{.State.Pid}}' $container)
        # mount /proc
        for file in meminfo cpuinfo loadavg stat diskstats swaps uptime; do
            echo nsenter --target $PID --mount -- mount -B "$LXCFS/proc/$file" "/proc/$file"
            nsenter --target $PID --mount -- mount -B "$LXCFS/proc/$file" "/proc/$file"
        done
        # mount /sys
        for file in online; do
            echo nsenter --target $PID --mount -- mount -B "$LXCFS/sys/devices/system/cpu/$file" "/sys/devices/system/cpu/$file"
            nsenter --target $PID --mount -- mount -B "$LXCFS/sys/devices/system/cpu/$file" "/sys/devices/system/cpu/$file"
        done
    fi
done
# 1. Mount the parent directory of lxcfs on the host into the container
# -v /var/lib/lxc/:/var/lib/lxc/:shared \
# -v /var/lib/lxc/lxcfs/proc/uptime:/proc/uptime \
# -v /var/lib/lxc/lxcfs/proc/swaps:/proc/swaps \
# -v /var/lib/lxc/lxcfs/proc/stat:/proc/stat \
# -v /var/lib/lxc/lxcfs/proc/diskstats:/proc/diskstats \
# -v /var/lib/lxc/lxcfs/proc/meminfo:/proc/meminfo \
# -v /var/lib/lxc/lxcfs/proc/cpuinfo:/proc/cpuinfo
# 2. Add ExecStartPost=remount_lxcfs to the systemd service
# 3. After the lxcfs process restarts, bind-mount again inside the container: mount -B "/var/lib/lxc/lxcfs/proc/$file" "/proc/$file"
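Step 1 of the script above waits forever while ${LXCFS}/proc is empty. If lxcfs never comes up, the postStart hook that runs this script blocks indefinitely, and Kubernetes does not report the container as Running until its postStart handler completes. A bounded variant, sketched in bash, gives up after a fixed number of attempts; the function name and the retry values in the usage comment are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until directory $1 has at least one entry,
# polling every $3 seconds for at most $2 attempts. Returns 1 on timeout.
wait_for_nonempty_dir() {
    local dir=$1 attempts=$2 interval=$3 i
    for (( i = 1; i <= attempts; i++ )); do
        if [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
            return 0
        fi
        echo "waiting for $dir to be populated ($i/$attempts)" >&2
        sleep "$interval"
    done
    return 1
}

# Step 1 of container_remount_lxcfs.sh could then become:
#   wait_for_nonempty_dir "${LXCFS}/proc" 30 2 || { echo "lxcfs not ready" >&2; exit 1; }
```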
The original lxcfs-admission-webhook did not meet my project's needs, so I modified its code and rebuilt it:
#!/bin/bash
DOCKER_USER=bingerambo
#dep ensure -v
# go mod
CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o lxcfs-admission-webhook
docker build --no-cache -t lxcfs-admission-webhook:dev .
#docker build --no-cache -t registry.cn-hangzhou.aliyuncs.com/${DOCKER_USER}/lxcfs-admission-webhook:v1 .
#rm -rf lxcfs-admission-webhook
#
#docker push registry.cn-hangzhou.aliyuncs.com/${DOCKER_USER}/lxcfs-admission-webhook:v1
Test deployment of a business pod (deployment/web.yaml):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  namespace: test-lxcfs
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: httpd:2.4.32
        imagePullPolicy: Always
        resources:
          requests:
            memory: "256Mi"
            cpu: "500m"
          limits:
            memory: "256Mi"
            cpu: "500m"
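The limits in this manifest are what lxcfs ends up reporting inside the pod. As a rough illustration, assuming the kubelet defaults of a 100000-microsecond CFS period and quota = milliCPU × period / 1000, this bash sketch converts the two limits into the cgroup values lxcfs reads. The helper names are hypothetical, and only the "Mi" memory suffix and "m" CPU notation are handled.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: translate two Kubernetes quantity forms into cgroup values.
# Only "<n>Mi" memory and "<n>m" CPU are handled; other forms are rejected.
CFS_PERIOD_US=100000   # assumed default cpu.cfs_period_us

mem_limit_bytes() {
    local q=$1
    if [[ $q =~ ^([0-9]+)Mi$ ]]; then
        echo $(( 10#${BASH_REMATCH[1]} * 1024 * 1024 ))
    else
        echo "unsupported memory quantity: $q" >&2
        return 1
    fi
}

cpu_quota_us() {
    local q=$1
    if [[ $q =~ ^([0-9]+)m$ ]]; then
        echo $(( 10#${BASH_REMATCH[1]} * CFS_PERIOD_US / 1000 ))
    else
        echo "unsupported cpu quantity: $q" >&2
        return 1
    fi
}
```

For this manifest, 256Mi becomes a memory.limit_in_bytes of 268435456, which free shows as 256M, and 500m becomes a 50000-microsecond quota per 100000-microsecond period, which rounds up to the single CPU seen earlier.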
[root@node1 lxcfs-test]# kubectl logs lxcfs-sffmt
mount namespace: 5
hierarchies:
0: fd: 6: memory
1: fd: 7: hugetlb
2: fd: 8: cpuset
3: fd: 9: perf_event
4: fd: 10: cpuacct,cpu
5: fd: 11: devices
6: fd: 12: pids
7: fd: 13: freezer
8: fd: 14: blkio
9: fd: 15: net_prio,net_cls
10: fd: 16: name=systemd
fuse: mountpoint is not empty
fuse: if you are sure this is safe, use the 'nonempty' mount option
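Related issue: denverdino/lxcfs-admission-webhook#8
How should the "nonempty" mount option suggested by fuse be configured in the DaemonSet? The error is probably caused by the mount directory already containing data when lxcfs mounts it.
Solution: this means the hostPath /var/lib/lxc/lxcfs contains data, for example /var/lib/lxc/lxcfs/proc/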
rm -rf /var/lib/lxcfs
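Rather than mounting with libfuse's nonempty option, a safer pattern is to refuse to start while the target directory still holds stale files. A bash sketch of such a pre-mount check; the function name is hypothetical, and it only inspects the directory, it does not detect or clean up existing mounts.

```shell
#!/usr/bin/env bash
# Hypothetical pre-mount check: succeed only if directory $1 exists and is empty,
# which is what FUSE expects unless the "nonempty" option is used.
assert_empty_mountpoint() {
    local dir=$1
    if [ ! -d "$dir" ]; then
        echo "$dir does not exist or is not a directory" >&2
        return 1
    fi
    if [ -n "$(ls -A "$dir")" ]; then
        echo "$dir is not empty; remove stale files (e.g. $dir/proc) first" >&2
        return 1
    fi
    return 0
}

# Possible use before launching lxcfs:
#   assert_empty_mountpoint /var/lib/lxc/lxcfs || exit 1
```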
lxcfs-5pxkb 0/1 RunContainerError
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 60s default-scheduler Successfully assigned default/lxcfs-5pxkb to node2
Warning Failed 59s (x2 over 59s) kubelet Error: failed to start container "lxcfs": Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:76: mounting "/var/lib/lxcfs/proc/loadavg" to rootfs at "/proc/loadavg" caused: mount through procfd: not a directory: unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type
Warning Failed 45s kubelet Error: failed to start container "lxcfs": Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:76: mounting "/var/lib/lxcfs/proc/cpuinfo" to rootfs at "/proc/cpuinfo" caused: mount through procfd: not a directory: unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type
Normal Pulled 21s (x4 over 60s) kubelet Container image "registry.cn-hangzhou.aliyuncs.com/denverdino/lxcfs:3.1.2" already present on machine
Normal Created 21s (x4 over 60s) kubelet Created container lxcfs
Warning Failed 21s kubelet Error: failed to start container "lxcfs": Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: rootfs_linux.go:76: mounting "/var/lib/lxcfs/proc/meminfo" to rootfs at "/proc/meminfo" caused: mount through procfd: not a directory: unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type
Warning BackOff 6s (x4 over 58s) kubelet Back-off restarting failed container
lxcfs mount through procfd: not a directory: unknown: Are you trying to mount a directory onto a file (or vice-versa)? Check if the specified host path exists and is the expected type
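Cause: the lxcfs DaemonSet pod ran in the same namespace that lxcfs-webhook targets (for example default), and lxcfs-webhook had not been removed. The webhook therefore injected the lxcfs mounts into the lxcfs pod itself, and starting the lxcfs DaemonSet failed on those mount paths.
Fix: change the namespace of the lxcfs DaemonSet to kube-system so that it is separate from the namespaces lxcfs-webhook targets. After redeploying, the lxcfs DaemonSet runs normally.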
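The webhook selects namespaces by the label lxcfs-admission-webhook=enabled, so the failure above comes down to whether the DaemonSet's own namespace carries that label. A bash sketch of this matchLabels-style check, fed with a comma-separated key=value list such as the LABELS column of `kubectl get namespace --show-labels`; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if a namespace's labels (comma-separated
# key=value pairs) contain the label the lxcfs webhook selects on.
WEBHOOK_LABEL="lxcfs-admission-webhook=enabled"

namespace_injected() {
    local labels=$1 pair
    local IFS=,
    for pair in $labels; do
        if [ "$pair" = "$WEBHOOK_LABEL" ]; then
            return 0
        fi
    done
    return 1
}

# The lxcfs DaemonSet's namespace should NOT match, e.g.:
#   labels=$(kubectl get namespace kube-system --show-labels --no-headers | awk '{print $NF}')
#   namespace_injected "$labels" && echo "lxcfs pods would be injected too"
```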
[root@node1 lxcfs-test]# kubectl exec -it web-fdf588795-p9f52 sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
#
#
# free -m
total used free shared buffers cached
Mem: 256 8 247 0 0 0
-/+ buffers/cache: 8 247
Swap: 0 0 0
#
#
# free -m
Error: /proc must be mounted
To mount /proc at boot you need an /etc/fstab line like:
proc /proc proc defaults
In the meantime, run "mount proc /proc -t proc"
#
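When the lxcfs DaemonSet pod is deleted, the mount directory /var/lib/lxcfs is deleted as well. After the lxcfs DaemonSet restarts, running free in a business pod fails on its mount points.
After the lxcfs service restarts, the corresponding directories mounted into the containers must be remounted.
After restarting the lxcfs DaemonSet, business pods cannot read the container's CPU and memory information: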
# cat cat /sys/devices/system/cpu/online
cat: cat: No such file or directory
cat: /sys/devices/system/cpu/online: Transport endpoint is not connected
#
#
#
#
# cat /proc/cpuinfo
cat: /proc/cpuinfo: Transport endpoint is not connected
#
#
#
# ls /proc
1 acpi cpuinfo execdomains ioports keys mdstat net self sysrq-trigger version
10 buddyinfo crypto fb ipmi kmsg meminfo pagetypeinfo slabinfo sysvipc vmallocinfo
114 bus devices filesystems irq kpagecount misc partitions softirqs timer_list vmstat
125 cgroups diskstats fs kallsyms kpageflags modules sched_debug stat timer_stats zoneinfo
8 cmdline dma interrupts kcore loadavg mounts schedstat swaps tty
9 consoles driver iomem key-users locks mtrr scsi sys uptime
# ls /pro ^C
#
#
#
# cat /proc/cpuinfo
cat: /proc/cpuinfo: Transport endpoint is not connected
# cat /proc/version
Linux version 3.10.0-1127.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) ) #1 SMP Tue Mar 31 23:36:51 UTC 2020
#
Analysis: the problem is the lxcfs DaemonSet's mount point. For containers that already had lxcfs mounted, redo the mounts:
[root@node2 lxcfs-test]# sh container_remount_lxcfs.sh
remount aec74195b6e1
nsenter --target 5579 --mount -- mount -B /var/lib/lxc/lxcfs/proc/meminfo /proc/meminfo
nsenter --target 5579 --mount -- mount -B /var/lib/lxc/lxcfs/proc/cpuinfo /proc/cpuinfo
nsenter --target 5579 --mount -- mount -B /var/lib/lxc/lxcfs/proc/loadavg /proc/loadavg
nsenter --target 5579 --mount -- mount -B /var/lib/lxc/lxcfs/proc/stat /proc/stat
nsenter --target 5579 --mount -- mount -B /var/lib/lxc/lxcfs/proc/diskstats /proc/diskstats
nsenter --target 5579 --mount -- mount -B /var/lib/lxc/lxcfs/proc/swaps /proc/swaps
nsenter --target 5579 --mount -- mount -B /var/lib/lxc/lxcfs/proc/uptime /proc/uptime
nsenter --target 5579 --mount -- mount -B /var/lib/lxc/lxcfs/sys/devices/system/cpu/online /sys/devices/system/cpu/online
remount 344fdf47b13c
nsenter --target 5107 --mount -- mount -B /var/lib/lxc/lxcfs/proc/meminfo /proc/meminfo
nsenter --target 5107 --mount -- mount -B /var/lib/lxc/lxcfs/proc/cpuinfo /proc/cpuinfo
nsenter --target 5107 --mount -- mount -B /var/lib/lxc/lxcfs/proc/loadavg /proc/loadavg
nsenter --target 5107 --mount -- mount -B /var/lib/lxc/lxcfs/proc/stat /proc/stat
nsenter --target 5107 --mount -- mount -B /var/lib/lxc/lxcfs/proc/diskstats /proc/diskstats
nsenter --target 5107 --mount -- mount -B /var/lib/lxc/lxcfs/proc/swaps /proc/swaps
nsenter --target 5107 --mount -- mount -B /var/lib/lxc/lxcfs/proc/uptime /proc/uptime
nsenter --target 5107 --mount -- mount -B /var/lib/lxc/lxcfs/sys/devices/system/cpu/online /sys/devices/system/cpu/online
[root@node2 lxcfs-test]#
[root@node2 lxcfs-test]#
[root@node2 lxcfs-test]#
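The remount log above is the same eight bind mounts repeated for each container. A dry-run bash sketch that prints those nsenter commands for a given PID makes them easy to review before running anything; the function name is hypothetical, and the file list mirrors container_remount_lxcfs.sh.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print the nsenter bind-mount commands that
# container_remount_lxcfs.sh would run for the container whose PID is $1.
LXCFS="/var/lib/lxc/lxcfs"

print_remount_cmds() {
    local pid=$1 file
    for file in meminfo cpuinfo loadavg stat diskstats swaps uptime; do
        echo "nsenter --target $pid --mount -- mount -B $LXCFS/proc/$file /proc/$file"
    done
    echo "nsenter --target $pid --mount -- mount -B $LXCFS/sys/devices/system/cpu/online /sys/devices/system/cpu/online"
}

# Review the output first, then execute it as root on the node:
#   print_remount_cmds 5579 | sh
```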
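Solution: rebuild the lxcfs image, modify lxcfs-admission-webhook, then compile and build it again.
About the remount_lxcfs feature, these events came up while enabling it: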
Warning Failed 21s (x3 over 36s) kubelet Error: failed to start container "lxcfs": Error response from daemon: error while creating mount source path '/var/lib/lxc/lxcfs': mkdir /var/lib/lxc/lxcfs: file exists
Normal Started 6s (x3 over 24s) kubelet Started container lxcfs
Warning FailedPostStartHook 6s (x3 over 24s) kubelet Exec lifecycle hook ([/bin/sh -c /container_remount_lxcfs.sh || true]) for Container "lxcfs" in Pod "lxcfs-d9xf4_kube-system(019fba71-758a-411f-a538-1feb64ff5403)" failed - error: command '/bin/sh -c /container_remount_lxcfs.sh || true' exited with 126: , message: "OCI runtime exec failed: exec failed: cannot exec a container that has stopped: unknown\r\n"
Normal Killing 6s (x3 over 24s) kubelet FailedPostStartHook
Warning FailedPreStopHook 6s (x3 over 24s) kubelet Exec lifecycle hook ([/bin/sh -c nsenter -m/proc/1/ns/mnt fusermount -u /var/lib/lxc/lxcfs 2> /dev/null || true]) for Container "lxcfs" in Pod "lxcfs-d9xf4_kube-system(019fba71-758a-411f-a538-1feb64ff5403)" failed - error: rpc error: code = Unknown desc = container not running (lxcfs), message: ""
Warning BackOff 6s (x3 over 22s) kubelet Back-off restarting failed container
Analysis
This is because the image you are using to start the container is not supported for underlying hardware architecture.
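Is some underlying operation unsupported by the container? In fact, the script and the custom hook commands call host programs that do not exist in the container, such as the docker command, so they have to be made available inside the container (this is what the docker-bin, docker-sock and libltdl-so hostPath volumes in the DaemonSet are for).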
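Because the hook scripts depend on host programs such as docker and nsenter, a check at the top of such a script makes a missing binary fail with a clear message instead of an opaque lifecycle-hook error. A bash sketch built on the POSIX `command -v` builtin; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that every command named in the arguments is
# available on PATH; print the missing ones and return non-zero if any are absent.
require_cmds() {
    local cmd
    local -a missing=()
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            missing+=("$cmd")
        fi
    done
    if (( ${#missing[@]} > 0 )); then
        echo "missing commands: ${missing[*]}" >&2
        return 1
    fi
    return 0
}

# At the top of container_remount_lxcfs.sh, for example:
#   require_cmds docker nsenter awk || exit 1
```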
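Important: if things break because the mount path was deleted, clean it up with fusermount first. The mount root path inside business pod containers (/var/lib/lxc) must never be deleted.
- /var/lib/lxc must not be deleted
- /var/lib/lxc/lxcfs may be deleted
- lxcfs and lxcfs-webhook must not use the same namespace; otherwise, when lxcfs restarts, the lxcfs-webhook callback makes lxcfs fail to start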