
Failure when executing Pod I/O Stress with some VOLUME_MOUNT_PATH input values #3108

Open
chirangaalwis opened this issue Aug 9, 2021 · 2 comments
@chirangaalwis
Contributor

What happened:
Experiencing the following helper Pod failures when executing the Pod I/O Stress experiment.

kubectl logs -f pod-io-stress-helper-cwiejs -n <namespace>
W0809 11:12:08.662327   18897 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
time="2021-08-09T11:12:08Z" level=info msg="Helper Name: stress-chaos"
time="2021-08-09T11:12:08Z" level=info msg="[PreReq]: Getting the ENV variables"
time="2021-08-09T11:12:08Z" level=info msg="container ID of nginx container, containerID: ac65d9cb42b1c6a911b9b6240d5d140de66501784d35aac30f0afcd37cb528bb"
time="2021-08-09T11:12:08Z" level=info msg="[Info]: Container ID=ac65d9cb42b1c6a911b9b6240d5d140de66501784d35aac30f0afcd37cb528bb has process PID=22936"
time="2021-08-09T11:12:08Z" level=info msg="[Info]: Details of Stressor:" hdd-bytes="50%" Timeout=330 Volume Mount Path=/tmp/sample io=4 hdd=4
time="2021-08-09T11:12:08Z" level=info msg="[Info]: starting process: pause nsutil -t 22936 -p -- stress-ng --timeout 330s --io 4 --hdd 4 --hdd-bytes 50% --temp-path /tmp/sample"
time="2021-08-09T11:12:08Z" level=info msg="[Info]: Sending signal to resume the stress process"
time="2021-08-09T11:12:09Z" level=info msg="[Wait]: Waiting for chaos completion"
time="2021-08-09T11:12:09Z" level=fatal msg="helper pod failed, err: error process exited accidentally%!(EXTRA *exec.ExitError=exit status 1)"

The same command, when executed manually inside the Pod container (NGINX in this case), works fine. Please see the attached screenshots for this scenario, and the reproduction sketch after them.
(Screenshots: 2021-08-09 at 16:41:30, 16:42:29, and 16:46:55 — manual execution of the stress-ng command inside the NGINX container.)
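A minimal way to reproduce that manual run, assuming stress-ng is available inside the target container; the pod name is a placeholder:

# Manual run inside the target container, using the same stress-ng arguments as the helper.
kubectl exec -it <nginx-pod-name> -n <namespace> -- \
  stress-ng --timeout 330s --io 4 --hdd 4 --hdd-bytes 50% --temp-path /tmp/sample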

Please see the sample ChaosEngine used.

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: nginx-chaos
  namespace: <namespace>
spec:
  # It can be active/stop
  engineState: 'active'
  appinfo:
    appns: '<namespace>'
    applabel: 'app=nginx'
    appkind: 'deployment'
  chaosServiceAccount: pod-io-stress-sa
  experiments:
    - name: pod-io-stress
      spec:
        components:
          env:
            # set chaos duration (in sec) as desired
            - name: TOTAL_CHAOS_DURATION
              value: '330'

            ## specify the size as percentage of free space on the file system
            - name: FILESYSTEM_UTILIZATION_PERCENTAGE
              value: '50'

            ## provide the container runtime
            - name: CONTAINER_RUNTIME
              value: 'containerd'

            # provide the socket file path
            - name: SOCKET_PATH
              value: '/run/containerd/containerd.sock'

            ## percentage of total pods to target
            - name: PODS_AFFECTED_PERC
              value: '100'

            - name: VOLUME_MOUNT_PATH
              value: '/tmp/sample'
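For completeness, applying the engine and inspecting the outcome typically looks like the following; the file name and namespace are placeholders, and the ChaosResult name assumes the usual <engine-name>-<experiment-name> convention:

kubectl apply -f chaosengine.yaml
# watch for the pod-io-stress-helper-* pods in the target namespace
kubectl get pods -n <namespace> -w
# verdict and fail step are recorded in the ChaosResult
kubectl describe chaosresult nginx-chaos-pod-io-stress -n <namespace>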
@braybaut

Same behavior for me: it only works when running stress-ng directly in the pod. It seems to be a bug in the Pod I/O Stress code.

@kbfu

kbfu commented Nov 24, 2023

I have the same issue here, but I don't think it can be fixed in pure Go.
Check out the issue here.
So I had to replace nsutil with nsexec, a tool from the chaos-mesh team, and then build my own go-runner image.
Finally, I got this experiment to work.
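A hedged sketch of wiring in such a custom image, assuming the pod-io-stress experiment honors the usual LIB_IMAGE override for its helper image and that the entry is added under the ChaosEngine's experiments[0].spec.components.env block above; the image name is purely illustrative:

            # hypothetical go-runner build with nsexec swapped in for nsutil
            - name: LIB_IMAGE
              value: '<registry>/go-runner:custom-nsexec'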
