Automation Jobs Killed #12549
Comments
Hello @pagey101, could you please ask this on our mailing list? See https://github.com/ansible/awx/#get-involved for information on ways to connect with us.
Hello, I have posted this to the group. A similar issue also occurs in which the job status is failed and the job output is "No output found for this job". Apologies if I am misunderstanding, but I would think it is a bug that jobs fail without giving a specific reason. Thanks
Could this be related to #11805?
Unfortunately not; I have seen that issue. This is not a long-running playbook, and it sometimes fails before the playbook runs (during inventory updates). Sometimes the inventory updates fail; sometimes they complete but the playbook fails. This is due to the pods being killed in quick succession. As mentioned, there are no CPU/memory pressures when this happens. I have a response from the AWX team on the mailing list, so I will be discussing it there.
Could you please share a link to the group?
Closing this in favor of the mailing list discussion.
Bug Summary
After migrating the AWX operator to a new cluster, jobs randomly fail with the following error:
Error opening pod stream: Get "https://awx.k3s.net:10250/containerLogs/awx/automation-job-12053-blqxd/worker?follow=true": EOF
I'm not able to see anything beyond this error. The cluster has plenty of CPU/memory headroom, and I can't see any OOM events or similar.
AWX Operator version
0.21.0
AWX version
21.0.0
Kubernetes platform
kubernetes
Kubernetes/Platform version
v1.23.7+k3s1
Modifications
yes
Steps to reproduce
Expected results
I expect the jobs to complete successfully.
Actual results
The pods sometimes fail with the error:
Error opening pod stream: Get "https://awx.k3s.net:10250/containerLogs/awx/automation-job-12053-blqxd/worker?follow=true": EOF
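For anyone triaging the same error, the message itself carries enough detail to locate the affected pod. The following is a minimal sketch, not part of AWX: it assumes the URL path follows the kubelet layout `/containerLogs/<namespace>/<pod>/<container>`, and the helper name `parse_pod_stream_error` is hypothetical.

```python
import re
from urllib.parse import urlsplit


def parse_pod_stream_error(message: str) -> dict:
    """Extract node, namespace, pod and container from a pod-stream error.

    Hypothetical helper. Assumes the quoted URL path has the form
    /containerLogs/<namespace>/<pod>/<container>.
    """
    # The URL is the quoted string following "Get" in the error message.
    match = re.search(r'Get "([^"]+)"', message)
    if match is None:
        raise ValueError("no quoted URL found in error message")

    url = urlsplit(match.group(1))
    parts = url.path.strip("/").split("/")
    if len(parts) != 4 or parts[0] != "containerLogs":
        raise ValueError(f"unexpected URL path: {url.path}")

    _, namespace, pod, container = parts
    return {
        "node_host": url.hostname,  # host the log request was sent to
        "port": url.port,           # 10250 in the error reported here
        "namespace": namespace,
        "pod": pod,
        "container": container,
    }


error = (
    'Error opening pod stream: Get "https://awx.k3s.net:10250/containerLogs/'
    'awx/automation-job-12053-blqxd/worker?follow=true": EOF'
)
info = parse_pod_stream_error(error)
print(info)
```

With the namespace and pod name in hand, `kubectl get events -n awx` or `kubectl describe pod automation-job-12053-blqxd -n awx` may show why the pod went away, although a pod that has already been deleted will no longer be describable.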
Additional information
No response
Operator Logs
No response