AWX Task Pod - Worker failed to run task awx.main.tasks.system.awx_periodic_scheduler
#14394
Comments
This was reported in #8214 and #4419, and in both cases there was a simple and obvious remediation. The mechanic here is that we're using the django-solo library (https://github.com/lazybird/django-solo/) to track the scheduler state, and the entire point of the solo library is that it only maintains 1 record in a table. The simple fact that you hit this error isn't very interesting. If you know of something that an internal task did, or that the awx-operator did, which put it into this state, that would be very interesting. You might make progress by looking at the table contents and getting logs from around the time of the recorded timestamps to see what might have run and caused this.
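To make the "only 1 record" mechanic concrete, here is a minimal, self-contained sketch of a singleton accessor. It is not django-solo's actual code: `SingletonTable`, `MultipleRowsError`, and the row fields are hypothetical names used only for illustration. It just shows why a second row turns every read of the state into an error.

```python
class MultipleRowsError(Exception):
    """Raised when a table meant to hold one row holds several."""


class SingletonTable:
    """Hypothetical stand-in for a table that should hold exactly one row."""

    def __init__(self):
        self.rows = []

    def get_solo(self):
        # Create the single row on first access.
        if not self.rows:
            self.rows.append({"id": 1, "last_run": None})
        # Any extra row makes "the one record" ambiguous, so fail loudly.
        if len(self.rows) > 1:
            raise MultipleRowsError(f"expected 1 row, found {len(self.rows)}")
        return self.rows[0]


table = SingletonTable()
state = table.get_solo()  # first call creates the row

# Simulate something outside the normal code path adding a second row.
table.rows.append({"id": 2, "last_run": None})

try:
    table.get_solo()
except MultipleRowsError as exc:
    print("every later read fails:", exc)
```

Once the extra row exists, nothing in the read path can repair it, which would be consistent with the error repeating on every scheduler run until the rows are cleaned up by hand.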
Hi @AlanCoding 😄 Thank you for the explanation, I wasn't aware of the mentioned issues.
After scaling AWX down and deleting these entries, these logs are no longer shown.
I hate to claim "this shouldn't happen"... but it shouldn't happen. There is only 1 case I know of where we access the model: Line 720 in 224e9e0
That is done under an advisory lock which prevents more than 1 process doing this operation at a given time. That's a pretty strong level of assurance against this kind of bug. This reeks of someone doing cowboy coding and running scripts outside of the app logic itself. Still, the fact that it has been reported 3 times suggests there's something systemic going on.
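As a rough illustration of the locking idea described here, the sketch below uses `threading.Lock` as a stand-in for a database advisory lock. The names `scheduler_lock` and `run_scheduler_once` are hypothetical, and the non-blocking behavior (a worker that cannot get the lock skips the cycle) is an assumption, not something confirmed in this thread.

```python
import threading

# Stand-in for a database advisory lock shared by all workers (assumption).
scheduler_lock = threading.Lock()


def run_scheduler_once(update_state):
    """Run update_state only if no other worker currently holds the lock."""
    # Non-blocking acquire: a worker that loses the race skips this cycle.
    if not scheduler_lock.acquire(blocking=False):
        return False
    try:
        update_state()
        return True
    finally:
        # Always release, even if update_state raises.
        scheduler_lock.release()


runs = []
print(run_scheduler_once(lambda: runs.append("worker-a")))  # lock is free, so this runs
```

With this kind of guard only one caller at a time can touch the state, which is why a duplicate row points to something writing to the table outside the guarded code path.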
Hey Daniel, can you share some information on how to clear the duplicate entries from the AWX DB? When you mention scaling down, I assume you mean the replicas of AWX? I only have a single replica, so I assume I can run this while a single instance is running?
FYI, for anyone trying to resolve this issue, the steps I performed to clear the double entry are below. I'm not sure if it's the correct way, but it seemed to resolve my issue.
Confirm if you have two entries:
Delete both entries:
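The two steps named above (confirm there are two entries, then delete both) can be sketched roughly as follows. This is an illustration only, run against an in-memory SQLite database so it is self-contained. The real AWX database is PostgreSQL, and these are not the commands that were actually run. The table name `main_towerschedulestate`, the column `schedule_last_run`, and the sample timestamps are assumptions; check the actual schema before touching a real database.

```python
import sqlite3

# Assumed table name; not confirmed anywhere in this thread.
TABLE = "main_towerschedulestate"

conn = sqlite3.connect(":memory:")
conn.execute(
    f"CREATE TABLE {TABLE} (id INTEGER PRIMARY KEY, schedule_last_run TEXT)"
)
# Two made-up rows to reproduce the duplicate-entry state.
conn.executemany(
    f"INSERT INTO {TABLE} (schedule_last_run) VALUES (?)",
    [("2023-01-01T00:00:00",), ("2023-01-01T00:00:30",)],
)


def count_rows(connection):
    """Step 1: confirm how many entries the singleton table holds."""
    return connection.execute(f"SELECT COUNT(*) FROM {TABLE}").fetchone()[0]


def clear_duplicates(connection):
    """Step 2: delete all rows if more than one exists; return rows removed."""
    found = count_rows(connection)
    if found <= 1:
        return 0
    with connection:  # commits on success, rolls back on error
        connection.execute(f"DELETE FROM {TABLE}")
    return found


print("entries before:", count_rows(conn))
removed = clear_duplicates(conn)
print("removed:", removed, "entries after:", count_rows(conn))
```

Deleting every row rather than keeping one relies on the application recreating its single record on the next access, which matches "Delete both entries" above but is still an assumption about how the table is repopulated.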
Bug Summary
On the AWX Task Pod I see the following error message being logged constantly in a loop:
Pod Startup seems normal:
AWX version
23.0.0
Select the relevant components
Installation method
kubernetes
Modifications
no
Ansible version
ansible [core 2.15.0]
Operating system
Fedora CoreOS 38
Web browser
Firefox
Steps to reproduce
We have been using AWX for a while now and only recently noticed this. It seems to have started only after the upgrade from version 22.5.0 to 22.7.0.
Currently on version 23.0.0, with the AWX Operator on version 2.5.2, we still face this issue.
Expected results
No errors are output to the logs.
Actual results
The following log message keeps repeating in the awx-task Pod roughly every 30 seconds:
Additional information
No response