SupRsync docker on satp2 smurf servers crashes due to QueuePool limit #739

Open
jlashner opened this issue Aug 28, 2024 · 1 comment
Labels
agent: suprsync bug Something isn't working needs triage Cause of bug still unknown, needs investigation.

Comments

@jlashner
Collaborator

We're seeing these failure messages on both smurf-srv19 and smurf-srv21.

2024-08-28T13:41:12+0000 run:0 CRASH: [Failure instance: Traceback: <class 'sqlalchemy.exc.TimeoutError'>: QueuePool limit of size 5 overflow 10 reached, connection timed out, timeout 30.00 (Background on this error at: https://sqlalche.me/e/20/3o7r)
/usr/lib/python3.10/threading.py:1016:_bootstrap_inner
/usr/lib/python3.10/threading.py:953:run
/opt/venv/lib/python3.10/site-packages/twisted/_threads/_threadworker.py:49:work
/opt/venv/lib/python3.10/site-packages/twisted/_threads/_team.py:192:doWork
--- <exception caught here> ---
/opt/venv/lib/python3.10/site-packages/twisted/python/threadpool.py:269:inContext
/opt/venv/lib/python3.10/site-packages/twisted/python/threadpool.py:285:<lambda>
/opt/venv/lib/python3.10/site-packages/twisted/python/context.py:117:callWithContext
/opt/venv/lib/python3.10/site-packages/twisted/python/context.py:82:callWithContext
/opt/venv/lib/python3.10/site-packages/ocs/ocs_agent.py:984:_running_wrapper
/opt/venv/lib/python3.10/site-packages/socs/agents/suprsync/agent.py:198:run
/opt/venv/lib/python3.10/site-packages/socs/db/suprsync.py:707:delete_files
/opt/venv/lib/python3.10/site-packages/socs/db/suprsync.py:391:get_deletable_files
/opt/venv/lib/python3.10/site-packages/sqlalchemy/orm/query.py:2673:all
/opt/venv/lib/python3.10/site-packages/sqlalchemy/orm/query.py:2827:_iter
/opt/venv/lib/python3.10/site-packages/sqlalchemy/orm/session.py:2351:execute
/opt/venv/lib/python3.10/site-packages/sqlalchemy/orm/session.py:2226:_execute_internal
/opt/venv/lib/python3.10/site-packages/sqlalchemy/orm/session.py:2095:_connection_for_bind
<string>:2:_connection_for_bind
/opt/venv/lib/python3.10/site-packages/sqlalchemy/orm/state_changes.py:139:_go
/opt/venv/lib/python3.10/site-packages/sqlalchemy/orm/session.py:1189:_connection_for_bind
/opt/venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py:3276:connect
/opt/venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py:146:__init__
/opt/venv/lib/python3.10/site-packages/sqlalchemy/engine/base.py:3300:raw_connection
/opt/venv/lib/python3.10/site-packages/sqlalchemy/pool/base.py:449:connect
/opt/venv/lib/python3.10/site-packages/sqlalchemy/pool/base.py:1263:_checkout
/opt/venv/lib/python3.10/site-packages/sqlalchemy/pool/base.py:712:checkout
/opt/venv/lib/python3.10/site-packages/sqlalchemy/pool/impl.py:168:_do_get
]
2024-08-28T13:41:12+0000 run:0 Status is now "done".
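For reference, the numbers in the error message mean that a pool of size 5 with an overflow of 10 allows at most 15 connections to be checked out at once; a further checkout waits up to 30 seconds and then fails. Below is a minimal stdlib-only model of that limit. It is not SQLAlchemy's code: `ToyPool`, `checkout`, `checkin`, and `checkouts_before_timeout` are hypothetical names used only for illustration, and the usage lines shorten the timeout so the demo finishes quickly.

```python
import threading


class ToyPool:
    """Toy model of a bounded connection pool (not SQLAlchemy's QueuePool)."""

    def __init__(self, pool_size=5, max_overflow=10, timeout=30.0):
        # Total number of connections that may be held at the same time.
        self.capacity = pool_size + max_overflow
        self.timeout = timeout
        self._slots = threading.BoundedSemaphore(self.capacity)

    def checkout(self):
        # Block until a slot frees up, or give up after `timeout` seconds.
        if not self._slots.acquire(timeout=self.timeout):
            raise TimeoutError(
                f"pool limit of {self.capacity} reached, "
                f"timed out after {self.timeout:.2f} s"
            )

    def checkin(self):
        # Return a slot to the pool.
        self._slots.release()


def checkouts_before_timeout(pool, release_each_time):
    """Count successful checkouts (capped at 100) before the pool times out."""
    count = 0
    try:
        for _ in range(100):
            pool.checkout()
            count += 1
            if release_each_time:
                pool.checkin()
    except TimeoutError:
        pass
    return count


# Holding every connection exhausts the pool; returning each one does not.
print(checkouts_before_timeout(ToyPool(timeout=0.05), release_each_time=False))
print(checkouts_before_timeout(ToyPool(timeout=0.05), release_each_time=True))
```

In this model the pool is exhausted only when 15 connections are held at the same time. Whether that happens here because queries overlap or because connections are not returned promptly is not established in this issue.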
@BrianJKoopman BrianJKoopman added bug Something isn't working agent: suprsync needs triage Cause of bug still unknown, needs investigation. labels Aug 28, 2024
@jlashner
Collaborator Author

After investigating, I think this may be caused by the sleep time of the main loop in the suprsync agent. Right now most agents run with a 5-second sleep time, and each pass through the loop runs something like 15 SQL queries. The 5-second value was mainly used for testing and debugging; in operation it can be increased to something like 60 seconds without any issues. I've increased the sleep time for the sat-uhf agents, and I'm hoping that alleviates the issue.
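As a rough back-of-the-envelope check on the effect of that change, the sketch below compares the query rate per agent at the two sleep times. It assumes exactly 15 queries per loop (the comment only says "something like 15") and ignores the time the queries themselves take; `queries_per_minute` is a hypothetical helper, not part of socs.

```python
def queries_per_minute(queries_per_loop, sleep_s):
    """Approximate query rate, ignoring the time the queries themselves take."""
    loops_per_minute = 60.0 / sleep_s
    return queries_per_loop * loops_per_minute


QUERIES_PER_LOOP = 15  # assumed; the comment above gives this only approximately

for sleep_s in (5, 60):
    rate = queries_per_minute(QUERIES_PER_LOOP, sleep_s)
    print(f"sleep {sleep_s:>2} s: ~{rate:.0f} queries/min per agent")
```

Under these assumptions, going from 5 to 60 seconds cuts the rate from about 180 to about 15 queries per minute per agent, a factor of 12.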
