Unhandled exception in event loop due to possible resource contention #1219
Comments
I'm 90 percent sure this is an issue from aiohttp down. Debugging this will require adding a ton of logging to aiohttp. If you can create a repeatable test case, that would help a lot, perhaps with moto.
Are you using spawned or forked processes? Forking is dangerous with aiohttp because of the event loops.
I need to tear apart several hundred lines of code in this script to see what I can share. The hierarchy of ProcessPoolExecutor and asyncio is fairly simple:
Ubuntu defaults to fork, and macOS defaults to spawn. There's no asyncio code outside the
Minor update in case this helps someone else ... I implemented a singleton wrapper around
Hmm, yeah, forking is bad because each process/thread should have its own loop. I remember that back in the day Python didn't handle this very well, so I had to write a bunch of custom code to create a new event loop in the forked process, and even then there'd be dragons. Sessions don't really store anything of importance, so I'm not sure why that fixed your issue; it would be interesting to track down. The clients are where all the action is.
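The fork-vs-spawn suggestion above, combined with the per-process `asyncio.run()` layout and the timeout-and-cancel mitigation described in the issue body, can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the reporter's code: `handle_message`, `worker` and `run_pool` are hypothetical names, `handle_message` only sleeps in place of the real download/upload, and the default timeout of 300 seconds is arbitrary.

```python
import asyncio
import multiprocessing
from concurrent.futures import ProcessPoolExecutor


async def handle_message(message: str, delay: float = 0.05) -> str:
    # Hypothetical placeholder for "download to temp storage, then upload to B2".
    await asyncio.sleep(delay)
    return message


async def process_messages(messages: list[str], timeout: float = 300.0) -> tuple[list[str], int]:
    """Run one task per message; cancel whatever is still pending after `timeout` seconds.

    Returns the successful results and the number of tasks that were cancelled.
    """
    if not messages:
        return [], 0  # asyncio.wait() raises ValueError on an empty set of tasks
    tasks = [asyncio.create_task(handle_message(m)) for m in messages]
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    for task in pending:
        task.cancel()
    # Let the cancellations finish so nothing is left running when asyncio.run()
    # shuts the loop down.
    await asyncio.gather(*pending, return_exceptions=True)
    results = [t.result() for t in done if not t.cancelled() and t.exception() is None]
    return results, len(pending)


def worker(messages: list[str], timeout: float = 300.0) -> tuple[list[str], int]:
    # Each worker process creates (and later closes) its own event loop.
    return asyncio.run(process_messages(messages, timeout))


def run_pool(batches: list[list[str]], max_workers: int = 4) -> list[tuple[list[str], int]]:
    # "spawn" starts every worker from a fresh interpreter instead of fork()ing the
    # parent, so no event loop or aiohttp state is copied into the children.
    ctx = multiprocessing.get_context("spawn")
    with ProcessPoolExecutor(max_workers=max_workers, mp_context=ctx) as pool:
        return list(pool.map(worker, batches))
```

Passing `mp_context` to `ProcessPoolExecutor` (available since Python 3.7) makes the start method explicit instead of relying on the platform default. Because spawned workers re-import the main module, `run_pool()` should be called from under an `if __name__ == "__main__":` guard. Cancelling pending tasks only bounds how long a stuck task can hold a worker; it does not explain why the task got stuck.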
Describe the bug
Preface: I'm not 100% sure what's going on because the stack trace doesn't include my code. I'll share what I do know.
We have an image processor which fetches millions of images and writes them to cloud storage. We recently migrated a storage bucket from AWS S3 to Backblaze B2 and started seeing the following exception stack randomly in `asyncio.run(debug=True)`.

The image processor creates a `ProcessPoolExecutor`, and each process invokes `asyncio.run(process_messages(messages))`. Within this function, we create aiobotocore clients, and for the S3 client we set the credentials and endpoint URL. We create tasks via `asyncio.create_task()` and wait on them with `asyncio.wait()`. Each task downloads files from the internet to temp storage and then uploads them to B2. This is the upload code:

Note that this code could run for 24h+ without errors when uploading to S3. But after switching to B2, processes within the pool get stuck in an infinite exception loop. We've noticed that B2's API is less stable than S3's (throttling, latency, etc.), so that could be triggering a race condition in `aiobotocore`, `aiohttp`, or even `asyncio`. We have several different Python environments, and it reproduces on all of them (3.10, 3.12, and 3.13).

Furthermore, we noticed that the exception loop is triggered more frequently as resource utilization increases. We could go 1h+ without an error when running with fewer workers, e.g., `ProcessPoolExecutor(max_workers=4)`. But when we increase it to 32-64 workers (on a 64-core machine), at least one of the processes gets stuck and times out every ~5 minutes. If we revert the S3 client config and use AWS S3, the problem goes away. We've mitigated the issue by setting a `timeout` in `asyncio.wait()` and then canceling any pending tasks (those still pending because they are stuck in the loop).

Checklist
- `pip check` passes without errors
- `pip freeze` results
Environment: