AWS: Make Fetches Concurrent #1201
ramonpetgrave64
started this conversation in
Ideas
Replies: 2 comments 4 replies
-
Some things to consider: How will we apply/refactor this change to all AWS modules? Will it be easy for new module authors to copy this pattern in their own modules? Can you describe the performance gain that you observed in #1192 in a bit more detail?
-
We tried to incorporate multi-threading. One of the challenges we faced was around rate limits. boto3 has retry policies in place, but calls still sometimes fail when there are a ton of concurrent requests going to AWS.
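As a rough illustration of the rate-limit concern, here is a minimal sketch of retrying a throttled call with exponential backoff and jitter. It is not boto3's own retry mechanism (botocore has configurable retry modes of its own); `ThrottledError`, `call_with_backoff`, and `flaky_fetch` are hypothetical names used only for this example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttling error returned by AWS."""


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.01,
) -> T:
    """Call func, retrying on ThrottledError with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return func()
        except ThrottledError:
            if attempt == max_attempts - 1:
                # Out of attempts: surface the error to the caller.
                raise
            # Sleep a random amount up to base_delay * 2**attempt before retrying.
            time.sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")


# Simulated call that is throttled twice and then succeeds.
calls = {"count": 0}


def flaky_fetch() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ThrottledError("simulated throttling")
    return "ok"


result = call_with_backoff(flaky_fetch)
```

Capping the number of worker threads is the other obvious lever here, since fewer concurrent requests means fewer throttling responses to begin with.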
-
Title: Make Fetches Concurrent
Description:
Cartography up to now is entirely serial. But there are many opportunities for concurrency, particularly during the `get_` phases of sync modules.
As a basic example: To get all the data needed for all AWS ECR Images, we:
The loop in the second step above is what we can make concurrent, saving the time spent waiting on network round trips when there are perhaps many thousands of repositories.
The boto clients are also thread-safe, meaning it's okay to run them in multiple concurrent threads.
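A minimal sketch of what running that per-repository loop in threads could look like, using the standard library's `ThreadPoolExecutor`. This is not the code from #1192; `fetch_images_for_repos` and `fake_list_images` are hypothetical names, and the fake function stands in for the real boto call so the example runs without AWS credentials.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fetch_images_for_repos(
    repo_names: List[str],
    list_images: Callable[[str], List[dict]],
    max_workers: int = 8,
) -> Dict[str, List[dict]]:
    """Call list_images(repo) for every repo using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with repo_names.
        results = pool.map(list_images, repo_names)
        return dict(zip(repo_names, results))


def fake_list_images(repo_name: str) -> List[dict]:
    # Hypothetical stand-in for the per-repository boto call.
    return [{"repositoryName": repo_name, "imageTag": "latest"}]


images_by_repo = fetch_images_for_repos(["repo-a", "repo-b"], fake_list_images)
```

In a real module the callable would wrap the shared boto client, and `max_workers` would be the knob for trading speed against the risk of being rate limited.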
Ideally, we would use the async version of client libraries, but boto does not support that. There is a project, aioboto3, that seems to wrap around boto to make its methods async, but personally I would rather not.
I have a pending PR, #1192, as another workaround. The idea there is to do minimal refactoring by
To be more concrete with the ECR Images example: you might think to just convert `get_ecr_repository_images` to an `async def`, but that still does not yield any performance gain because inside of that is a long-running function call with boto's `list_images` method. Because of the GIL, it is ultimately the `list_images` method that needs to run in a separate thread, but I chose to keep the refactor simple by putting its only caller in a separate thread.

I've read a bit about asyncio and GIL from the realpython guides, but I'm still new and would like to hear thoughts from the community.
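To illustrate the point about `async def` alone not helping: a minimal sketch, assuming a blocking function standing in for the boto call, that hands the blocking work to the event loop's default thread pool with `run_in_executor` and awaits all repositories together. The names `blocking_list_images`, `get_images`, and `get_all_images` are hypothetical, and `time.sleep` only simulates network wait.

```python
import asyncio
import time
from typing import List


def blocking_list_images(repo_name: str) -> List[str]:
    # Hypothetical stand-in for the synchronous boto call.
    time.sleep(0.05)
    return [f"{repo_name}:latest"]


async def get_images(repo_name: str) -> List[str]:
    loop = asyncio.get_running_loop()
    # Run the blocking call in the default thread pool so the event loop stays free.
    return await loop.run_in_executor(None, blocking_list_images, repo_name)


async def get_all_images(repo_names: List[str]) -> List[List[str]]:
    # gather returns results in the same order as the awaitables passed in.
    return await asyncio.gather(*(get_images(name) for name in repo_names))


all_images = asyncio.run(get_all_images(["repo-a", "repo-b", "repo-c"]))
```

Calling `blocking_list_images` directly inside an `async def` would block the event loop for the full duration of each call, so the coroutines would still run one after another.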
Relevant Links: