This repository has been archived by the owner on Jun 20, 2023. It is now read-only.

scan_bucket.py struggles when too many objects in the bucket #151

Open
cbp123 opened this issue Jan 8, 2021 · 2 comments

Comments

cbp123 commented Jan 8, 2021

We have a bucket with millions of objects in it. Because scan_bucket.py loads all objects in the bucket into memory before scanning them, it can freeze for a long time at startup. I imagine in the worst case it could start to run out of memory.

I modified the code to load and scan the objects in pages instead, and it worked much better. If you think this is a better method, let me know and I can submit a PR.
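
For illustration only, here is a minimal sketch of what paging could look like. It is not the actual patch. It assumes the bucket is an S3 bucket accessed through a boto3 client, and `iter_object_keys`, `scan_bucket_in_pages`, and the `scan_object` callback are hypothetical names, not functions from scan_bucket.py.

```python
from typing import Callable, Iterator


def iter_object_keys(s3_client, bucket: str, page_size: int = 1000) -> Iterator[str]:
    """Yield object keys page by page instead of building one large list.

    Assumption: s3_client is a boto3 S3 client, e.g. boto3.client("s3").
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    pages = paginator.paginate(
        Bucket=bucket,
        PaginationConfig={"PageSize": page_size},
    )
    for page in pages:
        # A page for an empty bucket has no "Contents" entry.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def scan_bucket_in_pages(
    s3_client,
    bucket: str,
    scan_object: Callable[[str, str], None],
) -> int:
    """Call scan_object(bucket, key) for every key and return how many were seen.

    scan_object is a hypothetical stand-in for whatever per-object scan
    the real script performs.
    """
    count = 0
    for key in iter_object_keys(s3_client, bucket):
        scan_object(bucket, key)
        count += 1
    return count
```

With this shape, only one page of listing results is held in memory at a time, and scanning can begin as soon as the first page arrives.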

denniswebb (Contributor) commented

Yes, a PR would be great. Thanks.

jdepp (Contributor) commented May 3, 2021

Hey @cbp123, any update on this?
