clean up S3 Access logs #1658

Closed
carolyncole opened this issue Jan 23, 2024 · 4 comments

@carolyncole
Member

https://s3.console.aws.amazon.com/s3/buckets/pdc-describe-logs?bucketType=general&region=us-east-1&tab=objects#
Logs need to be purged regularly.

@hectorcorrea
Member

hectorcorrea commented Aug 14, 2024

As an experiment, I ran the following command to get a list of all the files that we have in the pdc-describe-logs bucket:

aws s3 ls --summarize --human-readable --recursive s3://pdc-describe-logs/ > pdc_describe_logs.txt

Lo and behold, we have more than 9 million files (9,273,338 to be exact). They add up to 290 GB.

The command took about 75 minutes to run and it gave me a list of all the files and their dates. The list is available here: https://drive.google.com/file/d/1fhyNJLWYCEfHVPCcq5bS8DmBxZ4FcQGP/view?usp=drive_link

We could go through that list and start deleting files older than X.
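
A rough sketch of that filtering step, assuming the listing keeps the column layout that `aws s3 ls --recursive --human-readable` prints (date, time, size, unit, key). The function name `keys_older_than` and the cutoff date in the example are placeholders, and this has not been run against the real listing:

```shell
# Print the keys of objects last modified before a cutoff date (YYYY-MM-DD).
# Assumed line layout: DATE TIME SIZE UNIT KEY
# The "Total Objects" / "Total Size" lines added by --summarize are skipped
# because their first field is not a date.
keys_older_than() {
  cutoff="$1"
  listing="$2"
  awk -v cutoff="$cutoff" '
    $1 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ && ($1 "") < (cutoff "") {
      key = $0
      # Drop the four leading columns and keep the rest as the key.
      sub(/^[^ ]+ +[^ ]+ +[^ ]+ +[^ ]+ /, "", key)
      print key
    }
  ' "$listing"
}

# Example: keys_older_than 2024-01-01 pdc_describe_logs.txt > keys_to_delete.txt
```

Comparing the dates as strings works because they are in ISO `YYYY-MM-DD` form, so alphabetical order matches chronological order.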

One problem might be that the aws s3 CLI tool does not seem to accept wildcards for the rm command.
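
For what it is worth, `aws s3 rm` does not expand wildcards in the S3 URI itself, but with `--recursive` it does accept `--exclude` and `--include` filter patterns, which may be enough here. A minimal sketch, assuming the keys begin with the date (S3 server access logs are normally named `[prefix]YYYY-MM-DD-hh-mm-ss-UniqueString`). Whether this bucket uses a prefix is not confirmed in this thread, so the pattern in the example is a guess, and `purge_matching` is only an illustrative name:

```shell
# Delete (or, with --dryrun, only list) objects whose keys match a pattern.
# Filters are applied in order: exclude everything, then re-include matches.
purge_matching() {
  bucket="$1"
  pattern="$2"
  shift 2
  # Any extra arguments (for example --dryrun) are passed through to the CLI.
  aws s3 rm "s3://${bucket}/" --recursive --exclude "*" --include "$pattern" "$@"
}

# Example (dry run first, nothing is deleted):
# purge_matching pdc-describe-logs "2023-*" --dryrun
```

The CLI still has to list the whole bucket to apply the filters, so with 9 million objects this is likely to be slow.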

Another approach could be to start writing new logs to a new S3 bucket and delete the current bucket, with its 9 million files, in two years or so.

carolyncole self-assigned this Oct 17, 2024

@carolyncole
Member Author

I set up a lifecycle rule to delete all objects in the log bucket that are older than 60 days. In theory at midnight Amazon should expire all the old objects. https://us-east-1.console.aws.amazon.com/s3/buckets/pdc-describe-logs?region=us-east-1&bucketType=general&tab=metrics
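
For reference, roughly the same rule can be expressed with the CLI. This is a sketch, not the exact rule configured in the console: the rule ID `expire-old-logs`, the empty prefix filter (meaning every object in the bucket), and the function name `apply_expiration_rule` are assumptions. Note that `put-bucket-lifecycle-configuration` replaces any lifecycle configuration already on the bucket.

```shell
# Write a lifecycle configuration that expires every object after N days
# and apply it to the bucket.
apply_expiration_rule() {
  bucket="$1"
  days="$2"
  config="$3"   # path where the JSON configuration is written
  cat > "$config" <<EOF
{
  "Rules": [
    {
      "ID": "expire-old-logs",
      "Status": "Enabled",
      "Filter": { "Prefix": "" },
      "Expiration": { "Days": ${days} }
    }
  ]
}
EOF
  aws s3api put-bucket-lifecycle-configuration \
    --bucket "$bucket" \
    --lifecycle-configuration "file://${config}"
}

# Example: apply_expiration_rule pdc-describe-logs 60 lifecycle.json
# Check the result: aws s3api get-bucket-lifecycle-configuration --bucket pdc-describe-logs
```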

@carolyncole
Member Author

It took the weekend, but the bucket size is finally down and there are no logs from 2023!
[Screenshot, 2024-10-21 at 8:47:30 AM]

@hectorcorrea
Member

@carolyncole this is fantastic! I am glad you found a way to deal with this within the AWS toolkit!
