upload tarball and metadata file to different directories in S3 bucket #241

Open
trz42 opened this issue Feb 6, 2024 · 0 comments

trz42 commented Feb 6, 2024

Currently (in EESSI), the tarball with the built software and a metadata file describing its contents are uploaded to the same directory in the S3 bucket, and both stay in that directory during ingestion. The ingestion procedure adds only the metadata file to a staging repository. Whenever the state of an ingestion changes, the metadata file is moved to the corresponding top-level directory in that repository. For example, it is first created as new/some_path/TARBALL.meta.txt and moved to staged/some_path/TARBALL.meta.txt once the tarball has been staged from the S3 bucket to the Stratum-0 server.
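A minimal sketch of the path rewrite that each state change implies in the staging repository: only the top-level directory (the state) changes, the rest of the path stays the same. The helper name `metadata_path_for_state` is hypothetical and not taken from the EESSI ingestion scripts; how those scripts actually perform the move on GitHub is not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the EESSI ingestion scripts): compute the
# new location of a metadata file when the ingestion state changes by
# replacing the top-level directory of its path.
set -euo pipefail

# usage: metadata_path_for_state <current_path> <new_state>
metadata_path_for_state() {
    local current_path="$1"
    local new_state="$2"
    # Drop the shortest prefix ending in '/', i.e. the old state directory.
    local rest="${current_path#*/}"
    echo "${new_state}/${rest}"
}

metadata_path_for_state "new/some_path/TARBALL.meta.txt" "staged"
```

For the example above this prints `staged/some_path/TARBALL.meta.txt`.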

The current procedure may lead to many GitHub API requests, because every state change moves the metadata file within the staging repository on GitHub, and GitHub imposes an hourly limit of 5000 API requests. Hitting that limit causes ingestion to fail or slow down.

In NESSI, we use a slightly different approach. Tarballs are always stored under tarballs/some_path/TARBALL in the S3 bucket (a separate top-level directory) and are never moved (same as in EESSI). Metadata files are initially created under new/some_path/TARBALL.meta.txt in the S3 bucket (another top-level directory). The ingestion procedure moves the metadata file within the S3 bucket to the top-level directory that corresponds to the state of the ingestion (unlike EESSI). The metadata file is not moved between directories in the staging repository on GitHub (unlike EESSI).
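A minimal sketch of a NESSI-style state change in the S3 bucket, assuming the AWS CLI is used: only the metadata object is moved between top-level state directories, while the tarball under tarballs/ is left untouched. The function name `move_metadata_in_bucket` and the `DRY_RUN` switch are hypothetical; the move itself uses the standard `aws s3 mv` command with `--endpoint-url` for a non-AWS S3 endpoint.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move a metadata file between state directories in the
# S3 bucket. The tarball (under tarballs/) is never touched.
set -euo pipefail

# usage: move_metadata_in_bucket <bucket> <endpoint_url> <from_state> <to_state> <some_path> <tarball>
move_metadata_in_bucket() {
    local bucket="$1" endpoint_url="$2" from_state="$3" to_state="$4"
    local some_path="$5" tarball="$6"
    local src="s3://${bucket}/${from_state}/${some_path}/${tarball}.meta.txt"
    local dst="s3://${bucket}/${to_state}/${some_path}/${tarball}.meta.txt"
    local cmd=(aws s3 mv "${src}" "${dst}" --endpoint-url "${endpoint_url}")
    if [[ "${DRY_RUN:-0}" == "1" ]]; then
        # Only print the command that would be executed.
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

# Example (dry run, placeholder bucket and endpoint):
DRY_RUN=1 move_metadata_in_bucket my-bucket https://s3.example.org new staged some_path TARBALL
```

The bucket name and endpoint URL in the example are placeholders. Because nothing is moved in the staging repository, a state change in this scheme costs no GitHub API request.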

In NESSI, we have modified the script eessi-upload-to-staging so that the tarball and the metadata file are uploaded to different top-level directories. After the change, the code looks like this:

        echo Uploading to "${url}"
        echo "  store tarball at tarballs/${aws_path}/${aws_file}"
        upload_to_staging_bucket \
                "${file}" \
                "${bucket_name}" \
                "tarballs/${aws_path}/${aws_file}" \
                "${endpoint_url}"
        echo "  store metadata file at new/${aws_path}/${aws_file}.meta.txt"
        upload_to_staging_bucket \
                "${metadata_file}" \
                "${bucket_name}" \
                "new/${aws_path}/${aws_file}.meta.txt" \
                "${endpoint_url}"

The corresponding code in EESSI is:

        echo Uploading to "${url}"
        upload_to_staging_bucket \
                "${file}" \
                "${bucket_name}" \
                "${aws_path}/${aws_file}" \
                "${endpoint_url}"
        upload_to_staging_bucket \
                "${metadata_file}" \
                "${bucket_name}" \
                "${aws_path}/${aws_file}.meta.txt" \
                "${endpoint_url}"

Instead of hardcoding the upload destinations, it might be better to make these locations configurable. This would also allow for a smoother migration, because using different locations in the S3 bucket also requires changes to the ingestion scripts that run as cron jobs on the Stratum-0.
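A minimal sketch of how the upload locations could be made configurable, assuming two environment variables with the hypothetical names `TARBALL_PREFIX` and `METADATA_PREFIX`. The defaults follow the NESSI layout; setting both to an empty string reproduces the current EESSI layout, which would let the upload script and the ingestion scripts be switched over independently.

```shell
#!/usr/bin/env bash
# Hypothetical configuration for eessi-upload-to-staging: prefixes for the
# top-level directories in the S3 bucket. '${VAR-default}' (no colon) applies
# the default only when the variable is unset, so an explicitly empty value
# keeps the EESSI layout (no top-level directory).
set -euo pipefail

TARBALL_PREFIX="${TARBALL_PREFIX-tarballs}"
METADATA_PREFIX="${METADATA_PREFIX-new}"

# usage: object_key <prefix> <aws_path> <file_name>
# Joins the parts with '/', omitting the prefix when it is empty.
object_key() {
    local prefix="$1" aws_path="$2" file_name="$3"
    echo "${prefix:+${prefix}/}${aws_path}/${file_name}"
}

# Example values; in the real script aws_path and aws_file are set elsewhere.
aws_path="some_path"
aws_file="TARBALL"
tarball_key="$(object_key "${TARBALL_PREFIX}" "${aws_path}" "${aws_file}")"
metadata_key="$(object_key "${METADATA_PREFIX}" "${aws_path}" "${aws_file}.meta.txt")"
echo "${tarball_key}"
echo "${metadata_key}"
```

In eessi-upload-to-staging, `tarball_key` and `metadata_key` would then be passed to `upload_to_staging_bucket` in place of the hardcoded paths.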
