Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combine CWL and modular CWL into a single container image #251

Merged
merged 11 commits into from
Jan 21, 2025

Conversation

nikki-t
Copy link
Collaborator

@nikki-t nikki-t commented Jan 3, 2025

Purpose

Combine CWL and Modular CWL into a single container image. Modify modular CWL DAG definition to override image entry point and point to modular bash script.

Proposed Changes

  • [ADD] A single container image that can execute the regular CWL task or the modular task that includes stage in, processing, and stage out.

Issues

Testing

  • Deployed and tested in unity-venue-dev. Confirmed existing functionality is preserved. The modular CWL DAG completed successfuly and processed files were uploaded to S3.
  • Also confirmed functionality of CWL DAG (non-modular).

@nikki-t nikki-t self-assigned this Jan 3, 2025
Copy link
Collaborator

@LucaCinquini LucaCinquini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @nikki-t : I am a little confused because this line in unity_sps_utils.py seems to still point to the latest Docker image that I built a few days ago:

SPS_DOCKER_CWL_IMAGE = "ghcr.io/unity-sds/unity-sps/sps-docker-cwl:2.4.0"

Perhaps you tested with this Docker image:
ghcr.io/unity-sds/unity-sps/sps-docker-cwl:250-single-container-image

and then you changed back the URI to be ready for the merge?

@nikki-t
Copy link
Collaborator Author

nikki-t commented Jan 6, 2025

@LucaCinquini - That is correct, I tested with ghcr.io/unity-sds/unity-sps/sps-docker-cwl:250-single-container-image.

I think it should be updated to point to: SPS_DOCKER_CWL_IMAGE = "ghcr.io/unity-sds/unity-sps/sps-docker-cwl:2.5.0" as this will be part of the next release which will include the single Docker image for both entrypoints. But I am not really sure how we do the versioning or whether it is automated?

Copy link
Collaborator

@LucaCinquini LucaCinquini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @nikki-t : this PR is approved. Can you also add here the new changes to the example workflows (with networking enabled) and reference the Docker image version 2.5.0 (which you can create). Thanks.

@nikki-t
Copy link
Collaborator Author

nikki-t commented Jan 17, 2025

@LucaCinquini - I have updated the modular CWL so that it is consistent with the regular CWL EC2 instance type selection. I also updated the stage in and stage out CWL to point to the DS provided workflows and confirmed that the stage out S3 bucket is saved as an SSM parameter with the most recent deployment and updated the parameter name in the unity_sps_utils.py file.

I also built the 2.5.0 container from this branch in case you wanted to test everything out. I tested both the CWL DAG Modular and the CWL DAG and confirmed everything is working. I think this PR is ready to be merged after figuring out the pre-commit failure.

@nikki-t nikki-t force-pushed the 250-single-container-image branch from 31981e1 to 34a3a7e Compare January 17, 2025 19:08
Copy link
Collaborator

@LucaCinquini LucaCinquini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nikki-t : this value should be parametrized for the venue?

DS_S3_BUCKET_PARAM = "/unity/unity/dev/ds/datastore-bucket"

i.e.

/unity/unity/test/ds/datastore-bucket
/unity/unity/prod/ds/datastore-bucket

@nikki-t
Copy link
Collaborator Author

nikki-t commented Jan 21, 2025

@LucaCinquini - It looks like I can query AIRFLOW_VAR_UNITY_VENUE environment variable to get the venue of the stage out bucket. Does this seem like it would work for test and prod? I have tested it in dev. It also looks like this change is causing failures with the unit tests.

@LucaCinquini
Copy link
Collaborator

It should work. Perhaps those envs are not available in the Test environment, but the env variables can probably be added.

@nikki-t
Copy link
Collaborator Author

nikki-t commented Jan 21, 2025

@LucaCinquini - I added the AIRFLOW_VAR_UNITY_VENUE environment variable to the unit test GitHub action definition and they are passing now.

@LucaCinquini LucaCinquini merged commit 2e1326e into develop Jan 21, 2025
2 checks passed
@LucaCinquini LucaCinquini deleted the 250-single-container-image branch January 21, 2025 17:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants