WIP: Session deployment #22

Draft
wants to merge 2 commits into
base: main

thodson-usgs
Contributor

This PR would configure Flink to run in session mode. Essentially, it would create a single job manager for the cluster, and all pangeo-forge-recipes would submit their jobs to that job manager.
One of the main advantages of this would be to centralize all infrastructure configuration in pangeo-forge-cloud-federation.
Currently, infrastructure configuration is spread across pangeo-forge-cloud-federation, pangeo-forge-runner, and each individual recipe's config.py, which makes the cluster difficult to configure. Ideally, we could have multiple node pools of on-demand and spot instances, high-availability job managers, reactive scaling, default failure strategies, etc., and set all of that within pangeo-forge-cloud-federation. The recipe and pangeo-forge-runner would then require minimal configuration, such as setting the parallelism and the job name.
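To illustrate the intended split, here is a minimal sketch of how cluster-level defaults and per-recipe settings might be layered. Every name in it (`ClusterDefaults`, `RecipeJobConfig`, `effective_config`, and all field names and values) is hypothetical and chosen only for illustration; none of it is the actual pangeo-forge-runner or pangeo-forge-cloud-federation API.

```python
from dataclasses import asdict, dataclass


# Hypothetical: settings that would live once, in pangeo-forge-cloud-federation.
@dataclass(frozen=True)
class ClusterDefaults:
    execution_mode: str = "session"            # one shared job manager
    node_pools: tuple = ("on-demand", "spot")  # illustrative pool names
    high_availability: bool = True
    restart_strategy: str = "fixed-delay"      # illustrative default failure strategy


# Hypothetical: the only settings a recipe's config.py would still carry.
@dataclass(frozen=True)
class RecipeJobConfig:
    job_name: str
    parallelism: int = 1


def effective_config(cluster: ClusterDefaults, job: RecipeJobConfig) -> dict:
    """Merge cluster defaults with the recipe's minimal job settings."""
    # The two dataclasses share no field names, so nothing is overwritten here.
    return {**asdict(cluster), **asdict(job)}


if __name__ == "__main__":
    cfg = effective_config(ClusterDefaults(), RecipeJobConfig("demo-recipe", parallelism=4))
    print(cfg)
```

The point of the sketch is that the recipe side shrinks to two fields, while everything about node pools, availability, and failure handling is owned by the cluster definition.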
