Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: Create fly.io template to launch beta9 in fly.io machines #722

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

jsun-m
Copy link
Contributor

@jsun-m jsun-m commented Nov 19, 2024

Add templates and setup file to spin up an entire beta9 cluster in fly.io.

This includes templates for:

  • control plane redis
  • juicefs redis
  • blobcache redis
  • control plane postgres
  • control plane storage
  • image storage

Currently, there is no built in provider to automatically provision nodes. You would have to use the beta9 agent to manually add a node to the beta9 cluster.

@jsun-m jsun-m changed the title Create fly io template to launch beta9 in fly io machines Create fly.io template to launch beta9 in fly.io machines Nov 19, 2024
@jsun-m jsun-m changed the title Create fly.io template to launch beta9 in fly.io machines Feat:Create fly.io template to launch beta9 in fly.io machines Nov 19, 2024
@jsun-m jsun-m changed the title Feat:Create fly.io template to launch beta9 in fly.io machines Feat: Create fly.io template to launch beta9 in fly.io machines Nov 19, 2024
@@ -177,7 +177,8 @@ def create_machine(service: ServiceClient, pool: str):
--tailscale-url "{res.machine.tailscale_url}" \\
--tailscale-auth "{res.machine.tailscale_auth}" \\
--pool-name "{res.machine.pool_name}" \\
--provider-name "{res.machine.provider_name}"
--provider-name "{res.machine.provider_name}" \\
--gateway-url "https://{service._config.gateway_host}"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would need to conditionally added since in beam the grpc host and the http host are on different subdomains

@@ -274,7 +274,8 @@ func (wpc *ExternalWorkerPoolController) createWorkerJob(workerId, machineId str
workerImage := fmt.Sprintf("%s/%s:%s",
wpc.config.Worker.ImageRegistry,
wpc.config.Worker.ImageName,
wpc.config.Worker.ImageTag,
// wpc.config.Worker.ImageTag,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -370,6 +371,11 @@ func (wpc *ExternalWorkerPoolController) getWorkerEnvironment(workerId, machineI
podHostname = fmt.Sprintf("machine-%s.%s.%s", machineId, wpc.config.Tailscale.User, wpc.config.Tailscale.HostName)
}

grpcPort := 443
if wpc.config.GatewayService.GRPC.ExternalPort != 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just set this in the config to 443 and remove the if clause?

Hostname string `key:"hostname" json:"hostname"`
}

type RootAgentConfig struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we rename this?


type RootAgentConfig struct {
ControlPlaneRedis AgentRedisConfig `key:"controlPlaneRedis" json:"control_plane_redis"`
JuicefsRedis AgentRedisConfig `key:"juicefsRedis" json:"juicefs_redis"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I like this config structure. We're basically only using it to override the URLs, feels like there's a more elegant approach here

Copy link
Contributor Author

@jsun-m jsun-m Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not a fan either. especially since we don't pass this to the agent.

we do this

remoteConfig.Database.Redis.Addrs[0] = fmt.Sprintf("%s:%d", redisHostname, 6379)
remoteConfig.Storage.JuiceFS.RedisURI = fmt.Sprintf("%s://:%s@%s", scheme, juicefsRedisPassword, juiceFsRedisHostname)
remoteConfig.BlobCache.BlobFs.Sources[idx].JuiceFS.RedisURI = remoteConfig.Storage.JuiceFS.RedisURI

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a new variable config and removed the juicefs redis, controlplane redis and blobcache urls.
The new veriable is just called
DynamicServiceHosts

that encapsulate all of them

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basically if its true, we use tailscale for the agent configs, if not it will use the host currently specified in config in the places that tailscale would have overrided

@@ -364,6 +364,7 @@ func (s *Worker) getContainerEnvironment(request *types.ContainerRequest, option
fmt.Sprintf("CONTAINER_ID=%s", request.ContainerId),
fmt.Sprintf("BETA9_GATEWAY_HOST=%s", os.Getenv("BETA9_GATEWAY_HOST")),
fmt.Sprintf("BETA9_GATEWAY_PORT=%s", os.Getenv("BETA9_GATEWAY_PORT")),
fmt.Sprintf("BETA9_GATEWAY_TLS=%s", os.Getenv("BETA9_GATEWAY_TLS")),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just detect if a host has TLS enabled programatically, something like this:

def is_tls_enabled(url: str) -> bool:
    """Check if a given URL supports TLS.

    Args:
        url (str): The URL to check.

    Returns:
        bool: True if TLS is enabled, False otherwise.
    """
    try:
        # Extract hostname and default to port 443 for HTTPS
        hostname = url.split("//")[-1].split("/")[0]
        port = 443

        # Create SSL context and test the connection
        context = ssl.create_default_context()
        with socket.create_connection((hostname, port), timeout=5) as sock:
            with context.wrap_socket(sock, server_hostname=hostname):
                return True
    except (ssl.SSLError, socket.error):
        return False

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we can remove the env var

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes sense. the code here works

@jsun-m jsun-m requested a review from luke-lombardi December 6, 2024 01:03
@@ -274,7 +274,8 @@ func (wpc *ExternalWorkerPoolController) createWorkerJob(workerId, machineId str
workerImage := fmt.Sprintf("%s/%s:%s",
wpc.config.Worker.ImageRegistry,
wpc.config.Worker.ImageName,
wpc.config.Worker.ImageTag,
// wpc.config.Worker.ImageTag,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -176,6 +180,7 @@ def prompt_for_config_context(
prompt_gateway_port = functools.partial(
terminal.prompt, text="Gateway Port", default=gateway_port or settings.gateway_port
)
prompt_tls = functools.partial(terminal.prompt, text="TLS", default=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this prompted? shouldn't it be using that function below?

is_tls_enabled

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

@jsun-m jsun-m requested a review from luke-lombardi December 20, 2024 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants