[BUG] Namespace creation fails with default pod template #5376

Closed
2 tasks done
nuhaj opened this issue May 15, 2024 · 20 comments
Assignees
Labels
bug Something isn't working

Comments

@nuhaj

nuhaj commented May 15, 2024

Describe the bug

When configuring a basic default pod template, the namespaces are not created. Projects are created without error, but there is no trace of the namespace for the project. When deploying a workflow, the propeller error is:
failed to create workflow in propeller namespaces "test-development" not found…

After removing the default pod template, the namespaces are created.

Expected behavior

I expect the namespaces to be created, or an error to be reported when creation fails. I expect the default pod template's container to be applied as the primary or default container, or to be ignored.

Additional context to reproduce

clusterResourceTemplates = {
  inline = {
    "test_template.yaml" = file("${path}/test_template.yaml")
  }
}
....
plugins = {
  k8s = {
    default-pod-template-name = "test_template.yaml" # also tried with "test_template"
  }
}

I trimmed the default template down to the most basic skeleton for testing. test_template.yaml:

apiVersion: v1
kind: PodTemplate
metadata:
  name: test_template
  namespace: flyte # default for all namespaces
template:
  metadata:
    labels:
      foo: from-pod-template
  spec:
    containers:
      - name: noob # should be ignored; also tried with default and primary
        image: docker.io/rwgrim/docker-noop

Screenshots

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@nuhaj nuhaj added bug Something isn't working untriaged This issues has not yet been looked at by the Maintainers labels May 15, 2024

welcome bot commented May 15, 2024

Thank you for opening your first issue here! 🛠


runllm bot commented May 15, 2024

The issue you're experiencing seems related to the configuration and namespace handling of PodTemplates in Flyte. According to this discussion, PodTemplates need to be present in the namespace where the execution is triggered. Additionally, another discussion mentions that PodTemplates are namespace-specific and should be configured accordingly. Lastly, this issue highlights that the default-pod-template-name should be correctly set in the FlytePropeller configuration.

Would you like more details on how to configure the namespaces and PodTemplates correctly?

This is an AI-generated response and your feedback is appreciated! Please leave a 👍 if this is helpful and 👎 if it is not.


@eapolinario eapolinario removed the untriaged This issues has not yet been looked at by the Maintainers label May 16, 2024
@davidmirror-ops
Contributor

@nuhaj what version of Flyte are you using?
@hamersaw is this the PR that freed the watch from having to have the default Pod template set? flyteorg/flytepropeller#614

@nuhaj
Author

nuhaj commented May 16, 2024

flyte-binary-release:v1.12.0

@hamersaw
Contributor

@nuhaj what version of Flyte are you using?
@hamersaw is this the PR that freed the watch from having to have the default Pod template set? flyteorg/flytepropeller#614

Yes, this looks like it.

@nuhaj
Author

nuhaj commented Jun 17, 2024

Any updates here? In a similar issue, we cannot use clusterResourceTemplate to define the namespace for flyte-binary.

@davidmirror-ops
Contributor

@nuhaj I'll work on reproducing this behavior, but I'm curious, what's the content of your test_template.yaml file?

@nuhaj
Author

nuhaj commented Jun 17, 2024

I posted a trimmed version above that I tested just to see whether the error would resolve. I expected the container to be ignored, since it is named noob and Flyte expects the name to be default or primary.

Below is the default pod template for all namespaces that I initially planned to set. The intent was a default pod with a persistent volume claim. The definition below works when defined in the task decorator as a compile-time pod template, but not as a default pod template:

apiVersion: v1
kind: PodTemplate
metadata:
  name: flyte-workflow-base
  namespace: flyte
template:
  metadata:
    name: flyte-workflow-base
  spec:
    initContainers:
      - name: init
        image: alpine
        volumeMounts:
        - name: shared-data
          mountPath: /data
    containers:
      - name: primary       
        volumeMounts:
          - name: shared-data
            mountPath: /data
    volumes:  
      - name: shared-data
        ephemeral:
          volumeClaimTemplate:
            spec:
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 5Gi

@davidmirror-ops
Contributor

Thanks for sharing.
My confusion comes from what I see Flyte expects in a template for the ClusterResources controller (see example).
The PodTemplate is a different resource, and I don't see how any PodTemplate definition would be able to create the project-domain namespaces.
In that sense, the clusterResourceTemplates section should contain a namespace spec. The PodTemplate can be part of that section too, I guess, but a PodTemplate alone won't create the namespaces, unless I'm missing something. Does that make sense?
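
One way to sanity-check this point before deploying is to scan the inline templates for a Namespace manifest. The sketch below is a hypothetical helper, not part of Flyte. It assumes the templates are available as a mapping of file name to YAML text, and it uses a simple regular expression instead of a YAML parser, so it only detects a top-level `kind:` line. The `namespace.yaml` file name and its `{{ namespace }}` placeholder are illustrative assumptions.

```python
import re

# Hypothetical helper, not part of Flyte: list the top-level resource kinds
# declared in a set of cluster resource templates.
KIND_RE = re.compile(r"^kind:\s*(\w+)\s*$", re.MULTILINE)


def kinds_in_templates(templates: dict) -> dict:
    """Map each template file name to the top-level kinds it declares."""
    return {name: KIND_RE.findall(text) for name, text in templates.items()}


def defines_namespace(templates: dict) -> bool:
    """Return True if any template declares a top-level `kind: Namespace`."""
    return any("Namespace" in kinds for kinds in kinds_in_templates(templates).values())


# Only a PodTemplate, as in the configuration from the report.
pod_template_only = {
    "test_template.yaml": "apiVersion: v1\nkind: PodTemplate\nmetadata:\n  name: test_template\n",
}

# The same set plus an assumed Namespace manifest (file name is illustrative).
with_namespace = {
    "namespace.yaml": "apiVersion: v1\nkind: Namespace\nmetadata:\n  name: '{{ namespace }}'\n",
    **pod_template_only,
}

print(defines_namespace(pod_template_only))
print(defines_namespace(with_namespace))
```

A check like this only tells you whether a Namespace manifest is present in the template set; it does not validate the manifest itself.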

@nuhaj
Author

nuhaj commented Jun 17, 2024

Yes, they are separate issues. I was attempting to create a default PodTemplate with a PVC and, separately, to override the default namespace of {{project}}-{{domain}}. For this report, we can focus on the default PodTemplate resource.

@davidmirror-ops
Contributor

@nuhaj regarding the PodTemplate behavior I just had success using the flyte-binary v1.12.0 with the following config:

In the Helm values:

inline:
    plugins:
      k8s:
        inject-finalizer: true
        default-pod-template-name: "flyte-workflow-base"

The PodTemplate:

apiVersion: v1
kind: PodTemplate
metadata:
  name: flyte-workflow-base
  namespace: flyte
template:
  metadata:
    name: flyte-workflow-base
  spec:
    initContainers:
      - name: init
        image: alpine
        volumeMounts:
        - name: shared-data
          mountPath: /data
    containers:
      - name: default 
        image: rwgrim/docker-noop     
        volumeMounts:
          - name: shared-data
            mountPath: /data
        terminationMessagePath: "/dev/foo"
    hostNetwork: false
    volumes:  
      - name: shared-data
        ephemeral:
          volumeClaimTemplate:
            spec:
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 5Gi

And running a simple workflow:

import typing

from flytekit import task, workflow


@task
def say_hello(name: str) -> str:
    return f"hello {name}!"


@task
def greeting_length(greeting: str) -> int:
    # Return the number of characters in the greeting.
    return len(greeting)


@workflow
def wf(name: str = "union") -> typing.Tuple[str, int]:
    greeting = say_hello(name=name)
    greeting_len = greeting_length(greeting=greeting)
    return greeting, greeting_len


if __name__ == "__main__":
    print(f"Running wf() { wf(name='passengers') }")

I get the following Pod spec:

Resulting Pod spec
k describe po fa0428345b5ae4f778cd-n0-0 -n flytesnacks-development

Name:             fa0428345b5ae4f778cd-n0-0
Namespace:        flytesnacks-development
Priority:         0
Service Account:  default
Node:             flytebinary/192.168.67.2
Start Time:       Tue, 18 Jun 2024 15:39:53 -0500
Labels:           domain=development
                  execution-id=fa0428345b5ae4f778cd
                  interruptible=false
                  node-id=n0
                  project=flytesnacks
                  shard-key=2
                  task-name=hello-with-podtemplate-say-hello
                  workflow-name=hello-with-podtemplate-wf
Annotations:      cluster-autoscaler.kubernetes.io/safe-to-evict: false
                  primary_container_name: fa0428345b5ae4f778cd-n0-0
Status:           Running
IP:               10.244.0.35
IPs:
  IP:           10.244.0.35
Controlled By:  flyteworkflow/fa0428345b5ae4f778cd
Init Containers:
  init:
    Container ID:   docker://e0cd9f92a5a8ed64e7d8c7eb7af600ffae930eb6901a146a7df076c5058b5e5b
    Image:          alpine
    Image ID:       docker-pullable://alpine@sha256:77726ef6b57ddf65bb551896826ec38bc3e53f75cdde31354fbffb4f25238ebd
    Port:           <none>
    Host Port:      <none>
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Tue, 18 Jun 2024 15:39:55 -0500
      Finished:     Tue, 18 Jun 2024 15:39:55 -0500
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /data from shared-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-s6sdb (ro)
Containers:
  fa0428345b5ae4f778cd-n0-0:
    Container ID:  docker://2e76cca15e500fdb6d29ad53d5bb11a768ffde81ca4c5a0b533da55104112e15
    Image:         cr.flyte.org/flyteorg/flytekit:py3.11-1.11.0
    Image ID:      docker-pullable://cr.flyte.org/flyteorg/flytekit@sha256:426e7ba39b07f7b9bbc8df5b3166db1e5ac24a1502251820be2b19f8d92b105c
    Port:          <none>
    Host Port:     <none>
    Args:
      pyflyte-fast-execute
      --additional-distribution
      s3://flyte/flytesnacks/development/C3UVRAYHHEEJLHP57SHO6HCKRQ======/script_mode.tar.gz
      --dest-dir
      .
      --
      pyflyte-execute
      --inputs
      s3://flyte/metadata/propeller/flytesnacks-development-fa0428345b5ae4f778cd/n0/data/inputs.pb
      --output-prefix
      s3://flyte/metadata/propeller/flytesnacks-development-fa0428345b5ae4f778cd/n0/data/0
      --raw-output-data-prefix
      s3://flyte/data/k4/fa0428345b5ae4f778cd-n0-0
      --checkpoint-path
      s3://flyte/data/k4/fa0428345b5ae4f778cd-n0-0/_flytecheckpoints
      --prev-checkpoint
      ""
      --resolver
      flytekit.core.python_auto_container.default_task_resolver
      --
      task-module
      hello-with-podtemplate
      task-name
      say_hello
    State:          Running
      Started:      Tue, 18 Jun 2024 15:39:57 -0500
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  500Mi
    Requests:
      cpu:     100m
      memory:  500Mi
    Environment:
      FLYTE_INTERNAL_EXECUTION_WORKFLOW:  flytesnacks:development:hello-with-podtemplate.wf
      FLYTE_INTERNAL_EXECUTION_ID:        fa0428345b5ae4f778cd
      FLYTE_INTERNAL_EXECUTION_PROJECT:   flytesnacks
      FLYTE_INTERNAL_EXECUTION_DOMAIN:    development
      FLYTE_ATTEMPT_NUMBER:               0
      FLYTE_INTERNAL_TASK_PROJECT:        flytesnacks
      FLYTE_INTERNAL_TASK_DOMAIN:         development
      FLYTE_INTERNAL_TASK_NAME:           hello-with-podtemplate.say_hello
      FLYTE_INTERNAL_TASK_VERSION:        7g1m6UNX8h7taWI8mE39hg
      FLYTE_INTERNAL_PROJECT:             flytesnacks
      FLYTE_INTERNAL_DOMAIN:              development
      FLYTE_INTERNAL_NAME:                hello-with-podtemplate.say_hello
      FLYTE_INTERNAL_VERSION:             7g1m6UNX8h7taWI8mE39hg
      FLYTE_AWS_ENDPOINT:                 http://minio.flyte.svc.cluster.local:9000
      FLYTE_AWS_ACCESS_KEY_ID:            minio
      FLYTE_AWS_SECRET_ACCESS_KEY:        miniostorage
    Mounts:
      /data from shared-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-s6sdb (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  shared-data:
    Type:          EphemeralVolume (an inline specification for a volume that gets created and deleted with the pod)
    StorageClass:
    Volume:
    Labels:            <none>
    Annotations:       <none>
    Capacity:
    Access Modes:
    VolumeMode:    Filesystem
  kube-api-access-s6sdb:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  21s   default-scheduler  0/1 nodes are available: waiting for ephemeral volume controller to create the persistentvolumeclaim "fa0428345b5ae4f778cd-n0-0-shared-data". preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod..
  Normal   Scheduled         20s   default-scheduler  Successfully assigned flytesnacks-development/fa0428345b5ae4f778cd-n0-0 to flytebinary
  Normal   Pulling           19s   kubelet            Pulling image "alpine"
  Normal   Pulled            18s   kubelet            Successfully pulled image "alpine" in 946.808333ms (946.812167ms including waiting)
  Normal   Created           18s   kubelet            Created container init
  Normal   Started           18s   kubelet            Started container init
  Normal   Pulling           18s   kubelet            Pulling image "cr.flyte.org/flyteorg/flytekit:py3.11-1.11.0"
  Normal   Pulled            16s   kubelet            Successfully pulled image "cr.flyte.org/flyteorg/flytekit:py3.11-1.11.0" in 2.262136209s (2.262146584s including waiting)
  Normal   Created           16s   kubelet            Created container fa0428345b5ae4f778cd-n0-0
  Normal   Started           16s   kubelet            Started container fa0428345b5ae4f778cd-n0-0

Notice that I used default as the container name, which should instruct flytepropeller to use this spec as a base for all the containers in the Pod, not only the primary one (for which you'd use primary as the container name).
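
The naming convention described above can be illustrated with a small model. This is a simplified sketch, not flytepropeller's actual merge logic: containers are plain dictionaries, the function name `apply_pod_template` is hypothetical, and the merge is a shallow "template provides defaults, task values win" rule. It also assumes that a template container with any other name (such as `noob`) is not used as a base.

```python
def apply_pod_template(template_containers, task_containers, primary_name):
    """Simplified model of using template containers as bases for task containers."""
    by_name = {c["name"]: c for c in template_containers}
    default_base = by_name.get("default", {})
    primary_base = by_name.get("primary", {})

    merged = []
    for container in task_containers:
        result = {}
        # "default" acts as a base for every container in the Pod.
        result.update(default_base)
        # "primary" acts as a base only for the primary container.
        if container["name"] == primary_name:
            result.update(primary_base)
        # Values set by the task itself take precedence over both bases.
        result.update(container)
        merged.append(result)
    return merged


template = [
    {"name": "default", "image": "rwgrim/docker-noop", "terminationMessagePath": "/dev/foo"},
]
task_containers = [
    {"name": "fa0428345b5ae4f778cd-n0-0", "image": "cr.flyte.org/flyteorg/flytekit:py3.11-1.11.0"},
]

pod_containers = apply_pod_template(template, task_containers, "fa0428345b5ae4f778cd-n0-0")
print(pod_containers)
```

In this model the task's own name and image survive, while fields the task does not set, such as `terminationMessagePath`, come from the template.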


Regarding the namespace issue, I wasn't able to reproduce it. Without setting anything in the clusterResourceTemplates section, I created a new project:

flytectl create project --name projectx --id projectx

The cluster resources controller creates the namespaces:

k get ns
NAME                      STATUS   AGE
default                   Active   63d
flyte                     Active   62d
flytesnacks-development   Active   61d
flytesnacks-production    Active   61d
flytesnacks-staging       Active   61d
kube-node-lease           Active   63d
kube-public               Active   63d
kube-system               Active   63d
projectx-development      Active   6h47m
projectx-production       Active   6h47m
projectx-staging          Active   6h47m

Let me know if you have additional questions.

@nuhaj
Author

nuhaj commented Jun 25, 2024

@davidmirror-ops for the namespace, we are overriding the default namespace {{ project }}-{{ domain }} in the cluster resource template. Instead of projectx-development we would get, say, flyte-projectx-development.
pyflyte run still tries to register the workflow to projectx-development. Would a default pod template be used here to override the namespace again?

@davidmirror-ops
Contributor

Would a default pod template be used here to override the namespace again?

Maybe that would work but I don't think it's a very maintainable workaround.

So, is flyte-projectx a project in your environment? Otherwise, Flyte will still run your workflows in the project-domain namespace.

If this is an existing namespace, you can instruct Flyte to run your executions in a particular namespace:

configuration:
  inline:
    namespace_mapping:
      template: "my_namespace"
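
For intuition, the namespace template can be thought of as a string in which `{{ project }}` and `{{ domain }}` placeholders are filled in per execution, while a fixed string such as `my_namespace` is used unchanged. The sketch below is a stand-in written for illustration, not Flyte's implementation; `render_namespace` is an invented name.

```python
import re

# Matches placeholders such as "{{ project }}" or "{{domain}}".
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_namespace(template: str, project: str, domain: str) -> str:
    """Fill in project/domain placeholders; leave unknown placeholders as-is."""
    values = {"project": project, "domain": domain}
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), template)


default_ns = render_namespace("{{ project }}-{{ domain }}", "projectx", "development")
prefixed_ns = render_namespace("flyte-{{ project }}-{{ domain }}", "projectx", "development")
fixed_ns = render_namespace("my_namespace", "projectx", "development")
print(default_ns, prefixed_ns, fixed_ns)
```

Under this model, a template with no placeholders sends every project and domain to the same namespace.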

You can create new projects using flytectl create project --name <PROJECT_NAME> --id <PROJECT_NAME>

@davidmirror-ops
Contributor

@nuhaj is this still an issue in your environment?

@nuhaj
Author

nuhaj commented Jul 22, 2024

@davidmirror-ops I was away. Trying this now. Did you also have a section inside clusterResourceTemplates.inline?

clusterResourceTemplates = {

  "flyte-workflow-base.yaml" = ...

@davidmirror-ops
Contributor

@nuhaj No, I wasn't setting anything under that section

@nuhaj
Author

nuhaj commented Jul 24, 2024

@davidmirror-ops The init container section of the pod describe output does not appear for me, even with a new project and workflow deployment. I do see the pod-template in the config, but I don't see the YAML contents of pod-template rendered anywhere.

100-inline-config.yaml: |
    k8s:
      default-pod-template-name: pod-template
      inject-finalizer: true

How does the Helm chart know where "flyte-workflow-base" is? Defined the way you have it, there is no context on the path:

inline:
    plugins:
      k8s:
        inject-finalizer: true
        default-pod-template-name: "flyte-workflow-base"

@davidmirror-ops
Contributor

The init container section of the pod describe output does not appear for me even with a new project and workflow deployment

You mean an init container as part of the flyte-binary pod or the execution one?

How does the Helm chart know where "flyte-workflow-base" is?

The logic is not applied by the Helm chart. This field ends up in a ConfigMap that propeller then picks up.

When you define this global Pod template as part of the K8s plugin config, propeller starts a watch that looks for the template first in the namespace where the task is being executed and, if it is not found there, in the flyte namespace (see docs).
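
The lookup order described above amounts to a two-step fallback. The sketch below models it with an in-memory dictionary keyed by (namespace, name). `find_pod_template` and the dictionary are hypothetical and stand in for the watch that propeller maintains; this is not propeller's code.

```python
from typing import Optional


def find_pod_template(store: dict, name: str, execution_namespace: str,
                      fallback_namespace: str = "flyte") -> Optional[dict]:
    """Look in the execution namespace first, then in the fallback namespace."""
    for namespace in (execution_namespace, fallback_namespace):
        template = store.get((namespace, name))
        if template is not None:
            return template
    return None


# Hypothetical contents: one template in the flyte namespace and one
# namespace-specific override.
store = {
    ("flyte", "flyte-workflow-base"): {"source": "flyte"},
    ("projectx-development", "flyte-workflow-base"): {"source": "projectx-development"},
}

override = find_pod_template(store, "flyte-workflow-base", "projectx-development")
fallback = find_pod_template(store, "flyte-workflow-base", "flytesnacks-development")
missing = find_pod_template(store, "pod-template", "flytesnacks-development")
print(override, fallback, missing)
```

The last call shows the case where no PodTemplate object with the configured name exists in either namespace, so nothing is found regardless of what the config names.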

@nuhaj
Author

nuhaj commented Aug 1, 2024

@davidmirror-ops we managed to get the default pod template working by:

  1. Adding the pod template to the clusterResourceTemplate inline definition
  2. Removing namespace from metadata (for default template for all pods)
metadata:
  name: flyte-workflow-base
  namespace: flyte

Thank you for your help. The pod description and template pointed us in the right direction to debug.

@davidmirror-ops
Contributor

@nuhaj great, hopefully, we'll improve the PodTemplates docs soon to cover some of the gaps. Any other questions please let us know. Thanks!
