TEP for runtime scheduling only TaskRun timeouts ⏱️
This TEP addresses tektoncd/pipeline#4078 by
proposing that at runtime `TaskRuns` can be configured with a new
timeout that applies before they actually start executing - and only
after that are the existing timeouts applied.

This is meant to make it possible to control when our existing timeout
feature starts counting in resource-constrained environments, and also to get parity
with [Google Cloud Build's existing queueTtl feature](https://cloud.google.com/build/docs/build-config-file-schema#queuettl).
bobcatfish committed Oct 14, 2021
1 parent 6e3aac3 commit 52d70f8
Showing 2 changed files with 365 additions and 0 deletions.
teps/0092-scheduling-timeout.md (new file, 364 additions)
---
status: proposed
title: Scheduling Timeout
creation-date: '2021-10-13'
last-updated: '2021-10-13'
authors:
- '@bobcatfish'
- '@badamowicz' # author of the original issue (https://github.com/tektoncd/pipeline/issues/4078)
---

# TEP-0092: Scheduling Timeout

<!-- toc -->
- [Summary](#summary)
- [Motivation](#motivation)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Use Cases](#use-cases)
- [Requirements](#requirements)
- [Proposal](#proposal)
- [Notes/Caveats](#notescaveats)
- [Risks and Mitigations](#risks-and-mitigations)
- [User Experience](#user-experience)
- [Performance](#performance)
- [Design Details](#design-details)
- [Test Plan](#test-plan)
- [Design Evaluation](#design-evaluation)
- [Drawbacks](#drawbacks)
- [Alternatives](#alternatives)
- [Infrastructure Needed](#infrastructure-needed)
- [Upgrade &amp; Migration Strategy](#upgrade--migration-strategy)
- [Implementation Pull request(s)](#implementation-pull-request-s)
- [References](#references)
<!-- /toc -->

## Summary

Add runtime configuration options to allow setting a timeout for `TaskRuns` which applies before the `TaskRun's`
underlying pod is scheduled; when specified, our existing timeout counters would only start once the pod has been scheduled.

## Motivation

The current timeout logic doesn't take into account the time a pod may spend waiting to be scheduled; the timeout countdown
starts as soon as the TaskRun starts. In some environments, the pod backing the TaskRun may not start executing until
some time after that.

### Goals

The goal of this proposal is to add configuration to both `PipelineRuns` and `TaskRuns` which would allow a timeout
to be applied to the time a ready-to-run `TaskRun` waits to actually start executing as containers in a scheduled pod.

### Non-Goals

* Support any timeout configuration around any other pod or container states
* Support any timeouts related to the queuing of any types other than `TaskRuns` (e.g. `PipelineRuns`)

### Use Cases

1. **Running many tasks in a resource constrained environment.** For example:
   1. Build the artifacts of a large Java application.
   2. Start a pipeline that starts some 35 Tasks in parallel, each performing Selenium tests against the deployed
      application. Each of these tasks consists of:
      * a Pod with a Websphere Liberty server along with the deployed application
      * an Oracle pod as sidecar
      * a Pod with a mock server as sidecar
   3. Do some cleanup.

   Those 35 parallel TaskRuns cause a huge peak of CPU and memory consumption on the cluster's nodes and may eventually
   force Pods of other PipelineRuns into status Pending. This is expected behaviour and OK for this user; however, they
   had to increase all timeouts to avoid those pending Pods being killed due to timeout.
2. **Supporting an environment where users may "fire and forget" a number of PipelineRuns or TaskRuns**, more than the
   environment can handle, causing the Runs to be queued such that eventually the user no longer cares about these
   older queued runs and wants them to naturally expire to make way for newer, more relevant runs.
3. **Using Tekton to get parity with a build system that provides a "pending" or "queued"
   timeout option.** For example, [Google Cloud Build's queueTtl](https://cloud.google.com/build/docs/build-config-file-schema#queuettl)
   (this could be viewed as a subset of the first use case, where the build system is running in a resource constrained
   environment).

## Requirements

* Allow this timeout to be specified at runtime and to differ between TaskRuns

#### Boundaries of the timeout

This new timeout would apply from the creation of the `TaskRun` until
[the Pod condition `PodScheduled`](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions) has
been added to the pod, indicating that a node is available for the pod to run on and it has been assigned to that node.
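
For illustration, here is a minimal sketch, assuming only the standard `k8s.io/api` and `k8s.io/apimachinery` types, of
how a controller could detect the instant at which this timeout stops counting (the `podScheduledAt` helper is
hypothetical, not existing Tekton code):

```go
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// podScheduledAt returns the time at which the PodScheduled condition became
// true, or nil if the pod has not been scheduled yet. The proposed scheduling
// timeout would stop counting at this instant.
func podScheduledAt(pod *corev1.Pod) *metav1.Time {
	for _, cond := range pod.Status.Conditions {
		if cond.Type == corev1.PodScheduled && cond.Status == corev1.ConditionTrue {
			t := cond.LastTransitionTime
			return &t
		}
	}
	return nil
}
```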

Assumptions:
* We don't want to include init container execution time in the timeout
* We don't want to include image pull time in the timeout

Other options:
* We could stop the timeout once [the Pod Condition is `Ready` or `Initialized`](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions).
This would mean the timeout would also include the time for init containers to execute (which Pipelines still uses
for some features) as well as the time to pull any images needed by the containers. `Ready` would also include a
positive indication from any readiness probes that might have been added to the pod.
* We could stop the timeout once [the Pod phase is `Running`](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#Pod-phase),
i.e. no longer `Pending`. This would mean the timeout would include the time to pull any images needed by the containers.
* We could use the `startTime` of the `TaskRun` as the beginning of the timeout
  * Con: this means if the `TaskRun` is queued by the controller for processing, it would not time out. Suggest including
    this time in the timeout, i.e. treating the controller queue time as part of the overall time to schedule; however it
    is a bit weird to take this into account for `TaskRuns` but not for `PipelineRuns`
  * Pro: this would be consistent with how the current timeout logic behaves

## Proposal

We add configuration to both `PipelineRuns` and `TaskRuns` to allow configuring this new timeout for `TaskRuns`. When
specified, this timeout would apply before any other timeouts are started.

### Notes/Caveats

* `PipelineRuns` can be created in [pending](https://github.com/tektoncd/pipeline/blob/main/docs/pipelineruns.md#pending-pipelineruns)
state. This is not a feature of `TaskRuns` _yet_ but [the initial PR description](https://github.com/tektoncd/community/pull/203)
talks about adding this feature for `TaskRuns` as well. When that feature exists, we'd need to decide whether a `pending`
`TaskRun` is considered started or not. Since this feature was added in order to
[allow control over scheduling in clusters under heavy load](https://github.com/tektoncd/community/blob/main/teps/0015-pending-pipeline.md#motivation),
which is related to the use cases for this proposal, it probably makes sense to consider `TaskRuns` in this state
also not yet scheduled and time them out with this new feature. Con: The pending feature can be used for other
use cases as well which have nothing to do with load.
* Kubernetes itself doesn't provide timeout-like functionality by default; the closest feature is that
[failed and succeeded pods will be garbage collected](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-garbage-collection)
if the number of pods in existence exceeds a threshold.

### Risks and Mitigations

* This may open the door for even more fine-grained timeout requests, for example, not including the time to pull an
  image in the timeout of the Run execution. This would technically be possible (e.g. use the difference between the
  [`waiting` and `running` state of each container](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-state-waiting))
  but if users want really fine-grained control they may be better off implementing this in a separate controller
  which can observe execution and apply policies. On the other hand, we already have the precedent of the `timeout`
  section in our PipelineRun spec; maybe adding more configuration here won't be the end of the world.
  * Mitigation: Consider this the last timeout-related feature we add; on the next timeout-related feature request
    we consider building a separate controller to control timeouts. Not sure this is realistic though; maybe in reality
    we're likely to end up accepting a number of different timeouts to support different scenarios.

### User Experience

* As we add more timeout options it may become more difficult to reason about which timeout applies when
* This configuration can't easily be adapted to other requests (e.g. including or not including image pull time in the
  timeout)
* At the `PipelineRun` level, this timeout applies to individual `TaskRuns`, which is not the case for any other
  timeouts

### Performance

The additional timeout to consider will mean additional `TaskRuns` queued for later reconciling within the controller.
This would mean at most twice as many timeout-driven reconciles for each individual `TaskRun` as today.

## Design Details

This feature would be introduced [at the alpha stability level](https://github.com/tektoncd/pipeline/blob/main/docs/install.md#alpha-features).

### PipelineRun

A new option `taskrun-scheduling` would be added to
[the existing timeout configuration in PipelineRun](https://github.com/tektoncd/pipeline/blob/main/docs/pipelineruns.md#configuring-a-failure-timeout):

```yaml
spec:
  timeouts:
    pipeline: "0h0m60s"
    tasks: "0h0m40s"
    taskrun-scheduling: "5m" # this is the new option
    finally: "0h0m20s"
```
When set:
* This timeout would be applied to all `TaskRuns` created when executing the `Pipeline`
* The order of timeout application would be:
1. At the start of the `Pipeline` execution, the `pipeline` and `tasks` timeout counters would begin
2. At the start of each `Task` being executed as a `TaskRun`, the `scheduling` timeout counter would begin
3. Once the pod corresponding to the `TaskRun` has been scheduled, if the Pipeline Task configuration includes a
timeout for the individual Task, the counter for that timeout would begin
4. Once the `finally` Tasks in the `Pipeline` begin executing, the `tasks` timeout counter would no longer be counting,
and the `finally` counter would start
5. The `scheduling` timeout would apply to the `finally` Tasks in the same way as it is applied to the other tasks

When not set, no scheduling timeout would apply (i.e. there is no scheduling timeout by default).
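
As a rough sketch of the resulting check (all names here are hypothetical, not existing Tekton code), the controller
would only need the `TaskRun`'s creation time and whether its pod has been scheduled yet:

```go
package sketch

import "time"

// schedulingTimedOut reports whether a TaskRun should be failed because its
// pod was not scheduled within the taskrun-scheduling window. trCreated is
// the TaskRun's creation time; podScheduledAt is nil until the pod's
// PodScheduled condition becomes true.
func schedulingTimedOut(trCreated time.Time, podScheduledAt *time.Time, schedulingTimeout time.Duration, now time.Time) bool {
	if podScheduledAt != nil {
		return false // scheduled in time; the other timeout counters take over
	}
	return now.After(trCreated.Add(schedulingTimeout))
}
```

Because this check counts from the `TaskRun`'s creation rather than from when the reconciler first processes it, time
spent in the controller's own queue would be included (see
[What if the controller is totally overloaded?](#what-if-the-controller-is-totally-overloaded) below).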

### TaskRun

[TaskRun still currently only has one timeout field](https://github.com/tektoncd/pipeline/blob/main/docs/taskruns.md#configuring-the-failure-timeout)
(while [`Pipeline` configuration was recently updated to make a dictionary for different timeout configuration](https://github.com/tektoncd/community/blob/main/teps/0046-finallytask-execution-post-timeout.md#proposal)):

```yaml
spec:
  timeout: 60s
```

To support this new `scheduling` timeout we would follow
[the same pattern we followed with adding more timeout configuration to PipelineRuns](https://github.com/tektoncd/community/blob/main/teps/0046-finallytask-execution-post-timeout.md)
and add a dictionary to hold configuration for the existing timeout feature and this new one. The previous `timeout`
field would be deprecated once this feature is beta.

```yaml
spec:
  timeouts:
    scheduling: "5m" # this is the new option
    execution: "10m" # this option would have the same functionality as the existing `timeout` field
```
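
In Go, the new dictionary might be shaped roughly as follows. This is a sketch only; these types do not exist in Tekton
today (though the existing `timeout` field does use `*metav1.Duration`):

```go
package sketch

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// TaskRunTimeouts is a hypothetical shape for the new `timeouts` dictionary
// on TaskRun, mirroring the YAML above.
type TaskRunTimeouts struct {
	// Scheduling bounds how long the TaskRun may wait for its pod to be
	// scheduled, counted from the TaskRun's creation.
	Scheduling *metav1.Duration `json:"scheduling,omitempty"`
	// Execution behaves like the existing `timeout` field, except that when
	// Scheduling is set it only starts counting once the pod is scheduled.
	Execution *metav1.Duration `json:"execution,omitempty"`
}
```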

When set, the order of timeout application would be:
1. At the creation of the `TaskRun`, the `scheduling` timeout counter would begin
2. Once the pod executing the `TaskRun` has been scheduled, the `scheduling` timeout would stop and the `execution`
timeout would begin

When the `scheduling` timeout is not set for a `TaskRun`, the `execution` timeout counter would begin as it currently
does (from the `startTime` of the `TaskRun`).
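
Building on the hypothetical `TaskRunTimeouts` type sketched above, the derivation of the two deadlines might look
roughly like this:

```go
package sketch

import "time"

// deadlines derives the deadlines described above; at most one is active at a
// time. While the pod is unscheduled the scheduling deadline applies; once
// scheduled, the execution deadline counts from the moment of scheduling.
// With no scheduling timeout set, execution counts from the TaskRun's start,
// matching the behaviour of the existing `timeout` field.
func deadlines(created time.Time, podScheduled *time.Time, t TaskRunTimeouts) (scheduling, execution *time.Time) {
	switch {
	case t.Scheduling != nil && podScheduled == nil:
		d := created.Add(t.Scheduling.Duration)
		scheduling = &d
	case t.Scheduling != nil && t.Execution != nil:
		d := podScheduled.Add(t.Execution.Duration)
		execution = &d
	case t.Execution != nil:
		d := created.Add(t.Execution.Duration)
		execution = &d
	}
	return scheduling, execution
}
```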

We can decide to deprecate the existing `timeout` field in favor of the dictionary; if we decide this we should also
deprecate the `timeout` field in `PipelineRuns`.

### What if the controller is totally overloaded?

If the controller was so overloaded that it had to queue `TaskRuns` to reconcile, the `scheduling` timeout may only be
evaluated after it has already elapsed. In this case the `TaskRun` would time out immediately once the controller was
finally able to reconcile it.

If we wanted to work around this we'd need to have a separate controller in charge of timeouts (maybe not a bad idea
overall?) though it could also become overloaded.

### What if the pod gets rescheduled?

If the pod gets rescheduled, the `scheduling` timer would not restart, i.e. it would still be considered already
scheduled.

(As a side note, we already do not handle this situation well, see [pipelines#2813](https://github.com/tektoncd/pipeline/issues/2813).)

### Syntax options

#### The new option(s)

In the proposal above, the new option in the `TaskRun` `timeouts` section is named `scheduling` (and the new
`PipelineRun` field, `taskrun-scheduling`, is named after it):

```yaml
spec:
  timeouts:
    scheduling: "5m" # this is the new option
```

Here is a list of some possible names for this option; comment if you prefer any of these (or have other ideas):

* `pending`
* `queued`
* `unscheduled`
* `pre`
* `before`
* `system`
* `overhead`
* `setup`
* `init` (confusing because of init containers)
* `scheduling`


In the `PipelineRun` `timeouts` configuration, we preface the option name with `taskrun-` to try to indicate that this
option - unlike the existing timeout configuration - is applied to _every_ `TaskRun`:

```yaml
spec:
  timeouts:
    taskrun-scheduling: "5m" # this is the new option
```
If we wanted to be even more explicit, we could create a new section to hold timeout configuration that applies to
every `TaskRun`:

```yaml
spec:
  timeouts:
    taskruns: # this section would indicate that this timeout should be applied to all pipeline tasks
      scheduling: "5m" # this is the new option
```

#### The field in TaskRuns that represents the existing timeout

In the proposal above, we replace the existing `timeout` field in `TaskRun` with a dictionary (supporting both, at least
for the moment), and use `execution` as the name of the field that does the same thing as the existing field:

```yaml
spec:
  timeouts:
    scheduling: "5m" # this is the new option
    execution: "10m" # this option would replace the existing `timeout` field
```

Here is a list of some possible names for this option; comment if you prefer any of these (or have other ideas):

* `execution`
* `run`
* `completion`

## Test Plan

* Create an end-to-end test that ultimately creates an unschedulable pod and verify that the `scheduling` timeout is enforced
* Create an end-to-end test (or a reconciler test, if the times can be faked) that uses both the `scheduling` timeout and the
  `execution` timeout, where ultimately the execution times out (e.g. the `Task` just sleeps) - ensure the full time of
  both timeouts has been counted, i.e. not just the `execution` timeout
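
For the reconciler-level logic, here is a minimal table-driven sketch of what such a test could exercise, using the
hypothetical `schedulingTimedOut` helper from the design sketches above, with fixed times standing in for a fake clock:

```go
package sketch

import (
	"testing"
	"time"
)

func TestSchedulingTimedOut(t *testing.T) {
	created := time.Date(2021, 10, 13, 12, 0, 0, 0, time.UTC)
	scheduled := created.Add(2 * time.Minute)
	for _, tc := range []struct {
		name        string
		scheduledAt *time.Time // nil means the pod is still pending
		now         time.Time
		want        bool
	}{
		{"pending, within the 5m window", nil, created.Add(4 * time.Minute), false},
		{"pending, window elapsed", nil, created.Add(6 * time.Minute), true},
		{"scheduled before the deadline", &scheduled, created.Add(10 * time.Minute), false},
	} {
		if got := schedulingTimedOut(created, tc.scheduledAt, 5*time.Minute, tc.now); got != tc.want {
			t.Errorf("%s: got %v, want %v", tc.name, got, tc.want)
		}
	}
}
```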

## Design Evaluation

[Tekton Design Principles](https://github.com/tektoncd/community/blob/master/design-principles.md):
* **Reusability**:
  * Since this is being added as a runtime feature, Pipelines and Tasks will be just as reusable as they were before
    the feature. If we need to specify these new scheduling timeouts at the level of Tasks within a Pipeline, this would
    make the Pipelines slightly less reusable, since we would potentially be mixing environment-specific configuration
    into the Pipeline definition.
* **Simplicity**
  * This does make the timeout logic more complex, and as mentioned in [risks and mitigations](#risks-and-mitigations)
    this continues the trend of adding more and more complex timeout logic. One upside is that this is only in the
    runtime configuration and not (yet) the Pipeline or Task authoring time.
  * This follows the same trend as the recent addition of [finally timeout configuration](https://github.com/tektoncd/community/blob/main/teps/0046-finallytask-execution-post-timeout.md)
  * As we add more timeout options it may become more difficult to reason about which timeout applies when
  * It is debatable whether this feature is absolutely necessary; there are use cases that support it, however it could
    potentially be handled outside of Tekton Pipelines, e.g. by a separate controller that observes execution and
    applies timeout policies.
* **Flexibility**
  * This proposal does not involve bringing in any new dependencies, coupling Pipelines to any new tools,
    or introducing templating logic
  * Again, implementing this very specific timeout configuration in our API does open the door for adding more of this
    configuration, and the proposal is not very flexible in that adapting it to new timeout cases will mean adding
    new config options and explaining in detail how they relate to the existing ones
* **Conformance**
  * Likely we would include this timeout configuration in our conformance surface for `PipelineRuns` and `TaskRuns`.
    If something other than Kubernetes was used to back the API, the implementors could make their own decisions about
    what `scheduling` means in that environment. The recommendation would be to stop the timer once the execution of the
    Run is ready to begin.

## Drawbacks

See [risks and mitigations](#risks-and-mitigations).

## Alternatives

* Configure this timeout at the controller level as well
  * Pro: easy to set and forget in an environment that is always resource constrained
* Also configure this timeout at the pipeline task level, i.e. allow it to vary between Tasks in a Pipeline
  * Con: Can't think of use cases that would require this.
* Configure at the controller level instead (i.e. no API changes, only controller configuration).
  * Con: harder to configure in response to bursty traffic
  * Con: tough if sharing an environment and different users have totally different expectations or use cases
  * Con: won't help get parity with [Google Cloud Build's queueTtl](https://cloud.google.com/build/docs/build-config-file-schema#queuettl)
    (except through some extreme hacks...)
* Create a new controller to handle timeouts in general with its own configuration.
  * Potentially timeout logic could be moved into a separate controller as an implementation detail; but the more
    pressing question is whether or not this configuration is part of the API. The precedent is for this configuration
    to be in the API; we could decide to add this new feature as an annotation only, but this means timeout config
    would be spread across 2-3 places (3 including the existing `timeout` field(s))
  * Con: Would be difficult to provide runtime configuration - maybe via an annotation?

## Infrastructure Needed

No extra infra needed.

## Upgrade & Migration Strategy

* This feature would be introduced [at the alpha stability level](https://github.com/tektoncd/pipeline/blob/main/docs/install.md#alpha-features)
* The existing `TaskRun` `timeout` option would be deprecated once this feature reaches `beta` stability

## Implementation Pull request(s)

TBD

## References

* [When does a timeout counter actually start?](https://github.com/tektoncd/pipeline/issues/4078)
teps/README.md (1 addition)

This is the complete list of Tekton teps:
|[TEP-0080](0080-support-domainscoped-parameterresult-names.md) | Support domain-scoped parameter/result names | implemented | 2021-08-19 |
|[TEP-0081](0081-add-chains-subcommand-to-the-cli.md) | Add Chains sub-command to the CLI | proposed | 2021-08-31 |
|[TEP-0084](0084-endtoend-provenance-collection.md) | end-to-end provenance collection | proposed | 2021-09-16 |
|[TEP-0092](0092-scheduling-timeout.md) | Scheduling Timeout | proposed | 2021-10-13 |
