From caaf58e17d3fb2a3b1cf180819f6208065f3b86a Mon Sep 17 00:00:00 2001
From: Stephan Behnke
Date: Fri, 27 Sep 2024 09:40:21 -0700
Subject: [PATCH] Update Workflow docs editing

---
 docs/architecture/in-memory-queue.md          |  34 +--
 docs/architecture/message-protocol.md         |  90 +++---
 .../architecture/speculative-workflow-task.md | 258 +++++++++---------
 3 files changed, 199 insertions(+), 183 deletions(-)

diff --git a/docs/architecture/in-memory-queue.md b/docs/architecture/in-memory-queue.md
index 644fcd3d504..0910a7f1f60 100644
--- a/docs/architecture/in-memory-queue.md
+++ b/docs/architecture/in-memory-queue.md
@@ -1,21 +1,23 @@
-# In-memory timer queue
-This queue is similar to normal persisted timer queue, but it exists in memory only and never gets
-persisted. It is created with generic `MemoryScheduledQueueFactory`, but currently serves only
-[speculative Workflow Task](./speculative-workflow-task.md) timeouts, therefore the only queue this factory creates
-is `SpeculativeWorkflowTaskTimeoutQueue` which uses same task executor as normal timer queue:
-`TimerQueueActiveTaskExecutor`.
+# In-memory Timer Queue
 
-Implementation uses `PriorityQueue` by `VisibilityTimestamp`: a task on top is the task that
-executed next.
+This queue is similar to the normal persisted timer queue, but it exists only in memory, i.e. it
+never gets persisted. It is created by a generic `MemoryScheduledQueueFactory`, but currently serves
+only [speculative Workflow Task](./speculative-workflow-task.md) timeouts. Therefore, the only queue
+this factory creates is `SpeculativeWorkflowTaskTimeoutQueue`, which uses the same task executor as
+the normal timer queue: `TimerQueueActiveTaskExecutor`.
 
-In-memory queue supports only `WorkflowTaskTimeoutTask` and there are two timeout types
-enforced by in-memory queue: `SCHEDULED_TO_START` and `START_TO_CLOSE`.
+Its implementation uses a `PriorityQueue` sorted by `VisibilityTimestamp`: the task on top is the
+task that is executed next.
 
-Executor of `WorkflowTaskTimeoutTask` from in-memory queue is the same as for normal timer queue,
-although it does one extra check for speculative Workflow Task. It checks if a task being executed still the same
-as stored in mutable state (`CheckSpeculativeWorkflowTaskTimeoutTask`). This is because MS can lose and create
-a new speculative Workflow Task, which will be a different Workflow Task and a timeout task must be skipped for it.
+The in-memory queue only supports `WorkflowTaskTimeoutTask` and only enforces the
+`SCHEDULED_TO_START` and `START_TO_CLOSE` timeout types.
+
+Note that while the in-memory queue's executor of `WorkflowTaskTimeoutTask` is the same as for
+the normal timer queue, it does one extra check for speculative Workflow Tasks:
+`CheckSpeculativeWorkflowTaskTimeoutTask` checks if the task being executed is still the *same* task
+that's stored in mutable state. This is important since the mutable state can lose the speculative
+Workflow Task and create a *new* one, in which case the old timeout task must be ignored.
 
 > #### TODO
-> Future refactoring is necessary to make logic (and probably naming) clearer. It is not clear
-> if in-memory queue might have other applications besides timeouts for speculative Workflow Tasks.
+> Future refactoring is necessary to make the logic (and probably naming) clearer. It is not clear
+> yet if the in-memory queue has other applications besides timeouts for speculative Workflow Tasks.
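+
+To illustrate the ordering described above, here is a minimal, self-contained Go sketch (all names
+are illustrative - this is not the actual Temporal implementation) of a timer heap keyed by
+`VisibilityTimestamp`, where the task that must fire next is always on top:
+
+```go
+package main
+
+import (
+	"container/heap"
+	"fmt"
+	"time"
+)
+
+type timeoutTask struct {
+	workflowID          string
+	visibilityTimestamp time.Time
+}
+
+// taskHeap implements heap.Interface, ordered by visibilityTimestamp.
+type taskHeap []*timeoutTask
+
+func (h taskHeap) Len() int      { return len(h) }
+func (h taskHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
+func (h taskHeap) Less(i, j int) bool {
+	return h[i].visibilityTimestamp.Before(h[j].visibilityTimestamp)
+}
+func (h *taskHeap) Push(x any) { *h = append(*h, x.(*timeoutTask)) }
+func (h *taskHeap) Pop() any {
+	old := *h
+	task := old[len(old)-1]
+	*h = old[:len(old)-1]
+	return task
+}
+
+func main() {
+	now := time.Now()
+	h := &taskHeap{
+		{workflowID: "wf-2", visibilityTimestamp: now.Add(10 * time.Second)},
+		{workflowID: "wf-1", visibilityTimestamp: now.Add(5 * time.Second)},
+	}
+	heap.Init(h)
+	// wf-1 is popped first: it has the earliest visibility timestamp.
+	fmt.Println(heap.Pop(h).(*timeoutTask).workflowID)
+}
+```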
diff --git a/docs/architecture/message-protocol.md b/docs/architecture/message-protocol.md
index 0d415c47f8f..347eff676e4 100644
--- a/docs/architecture/message-protocol.md
+++ b/docs/architecture/message-protocol.md
@@ -1,65 +1,67 @@
-# Message protocol
+# Message Protocol
 
 ## Why it exists
-Usually communication between server and worker uses events and commands: events go from server to worker,
-worker process them and generates commands that go back to server. Events are attached to Workflow Task, which
-worker gets as response to `PollWorkflowTask` API call, and worker sends commands back
-when it completes Workflow Task with `RespondWorkflowTaskCompleted` API. Workflow Task works as transport on RPC level.
+Usually, communication between the server and the worker uses events and commands: events go from
+the server to the worker, the worker processes them and generates commands that go back to the
+server. The events are attached to the Workflow Task, which the worker receives from the
+`PollWorkflowTask` API call, and the worker sends commands back when it completes the Workflow Task
+with the `RespondWorkflowTaskCompleted` API.
 
-Unfortunately, this way or communication didn't work for Workflow Update. Server can't use events
-to ship Update request to the worker, because worker might reject Update, and it must completely disappear.
-Because history is immutable, server can't delete events from it. Initial implementation
-was using transient event with Update request, which wasn't written to history. This implementation
-was proven to be error-prone and hard to handle on the SDK side. Commands that go back from worker to server
-also can't be used for Update because some SDKs assume that every command will produce exactly one event,
-which is not true for Update rejections that don't produce any events.
+Unfortunately, this protocol didn't work for Workflow Update. The server cannot use events to ship
+the Update request to the worker because in case the Update is rejected, it must completely disappear.
+But because the history is immutable, the server cannot delete any events from it. The initial
+implementation used a transient event (not written to history) instead, but that implementation
+proved to be error-prone and hard to handle on the SDK side. Similarly, commands can't be used
+for Update either because some SDKs assume that every command will produce *exactly* one event,
+which is not true for Update rejections as they don't produce an event.
 
-Another protocol was required to implement Workflow Update. Messages are attached to Workflow Task and go in
-both directions, similar to events and commands but don't have limitations listed above.
+Another protocol was required to implement Workflow Update: Messages are attached to the Workflow
+Task and travel in both directions. They are similar to events and commands but don't have the
+limitations listed above.
 
 ## `Message` proto message
-This might look confusing:
 ```protobuf
 message Message {}
 ```
-but first `message` word refers to protobuf messages and second `Message` is `protocolpb.Message`
-data struct used by Temporal. Most fields are self-explanatory, but some fields need explanation.
+The first `message` refers to protobuf messages and the second `Message` is the `protocolpb.Message`
+data struct used by Temporal.
 
 ### `protocol_instance_id`
-is an identifier of the object which this message belongs to. Because currently messages are used for
-Workflow Update only, it is `update_id`.
+This field identifies what object this message belongs to. Because currently messages are only used
+for Workflow Update, it is the same as `update_id`.
 
 > #### TODO
-> In the future, signals and queries might use message protocol too.
-> In these case `protocol_instance_id` will be `query_id` or `signal_id.
+> In the future, signals and queries might use the message protocol, too.
+> In that case `protocol_instance_id` would be `query_id` or `signal_id`.
 
 ### `body`
-is intentionally of type `Any` (not `oneof`) to support pluggable interfaces which Temporal server
-might not be aware of.
+This field is intentionally of type `Any` (not `oneof`) to support pluggable interfaces which the
+Temporal server might not be aware of.
 
-### `sequence_id`
-Because messages might intersect with events and commands, it is important to specify when
-a particular message must be processed. This field can be `event_id`, and it will indicate event
-after which message should be processed by worker, or `command_index`, and it will indicate
-command after which message should be processed by server.
+### `sequencing_id`
+Because messages might intersect with events and commands, it is important to specify in what order
+messages must be processed. This field can be:
+- `event_id`, to indicate the event after which this message should be processed by the worker, or
+- `command_index`, to indicate the command after which this message should be processed by the server.
 
 > #### TODO
-> In fact, this is not used. Server always set `event_id` equal to event Id before `WorkflowTaskStartedEvent`,
-> which essentially means, that all messages are processed after all events (all SDKs respect this field though).
-> This is because buffered events are reordered on server (see `reorderBuffer()` func) and intersection
-> with them based on `event_id` is not possible. When reordering is removed, this field can be set to the right value.
+> `event_id` is *always* set by the server to the ID of the event right before the
+> `WorkflowTaskStartedEvent`, which means that all messages are processed after all events. This is
+> because buffered events are reordered on the server (see `reorderBuffer()` func) and intersecting
+> them based on `event_id` is not possible. When reordering is removed, this field can be set to
+> the right value.
+
+> #### TODO
+> `command_index` is not used because SDKs use a different approach: a special command of type
+> `COMMAND_TYPE_PROTOCOL_MESSAGE` is added to the command list to indicate the place where a message
+> must be processed. This command has only a `message_id` field, which points to a particular message.
+>
+> When the Update is rejected, `COMMAND_TYPE_PROTOCOL_MESSAGE` is *not* added to the list of commands,
+> though, because of the aforementioned limitation of requiring each command to produce an event.
+> The server will assume that any message that wasn't mentioned in a `COMMAND_TYPE_PROTOCOL_MESSAGE`
+> command is rejected. Those messages are processed after all commands, in the order they arrived.
-> `command_index` is not used because SDKs use different approach: special command of type `COMMAND_TYPE_PROTOCOL_MESSAGE`
-> is added to a command list to indicate place where a message must be processed. This command has only `message_id` fields
-> which points to a particular message. This is done this way because of limitation described above: Because Update rejection messages
-> doesn't produce events at all, `COMMAND_TYPE_PROTOCOL_MESSAGE` is not added to the list of commands for Update rejections.
-> Once processed, a message is removed from the list of messages to process. Therefore,
-> all Update rejections, as well as messages which don't have `COMMAND_TYPE_PROTOCOL_MESSAGE` command
-> are processed last (because messages are processed after commands).
-> Server doesn't require `COMMAND_TYPE_PROTOCOL_MESSAGE` command, and if it is not present, all messages
-> will be processed after all commands in the order they arrive.
-> When 1:1 limitation is removed, `command_index` might be used.
+> When the 1:1 limitation between commands and events is removed, `command_index` can be used.
 
 > #### NOTE
-> All SDKs process all queries last (after events and messages).
+> All SDKs process all queries *last* (i.e. after events and messages).

diff --git a/docs/architecture/speculative-workflow-task.md b/docs/architecture/speculative-workflow-task.md
index 8b2c481b3e5..a7fdd75de02 100644
--- a/docs/architecture/speculative-workflow-task.md
+++ b/docs/architecture/speculative-workflow-task.md
@@ -6,157 +6,169 @@ There are three types of Workflow Task:
 2. Transient
 3. Speculative
 
-Every Workflow Task ships history to the worker. Last events must be `WorkflowTaskScheduled`
-and `WorkflowTaskStarted`. There might be some events in between if they come after Workflow Task
-was scheduled but not started (e.g., Workflow worker was down and didn't poll for Workflow Task).
-
-**Normal Workflow Task** is created by server every time when server needs Workflow to make progress.
-If Workflow Task fails (worker responds with call to `RespondWorkflowTaskFailed` or an error occurred while
-processing `RespondWorkflowTaskCompleted`) or times out (worker is disconnected),
-server writes corresponding Workflow Task failed event in the history and increase attempt count
-in the mutable state. For the next attempt, Workflow Task events (`WorkflowTaskScheduled`
-and `WorkflowTaskStarted`) are not written into the history but attached to the response of
-`RecordWorkflowTaskStarted` API. These are transient Workflow Task events. Worker is not aware of any "transience"
-of these events. If Workflow Task keeps failing, attempt counter is getting increased in mutable state,
-but no new fail events are written into the history and new transient Workflow Task events are just recreated.
-Workflow Task, which has transient Workflow Task events, is called **transient Workflow Task**.
-When Workflow Task finally completes, `WorkflowTaskScheduled` and `WorkflowTaskStarted` events
-are getting written to history followed by `WorklfowTaskCompleted` event.
+Every Workflow Task ships history events to the worker. It must always contain the two events
+`WorkflowTaskScheduled` and `WorkflowTaskStarted`. There might be some events in-between them if
+they came in after the Workflow Task was scheduled but not yet started (e.g. when the Workflow worker
+was down and didn't poll for a Workflow Task).
+
+A **normal Workflow Task** is created by the server when it needs a Workflow to make progress. If
+the Workflow Task fails (i.e. the worker responds with a call to `RespondWorkflowTaskFailed` or an
+error occurred while processing `RespondWorkflowTaskCompleted`) or times out (e.g. the worker is
+disconnected), the server writes a corresponding Workflow Task failed event to the history and
+increases the attempt count in the mutable state.
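+
+For illustration, a first failed attempt could leave the history looking like this (event IDs are
+illustrative):
+
+```
+12 WorkflowTaskScheduled
+13 WorkflowTaskStarted
+14 WorkflowTaskFailed    <- attempt count in mutable state is now 2
+```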
+
+For the next attempt, a **transient Workflow Task** is used: the Workflow Task events
+`WorkflowTaskScheduled` and `WorkflowTaskStarted` are *not* written to the history, but attached to
+the response from the `RecordWorkflowTaskStarted` API. The worker does not know they are transient,
+though. If the Workflow Task keeps failing, the attempt counter is increased in the mutable state -
+but no new fail events are written into the history and transient Workflow Task events are created
+again. When the Workflow Task finally completes, the `WorkflowTaskScheduled` and `WorkflowTaskStarted`
+events are written to the history, followed by the `WorkflowTaskCompleted` event.
 
 > #### TODO
-> Although `WorkflowTaskInfo` struct has `Type` field, `WORKFLOW_TASK_TYPE_TRANSIENT` value is currently
-> not used and `ms.IsTransientWorkflowTask()` method checks if attempts count > 1.
-
-**Speculative WT** is similar to transient WT: it creates `WorkflowTaskScheduled`
-and `WorkflowTaskStarted` events only in response of `RecordWorkflowTaskStarted` API,
-but it does it from the very first attempt. Also after speculative Workflow Task is scheduled and mutable state is updated
-it is not written to the database. `WorkflowTaskInfo.Type` field is assigned to `WORKFLOW_TASK_TYPE_SPECULATIVE` value.
-Essentially speculative Workflow Task can exist in memory only, and it never gets written to the database.
-Similar to CPU *speculative execution* (which gives a speculative Workflow Task its name) where branch execution
-can be thrown away, a speculative Workflow Task can be dropped as it never existed.
-Overall logic, is to try to do the best and allow speculative Workflow Task to go through, but if anything
-goes wrong, quickly give up, convert speculative Workflow Task to normal and follow normal procedures.
-
-Zero database writes also means that transfer and regular timer tasks can't be used for
-speculative Workflow Tasks. Special [in-memory-queue](./in-memory-queue.md) is used for force speculative Workflow Task timeouts.
+> Although the `WorkflowTaskInfo` struct has a `Type` field, the `WORKFLOW_TASK_TYPE_TRANSIENT` value
+> is currently not used. Instead, `ms.IsTransientWorkflowTask()` checks whether the attempt count is > 1.
+
+**Speculative Workflow Task** is similar to the transient one in that it attaches the
+`WorkflowTaskScheduled` and `WorkflowTaskStarted` events to the response from the
+`RecordWorkflowTaskStarted` API. But there are some differences:
+- it happens on the first attempt already
+- it is *never* written to the database
+- it is scheduled differently (see more details below)
+- its `WorkflowTaskInfo.Type` is `WORKFLOW_TASK_TYPE_SPECULATIVE`
+
+Similar to a CPU's *speculative execution* (which gives the Workflow Task its name) where a branch
+execution can be thrown away, a speculative Workflow Task can be dropped as if it never existed.
+The overall strategy is to optimistically assume the speculative Workflow Task will go through, but
+if anything goes wrong, give up quickly and convert the speculative Workflow Task to a normal one.
+
+Zero database writes also means that transfer and regular timer tasks can't be used here. Instead,
+a special [in-memory queue](./in-memory-queue.md) is used for speculative Workflow Task timeouts.
 
 > #### TODO
-> It is important to point out that `WorkflowTaskScheduled` and `WorkflowTaskStarted` events for transient
-> and speculative Workflow Task are added to `PollWorkflowTask` response only but not to `GetWorkflowExecutionHistory` response.
-> This has unpleasant consequence: when worker receives speculative Workflow Task on sticky task queue, but
-> Workflow is already evicted from cache, it sends a request to `GetWorkflowExecutionHistory`, which
-> returns history without speculative events, which leads to `premature end of stream` error on worker side.
-> It fails Workflow Task, clears stickiness, and everything works fine after that, but one extra failed Workflow Task appears
-> in the history. Fortunately, it doesn't happen often.
+> It is important to point out that the `WorkflowTaskScheduled` and `WorkflowTaskStarted` events
+> for transient and speculative Workflow Tasks are only added to the `PollWorkflowTask` response - and
+> not to the `GetWorkflowExecutionHistory` response. This has an unpleasant consequence: when the
+> worker receives a speculative Workflow Task on a sticky task queue, but the Workflow is already
+> evicted from its cache, it issues a `GetWorkflowExecutionHistory` request, which returns the
+> history *without* speculative events. This leads to a `premature end of stream` error on the
+> worker side. The worker fails the Workflow Task, clears stickiness, and everything works fine
+> after that - but a failed Workflow Task appears in the history. Fortunately, it doesn't happen often.
 
 ## Speculative Workflow Task & Workflow Update
-Speculative Workflow Task was introduced to support zero writes for Workflow Update, this is why it doesn't write
-nor events, neither mutable state.
+Speculative Workflow Task was introduced to make it possible for Workflow Update to have zero writes
+when an Update is rejected. This is why it doesn't persist any events or the mutable state.
 
 > #### TODO
-> Another application can be a replacement for query task that will unify two different
-> code paths (by introducing 3rd one and slowly deprecating existing two).
+> The task processing for Queries could be replaced by using speculative Workflow Tasks under the hood.
 
 ## Scheduling of Speculative Workflow Task
-Because currently speculative Workflow Task is used for Workflow Update only it is created in
-`UpdateWorkflowExecution` API handler only. And because a normal transfer task can't be created
-(speculative Workflow Task doesn't write to the database) it is directly added to matching service
-with a call to `AddWorkflowTask` API. It is crucial to notice that Workflow lock
-must be released before this call because in case of sync match, matching service will
-do a callback to history service to start Workflow Task. This call will also try to acquire Workflow lock.
-
-But call to matching can fail (for various reasons), and then this error can't be properly handled
-outside of Workflow lock or returned to the user. Instead, an in-memory timer task
-is created for `SCHEDULED_TO_START` timeout for speculative Workflow Task even if it is on normal task queue
-(for normal Workflow Task `SCHEDULED_TO_START` timeout timer is created only for sticky task queue).
-If call to matching failed, `UpdateWorkflowExecution` API caller will observe short delay
-(`SpeculativeWorkflowTaskScheduleToStartTimeout` = 5s), but underneath timeout timer will fire,
-convert speculative Workflow Task to normal and create a transfer task which will eventually push it through.
+As of today, speculative Workflow Tasks are only used for Workflow Update, i.e. in the
+`UpdateWorkflowExecution` API handler.
+Since a normal transfer task can't be created (because that would require a database write), it is
+added directly to the Matching service with a call to the `AddWorkflowTask` API.
+
+It is crucial to note that the Workflow lock *must* be released before this call because in case of
+sync match, the Matching service will make a call to the history service to start the Workflow Task,
+which will attempt to get the Workflow lock and result in a deadlock.
+
+However, when the call to the Matching service fails (e.g. due to networking issues), that error
+can't be properly handled outside of the Workflow lock, or returned to the user. In that case, the
+`UpdateWorkflowExecution` API caller will observe a short delay
+(`SpeculativeWorkflowTaskScheduleToStartTimeout` is 5s) until the timeout timer task fires; then
+the speculative Workflow Task is converted to a normal one, and a transfer task is created which
+will eventually reach Matching and the worker.
+
+The timeout timer task is created for a `SCHEDULED_TO_START` timeout for every speculative
+Workflow Task - even if it is on a normal task queue. In comparison, for a normal Workflow Task, the
+`SCHEDULED_TO_START` timeout timer is only created for sticky task queues.
 
 ## Start of Speculative Workflow Task
-Speculative Workflow Task's `WorkflowTaskScheduled` and `WorkflowTaskStarted` events are shipped on
-`TransientWorkflowTask` field of `RecordWorkflowTaskStartedResponse` and merged to the history
-before shipping to worker. This code path is not different from transient Workflow Task.
+Speculative Workflow Task's `WorkflowTaskScheduled` and `WorkflowTaskStarted` events are shipped
+inside the `TransientWorkflowTask` field of `RecordWorkflowTaskStartedResponse` and are merged to
+the history before it is shipped to the worker. It's the same code path as for the transient
+Workflow Task.
 
 ## Completion of Speculative Workflow Task
-### StartTime in the Token
-Because a server can lose a speculative Workflow Task, it will not always be completed. Moreover, new
-speculative Workflow Task can be created after the first one is lost, and then worker will try to complete the first one.
-To prevent this `StartedTime` was added to Workflow Task token and if it doesn't match to start time in mutable state,
-Workflow Task can't be completed. All other checks aren't necessary anymore, but left there just in case.
+
+### `StartTime` in the Token
+Because the server can lose a speculative Workflow Task, it will not always be completed. Moreover,
+a new speculative Workflow Task can be created after the first one is lost, but the worker will
+try to complete the first one. To prevent this, `StartedTime` was added to the Workflow Task token
+and if it doesn't match the start time in mutable state, the Workflow Task can't be completed.
 
 ### Persist or Drop
-While completing speculative Workflow Task server makes a decision: write speculative events followed by
-`WorkflowTaskCompleted` event or drop speculative events and make speculative Workflow Task disappear.
-Server can drop events only if it knows that this Workflow Task didn't change the Workflow state. Currently,
-conditions are (check `skipWorkflowTaskCompletedEvent()` func):
+While completing a speculative Workflow Task, the server makes a decision to either write the
+speculative events followed by a `WorkflowTaskCompleted` event - or drop the speculative events and
+make the speculative Workflow Task disappear. The latter can only happen if the server knows that
+this Workflow Task didn't change the Workflow state. Currently, the conditions are
+(check `skipWorkflowTaskCompletedEvent()` func):
+- response doesn't have any commands,
+- response has only Update rejection messages.
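+
+A minimal Go sketch of that decision (illustrative names and shapes - not the actual
+`skipWorkflowTaskCompletedEvent()` implementation):
+
+```go
+package example
+
+import (
+	"strings"
+
+	commandpb "go.temporal.io/api/command/v1"
+	protocolpb "go.temporal.io/api/protocol/v1"
+)
+
+// canDropSpeculativeWorkflowTask reports whether the speculative Workflow Task
+// can disappear: it must have produced no commands, and every message must be
+// an Update rejection, i.e. the Workflow state did not change.
+func canDropSpeculativeWorkflowTask(
+	commands []*commandpb.Command,
+	messages []*protocolpb.Message,
+) bool {
+	if len(commands) > 0 {
+		return false
+	}
+	for _, msg := range messages {
+		// Checking the Any type URL suffix is an illustrative shortcut.
+		if !strings.HasSuffix(msg.GetBody().GetTypeUrl(), "update.v1.Rejection") {
+			return false
+		}
+	}
+	return true
+}
+```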
 
 > #### TODO
-> There is one more condition that forces speculative Workflow Task to be persisted: if there are
-> events in the history prior to speculative Workflow Task, which were shipped by this speculative Workflow Task
-> to the worker, then it can't be dropped. This is because an old version of some SDKs didn't
-> support getting same events twice which would happen when the server drops one speculative Workflow Task,
-> and then creates another with the same events. Now old SDKs support it, and with some
-> compatibility flag, this condition can be omitted.
-
-When a server decides to drop a speculative Workflow Task, it needs to communicate this decision to SDK.
-Because SDK needs to know where to roll back its history event pointer, i.e., after what event,
-all other events need to be dropped. SDK uses `ResetHistoryEventId` field on `RespondWorkflowTaskCompletedRespose`.
-Server set it to `LastCompletedWorkflowTaskStartedEventId` field value because
-SDK uses `WorkflowTaskStartedEventID` as history checkpoint.
+> There is one more special case: when the speculative Workflow Task contained other events
+> (e.g. an activity was scheduled), then it can't be dropped because those events would need to be
+> sent again in the next Workflow Task, but older SDK versions don't support receiving the same
+> events twice. A compatibility flag is needed to safely allow SDKs to opt in to this optimization.
 
-### Heartbeat
-Workflow Task can heartbeat. If worker completes Workflow Task with `ForceCreateNewWorkflowTask` is set to `true`
-then server will create new Workflow Task even there are no new events.
-Because currently speculative Workflow Task is used for Workflow Update only, it is very unlikely
-that speculative Workflow Task can be completed as a heartbeat. Update validation logic
-is supposed to be quick and worker should respond with a rejection or acceptance message.
-But if it happens, server will persist all speculative Workflow Task events and create new Workflow Task as normal.
+When the server decides to drop a speculative Workflow Task, it needs to communicate this decision to
+the worker - because the SDK needs to roll back to a previous history event and drop all events after
+that one. To do that, the server will set the `ResetHistoryEventId` field on the
+`RespondWorkflowTaskCompletedResponse` to the mutable state's `LastCompletedWorkflowTaskStartedEventId`
+(since the SDK uses `WorkflowTaskStartedEventID` as its history checkpoint).
 
-> #### TODO
-> This is just a design decision, which can be changed later. Server can drop speculative Workflow Task
-> when it heartbeats and create new one as speculative too. No new events will be added to the
-> history which will save history events but also will decrease visibility of heartbeats.
+### Heartbeat
+Workflow Tasks can heartbeat: when the worker completes a Workflow Task with `ForceCreateNewWorkflowTask`
+set to `true`, the server will create a new Workflow Task even if there are no new events. Currently,
+since speculative Workflow Tasks are only used for Workflow Update, it is very unlikely to occur here.
+The Update validation logic is supposed to be quick, and the worker is expected to respond with a
+rejection or acceptance message.
+If it does happen, the server will persist all speculative Workflow Task events and create a new
+Workflow Task as normal.
+
+> #### NOTE
+> This is a design decision, which could be changed later: instead, the server could drop the
+> speculative Workflow Task when it heartbeats and create a new speculative Workflow Task. No
+> new events would be added to the history - but heartbeats would not be visible anymore.
 
 ## Conversion to Normal Workflow Task
-Speculative Workflow Task is never written to the database. If, while speculative Workflow Task is executed,
-something triggers mutable state write (i.e., new events come in), then speculative Workflow Task is converted
-to normal, and then written to the database. `Type` field value is changed to `WORKFLOW_TASK_TYPE_NORMAL`,
-in-memory timer is replaced with normal persisted timer, and corresponding speculative Workflow Task
-`WorkflowTaskScheduled` and `WorkflowTaskStarted` events are written to the history
-(`convertSpeculativeWorkflowTaskToNormal()` func).
+Speculative Workflow Tasks are never written to the database. However, if during the execution of a
+speculative Workflow Task a mutable state write is required (i.e., a new event comes in), then the
+speculative Workflow Task is converted to a normal one, and then written to the database. This means
+the `Type` field value is changed to `WORKFLOW_TASK_TYPE_NORMAL`, the in-memory timer is replaced with
+a persisted timer, and the corresponding speculative Workflow Task `WorkflowTaskScheduled` and
+`WorkflowTaskStarted` events are written to the history (`convertSpeculativeWorkflowTaskToNormal()`
+func).
 
 ## Failure of Speculative Workflow Task
-Workflow Task failure indicates a bug in the Workflow code, SDK, or server.
-There are two major cases when a Workflow Task fails:
-1. Worker explicitly calls `RespondWorkflowTaskFailed` API,
-2. Worker calls `RespondWorkflowTaskCompleted` API, but there was error in request or while processing the request.
-
-When speculative Workflow Task is failing `WorkflowTaskFailed` event is written to the history (followed by
-`WorkflowTaskScheduled` and `WorkflowTaskStarted` events) because Workflow Task failure needs to be visible
-to Workflow author.
+A Workflow Task failure indicates a bug in the Workflow code, SDK, or server. The most common scenarios are:
+1. The worker calls the `RespondWorkflowTaskFailed` API, or
+2. The worker calls the `RespondWorkflowTaskCompleted` API, but there is an error in the request or
+   while processing the request.
 
-Speculative Workflow Task is retired the same way as normal Workflow Task, which means that it becomes a transitive Workflow Task:
-2nd failed attempt is not written to the history.
+When a speculative Workflow Task fails, a `WorkflowTaskFailed` event is written to the history
+(followed by `WorkflowTaskScheduled` and `WorkflowTaskStarted` events) because a Workflow Task
+failure must be visible to the Workflow author. Then, it is retried the same way a normal
+Workflow Task is: it becomes a transient Workflow Task.
 
 ## Speculative Workflow Task Timeout
-Speculative Workflow Task timeouts are enforced with special [in-memory timer queue](./in-memory-queue.md).
-Unlike for normal Workflow Task `SCHEDULE_TO_START` timeout timer is created if speculative Workflow Task
-is scheduled on both sticky and **normal** task queue. `START_TO_CLOSE` timer is created
- when a Workflow Task is started. There is only one timer exists for speculative Workflow Task at any given time.
-Pointer to that timer is stored inside mutable state because timer needs to be canceled
-when Workflow Task completes or fails: speculative Workflow Task timer can't be identified by `ScheduledEventID`
-because there might be another speculative Workflow Task with the same `ScheduledEventID`, and if not canceled
-can times out wrong Workflow Task.
-
-The behaviour in timeout handler is similar to when a Workflow Task fails.
-First `WorkflowTaskScheduled` and `WorkflowTaskStarted` events are written to the history
-(because they were not written for speculative Workflow Task), then `WorkflowTaskTimeout` event.
-New Workflow Task is scheduled as normal, but because attempt count is increased,
-it automatically becomes a transient Workflow Task.
+Speculative Workflow Task timeouts are enforced with a special [in-memory timer queue](./in-memory-queue.md).
+
+A `SCHEDULE_TO_START` timeout timer is always created, regardless of whether a sticky or normal
+task queue is used. A normal Workflow Task will usually only get one for a sticky task queue.
+
+A `START_TO_CLOSE` timeout timer is created when a speculative Workflow Task is started. There is
+only one active timer at any given time. The timer needs to be canceled when the Workflow Task
+completes or fails, but it cannot be identified by its `ScheduledEventID` because there might be
+another speculative Workflow Task with the same `ScheduledEventID`, and it could time out the
+wrong Workflow Task. Therefore, a pointer to that timer is stored inside the mutable state.
+
+The behaviour in the timeout handler is similar to when a Workflow Task fails: first the
+`WorkflowTaskScheduled` and `WorkflowTaskStarted` events are written to the history (because they
+were not written for the speculative Workflow Task) and then the `WorkflowTaskTimeout` event. The
+new Workflow Task is scheduled as normal, but because the attempt count is increased, it
+automatically becomes a transient Workflow Task.
 
 ## Replication of Speculative Workflow Task
-Speculative Workflow Task is not replicated. Worker can try to complete it in new cluster, and server
-will return `NotFound` error.
+Speculative Workflow Tasks are not replicated. A worker may try to complete one in a new cluster,
+and the server will return a `NotFound` error.