From caaf58e17d3fb2a3b1cf180819f6208065f3b86a Mon Sep 17 00:00:00 2001
From: Stephan Behnke
Date: Fri, 27 Sep 2024 09:40:21 -0700
Subject: [PATCH] Update Workflow docs editing

---
 docs/architecture/in-memory-queue.md          |  34 +--
 docs/architecture/message-protocol.md         |  90 +++---
 .../architecture/speculative-workflow-task.md | 258 +++++++++---------
 3 files changed, 199 insertions(+), 183 deletions(-)

diff --git a/docs/architecture/in-memory-queue.md b/docs/architecture/in-memory-queue.md
index 644fcd3d504..0910a7f1f60 100644
--- a/docs/architecture/in-memory-queue.md
+++ b/docs/architecture/in-memory-queue.md
@@ -1,21 +1,23 @@
-# In-memory timer queue
-This queue is similar to normal persisted timer queue, but it exists in memory only and never gets
-persisted. It is created with generic `MemoryScheduledQueueFactory`, but currently serves only
-[speculative Workflow Task](./speculative-workflow-task.md) timeouts, therefore the only queue this factory creates
-is `SpeculativeWorkflowTaskTimeoutQueue` which uses same task executor as normal timer queue:
-`TimerQueueActiveTaskExecutor`.
+# In-memory Timer Queue
 
-Implementation uses `PriorityQueue` by `VisibilityTimestamp`: a task on top is the task that
-executed next.
+This queue is similar to the normal persisted timer queue, but it exists only in memory, i.e. it
+never gets persisted. It is created by a generic `MemoryScheduledQueueFactory`, but currently serves
+only [speculative Workflow Task](./speculative-workflow-task.md) timeouts. Therefore, the only queue
+this factory creates is `SpeculativeWorkflowTaskTimeoutQueue`, which uses the same task executor as
+the normal timer queue: `TimerQueueActiveTaskExecutor`.
 
-In-memory queue supports only `WorkflowTaskTimeoutTask` and there are two timeout types
-enforced by in-memory queue: `SCHEDULED_TO_START` and `START_TO_CLOSE`.
+Its implementation uses a `PriorityQueue` sorted by `VisibilityTimestamp`: the task on top is the
+task that is executed next.
 
-Executor of `WorkflowTaskTimeoutTask` from in-memory queue is the same as for normal timer queue,
-although it does one extra check for speculative Workflow Task. It checks if a task being executed still the same
-as stored in mutable state (`CheckSpeculativeWorkflowTaskTimeoutTask`). This is because MS can lose and create
-a new speculative Workflow Task, which will be a different Workflow Task and a timeout task must be skipped for it.
+The in-memory queue only supports `WorkflowTaskTimeoutTask` and only enforces the
+`SCHEDULED_TO_START` and `START_TO_CLOSE` timeout types.
+
+Note that while the in-memory queue's executor of `WorkflowTaskTimeoutTask` is the same as for
+the normal timer queue, it does one extra check for speculative Workflow Tasks:
+`CheckSpeculativeWorkflowTaskTimeoutTask` checks if the task being executed is still the *same* task
+that's stored in mutable state. This is important since the mutable state can lose the speculative
+Workflow Task and create a *new* one, in which case the old timeout task must be ignored.
 
 > #### TODO
-> Future refactoring is necessary to make logic (and probably naming) clearer. It is not clear
-> if in-memory queue might have other applications besides timeouts for speculative Workflow Tasks.
+> Future refactoring is necessary to make the logic (and probably naming) clearer. It is not clear
+> yet if the in-memory queue has other applications besides timeouts for speculative Workflow Tasks.
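+
+To illustrate the ordering described above, here is a minimal, self-contained Go sketch (all names
+are illustrative - this is not the actual Temporal implementation) of a timer heap keyed by
+`VisibilityTimestamp`, where the task that must fire next is always on top:
+
+```go
+package main
+
+import (
+	"container/heap"
+	"fmt"
+	"time"
+)
+
+type timeoutTask struct {
+	workflowID          string
+	visibilityTimestamp time.Time
+}
+
+// taskHeap implements heap.Interface, ordered by visibilityTimestamp.
+type taskHeap []*timeoutTask
+
+func (h taskHeap) Len() int      { return len(h) }
+func (h taskHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
+func (h taskHeap) Less(i, j int) bool {
+	return h[i].visibilityTimestamp.Before(h[j].visibilityTimestamp)
+}
+func (h *taskHeap) Push(x any) { *h = append(*h, x.(*timeoutTask)) }
+func (h *taskHeap) Pop() any {
+	old := *h
+	task := old[len(old)-1]
+	*h = old[:len(old)-1]
+	return task
+}
+
+func main() {
+	now := time.Now()
+	h := &taskHeap{
+		{workflowID: "wf-2", visibilityTimestamp: now.Add(10 * time.Second)},
+		{workflowID: "wf-1", visibilityTimestamp: now.Add(5 * time.Second)},
+	}
+	heap.Init(h)
+	// wf-1 is popped first: it has the earliest visibility timestamp.
+	fmt.Println(heap.Pop(h).(*timeoutTask).workflowID)
+}
+```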
diff --git a/docs/architecture/message-protocol.md b/docs/architecture/message-protocol.md
index 0d415c47f8f..347eff676e4 100644
--- a/docs/architecture/message-protocol.md
+++ b/docs/architecture/message-protocol.md
@@ -1,65 +1,67 @@
-# Message protocol
+# Message Protocol
 
 ## Why it exists
-Usually communication between server and worker uses events and commands: events go from server to worker,
-worker process them and generates commands that go back to server. Events are attached to Workflow Task, which
-worker gets as response to `PollWorkflowTask` API call, and worker sends commands back
-when it completes Workflow Task with `RespondWorkflowTaskCompleted` API. Workflow Task works as transport on RPC level.
+Usually, communication between the server and the worker uses events and commands: events go from
+the server to the worker, the worker processes them and generates commands that go back to the
+server. The events are attached to the Workflow Task, which the worker receives from the
+`PollWorkflowTask` API call, and the worker sends commands back when it completes the Workflow Task
+with the `RespondWorkflowTaskCompleted` API.
 
-Unfortunately, this way or communication didn't work for Workflow Update. Server can't use events
-to ship Update request to the worker, because worker might reject Update, and it must completely disappear.
-Because history is immutable, server can't delete events from it. Initial implementation
-was using transient event with Update request, which wasn't written to history. This implementation
-was proven to be error-prone and hard to handle on the SDK side. Commands that go back from worker to server
-also can't be used for Update because some SDKs assume that every command will produce exactly one event,
-which is not true for Update rejections that don't produce any events.
+Unfortunately, this protocol didn't work for Workflow Update. The server cannot use events to ship
+the Update request to the worker because in case the Update is rejected, it must completely disappear.
+But because the history is immutable, the server cannot delete any events from it. The initial
+implementation used a transient event (not written to history) instead, but that implementation
+proved to be error-prone and hard to handle on the SDK side. Similarly, commands can't be used
+for Update either because some SDKs assume that every command will produce *exactly* one event,
+which is not true for Update rejections as they don't produce an event.
 
-Another protocol was required to implement Workflow Update. Messages are attached to Workflow Task and go in
-both directions, similar to events and commands but don't have limitations listed above.
+Another protocol was required to implement Workflow Update: Messages are attached to the Workflow
+Task and travel in both directions. They are similar to events and commands but don't have the
+limitations listed above.
 
 ## `Message` proto message
-This might look confusing:
 ```protobuf
 message Message {}
 ```
-but first `message` word refers to protobuf messages and second `Message` is `protocolpb.Message`
-data struct used by Temporal. Most fields are self-explanatory, but some fields need explanation.
+The first `message` refers to protobuf messages and the second `Message` is the `protocolpb.Message`
+data struct used by Temporal.
 
 ### `protocol_instance_id`
-is an identifier of the object which this message belongs to. Because currently messages are used for
-Workflow Update only, it is `update_id`.
+This field identifies what object this message belongs to. Because currently messages are only used
+for Workflow Update, it is the same as `update_id`.
 
 > #### TODO
-> In the future, signals and queries might use message protocol too.
-> In these case `protocol_instance_id` will be `query_id` or `signal_id.
+> In the future, signals and queries might use the message protocol, too.
+> In that case `protocol_instance_id` would be `query_id` or `signal_id`.
 
 ### `body`
-is intentionally of type `Any` (not `oneof`) to support pluggable interfaces which Temporal server
-might not be aware of.
+This field is intentionally of type `Any` (not `oneof`) to support pluggable interfaces which the
+Temporal server might not be aware of.
 
-### `sequence_id`
-Because messages might intersect with events and commands, it is important to specify when
-a particular message must be processed. This field can be `event_id`, and it will indicate event
-after which message should be processed by worker, or `command_index`, and it will indicate
-command after which message should be processed by server.
+### `sequencing_id`
+Because messages might intersect with events and commands, it is important to specify in what order
+messages must be processed. This field can be:
+- `event_id`, to indicate the event after which this message should be processed by the worker, or
+- `command_index`, to indicate the command after which this message should be processed by the server.
 
 > #### TODO
-> In fact, this is not used. Server always set `event_id` equal to event Id before `WorkflowTaskStartedEvent`,
-> which essentially means, that all messages are processed after all events (all SDKs respect this field though).
-> This is because buffered events are reordered on server (see `reorderBuffer()` func) and intersection
-> with them based on `event_id` is not possible. When reordering is removed, this field can be set to the right value.
+> `event_id` is *always* set by the server to the ID of the event right before the
+> `WorkflowTaskStartedEvent`, which means that all messages are processed after all events. This is
+> because buffered events are reordered on the server (see `reorderBuffer()` func) and intersecting
+> them based on `event_id` is not possible. When reordering is removed, this field can be set to
+> the right value.
+
+> #### TODO
+> `command_index` is not used because SDKs use a different approach: a special command of type
+> `COMMAND_TYPE_PROTOCOL_MESSAGE` is added to the command list to indicate the place where a message
+> must be processed. This command has only a `message_id` field, which points to a particular message.
+>
+> When the Update is rejected, `COMMAND_TYPE_PROTOCOL_MESSAGE` is *not* added to the list of commands,
+> though, because of the aforementioned limitation of requiring each command to produce an event.
+> The server will assume that any message that wasn't mentioned in a `COMMAND_TYPE_PROTOCOL_MESSAGE`
+> command is rejected. Those messages are processed after all commands, in the order they arrived.
-> `command_index` is not used because SDKs use different approach: special command of type `COMMAND_TYPE_PROTOCOL_MESSAGE`
-> is added to a command list to indicate place where a message must be processed. This command has only `message_id` fields
-> which points to a particular message. This is done this way because of limitation described above: Because Update rejection messages
-> doesn't produce events at all, `COMMAND_TYPE_PROTOCOL_MESSAGE` is not added to the list of commands for Update rejections.
-> Once processed, a message is removed from the list of messages to process. Therefore,
-> all Update rejections, as well as messages which don't have `COMMAND_TYPE_PROTOCOL_MESSAGE` command
-> are processed last (because messages are processed after commands).
-> Server doesn't require `COMMAND_TYPE_PROTOCOL_MESSAGE` command, and if it is not present, all messages
-> will be processed after all commands in the order they arrive.
-> When 1:1 limitation is removed, `command_index` might be used.
+> When the 1:1 limitation between commands and events is removed, `command_index` can be used.
 
 > #### NOTE
-> All SDKs process all queries last (after events and messages).
+> All SDKs process all queries *last* (i.e. after events and messages).

diff --git a/docs/architecture/speculative-workflow-task.md b/docs/architecture/speculative-workflow-task.md
index 8b2c481b3e5..a7fdd75de02 100644
--- a/docs/architecture/speculative-workflow-task.md
+++ b/docs/architecture/speculative-workflow-task.md
@@ -6,157 +6,169 @@ There are three types of Workflow Task:
 2. Transient
 3. Speculative
 
-Every Workflow Task ships history to the worker. Last events must be `WorkflowTaskScheduled`
-and `WorkflowTaskStarted`. There might be some events in between if they come after Workflow Task
-was scheduled but not started (e.g., Workflow worker was down and didn't poll for Workflow Task).
-
-**Normal Workflow Task** is created by server every time when server needs Workflow to make progress.
-If Workflow Task fails (worker responds with call to `RespondWorkflowTaskFailed` or an error occurred while
-processing `RespondWorkflowTaskCompleted`) or times out (worker is disconnected),
-server writes corresponding Workflow Task failed event in the history and increase attempt count
-in the mutable state. For the next attempt, Workflow Task events (`WorkflowTaskScheduled`
-and `WorkflowTaskStarted`) are not written into the history but attached to the response of
-`RecordWorkflowTaskStarted` API. These are transient Workflow Task events. Worker is not aware of any "transience"
-of these events. If Workflow Task keeps failing, attempt counter is getting increased in mutable state,
-but no new fail events are written into the history and new transient Workflow Task events are just recreated.
-Workflow Task, which has transient Workflow Task events, is called **transient Workflow Task**.
-When Workflow Task finally completes, `WorkflowTaskScheduled` and `WorkflowTaskStarted` events
-are getting written to history followed by `WorklfowTaskCompleted` event.
+Every Workflow Task ships history events to the worker. It must always contain the two events
+`WorkflowTaskScheduled` and `WorkflowTaskStarted`. There might be some events in-between them if
+they came in after the Workflow Task was scheduled but not yet started (e.g. when the Workflow worker
+was down and didn't poll for a Workflow Task).
+
+A **normal Workflow Task** is created by the server when it needs a Workflow to make progress. If
+the Workflow Task fails (i.e. the worker responds with a call to `RespondWorkflowTaskFailed` or an
+error occurred while processing `RespondWorkflowTaskCompleted`) or times out (e.g. the worker is
+disconnected), the server writes a corresponding Workflow Task failed event to the history and
+increases the attempt count in the mutable state.
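+
+For illustration, a first failed attempt could leave the history looking like this (event IDs are
+illustrative):
+
+```
+12 WorkflowTaskScheduled
+13 WorkflowTaskStarted
+14 WorkflowTaskFailed    <- attempt count in mutable state is now 2
+```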
+
+For the next attempt, a **transient Workflow Task** is used: the Workflow Task events
+`WorkflowTaskScheduled` and `WorkflowTaskStarted` are *not* written to the history, but attached to
+the response from the `RecordWorkflowTaskStarted` API. The worker does not know they are transient,
+though. If the Workflow Task keeps failing, the attempt counter is increased in the mutable state -
+but no new fail events are written into the history and transient Workflow Task events are created
+again. When the Workflow Task finally completes, the `WorkflowTaskScheduled` and `WorkflowTaskStarted`
+events are written to the history, followed by the `WorkflowTaskCompleted` event.
 
 > #### TODO
-> Although `WorkflowTaskInfo` struct has `Type` field, `WORKFLOW_TASK_TYPE_TRANSIENT` value is currently
-> not used and `ms.IsTransientWorkflowTask()` method checks if attempts count > 1.
-
-**Speculative WT** is similar to transient WT: it creates `WorkflowTaskScheduled`
-and `WorkflowTaskStarted` events only in response of `RecordWorkflowTaskStarted` API,
-but it does it from the very first attempt. Also after speculative Workflow Task is scheduled and mutable state is updated
-it is not written to the database. `WorkflowTaskInfo.Type` field is assigned to `WORKFLOW_TASK_TYPE_SPECULATIVE` value.
-Essentially speculative Workflow Task can exist in memory only, and it never gets written to the database.
-Similar to CPU *speculative execution* (which gives a speculative Workflow Task its name) where branch execution
-can be thrown away, a speculative Workflow Task can be dropped as it never existed.
-Overall logic, is to try to do the best and allow speculative Workflow Task to go through, but if anything
-goes wrong, quickly give up, convert speculative Workflow Task to normal and follow normal procedures.
-
-Zero database writes also means that transfer and regular timer tasks can't be used for
-speculative Workflow Tasks. Special [in-memory-queue](./in-memory-queue.md) is used for force speculative Workflow Task timeouts.
+> Although the `WorkflowTaskInfo` struct has a `Type` field, the `WORKFLOW_TASK_TYPE_TRANSIENT` value
+> is currently not used. Instead, `ms.IsTransientWorkflowTask()` checks whether the attempt count is > 1.
+
+**Speculative Workflow Task** is similar to the transient one in that it attaches the
+`WorkflowTaskScheduled` and `WorkflowTaskStarted` events to the response from the
+`RecordWorkflowTaskStarted` API. But there are some differences:
+- it happens on the first attempt already
+- it is *never* written to the database
+- it is scheduled differently (see more details below)
+- its `WorkflowTaskInfo.Type` is `WORKFLOW_TASK_TYPE_SPECULATIVE`
+
+Similar to a CPU's *speculative execution* (which gives the Workflow Task its name) where a branch
+execution can be thrown away, a speculative Workflow Task can be dropped as if it never existed.
+The overall strategy is to optimistically assume the speculative Workflow Task will go through, but
+if anything goes wrong, give up quickly and convert the speculative Workflow Task to a normal one.
+
+Zero database writes also means that transfer and regular timer tasks can't be used here. Instead,
+a special [in-memory queue](./in-memory-queue.md) is used for speculative Workflow Task timeouts.
 
 > #### TODO
-> It is important to point out that `WorkflowTaskScheduled` and `WorkflowTaskStarted` events for transient
-> and speculative Workflow Task are added to `PollWorkflowTask` response only but not to `GetWorkflowExecutionHistory` response.
-> This has unpleasant consequence: when worker receives speculative Workflow Task on sticky task queue, but
-> Workflow is already evicted from cache, it sends a request to `GetWorkflowExecutionHistory`, which
-> returns history without speculative events, which leads to `premature end of stream` error on worker side.
-> It fails Workflow Task, clears stickiness, and everything works fine after that, but one extra failed Workflow Task appears
-> in the history. Fortunately, it doesn't happen often.
+> It is important to point out that the `WorkflowTaskScheduled` and `WorkflowTaskStarted` events
+> for transient and speculative Workflow Tasks are only added to the `PollWorkflowTask` response - and
+> not to the `GetWorkflowExecutionHistory` response. This has an unpleasant consequence: when the
+> worker receives a speculative Workflow Task on a sticky task queue, but the Workflow is already
+> evicted from its cache, it issues a `GetWorkflowExecutionHistory` request, which returns the
+> history *without* speculative events. This leads to a `premature end of stream` error on the
+> worker side. The worker fails the Workflow Task, clears stickiness, and everything works fine
+> after that - but a failed Workflow Task appears in the history. Fortunately, it doesn't happen often.
 
 ## Speculative Workflow Task & Workflow Update
-Speculative Workflow Task was introduced to support zero writes for Workflow Update, this is why it doesn't write
-nor events, neither mutable state.
+Speculative Workflow Task was introduced to make it possible for Workflow Update to have zero writes
+when an Update is rejected. This is why it doesn't persist any events or the mutable state.
 
 > #### TODO
-> Another application can be a replacement for query task that will unify two different
-> code paths (by introducing 3rd one and slowly deprecating existing two).
+> The task processing for Queries could be replaced by using speculative Workflow Tasks under the hood.
 
 ## Scheduling of Speculative Workflow Task
-Because currently speculative Workflow Task is used for Workflow Update only it is created in
-`UpdateWorkflowExecution` API handler only. And because a normal transfer task can't be created
-(speculative Workflow Task doesn't write to the database) it is directly added to matching service
-with a call to `AddWorkflowTask` API. It is crucial to notice that Workflow lock
-must be released before this call because in case of sync match, matching service will
-do a callback to history service to start Workflow Task. This call will also try to acquire Workflow lock.
-
-But call to matching can fail (for various reasons), and then this error can't be properly handled
-outside of Workflow lock or returned to the user. Instead, an in-memory timer task
-is created for `SCHEDULED_TO_START` timeout for speculative Workflow Task even if it is on normal task queue
-(for normal Workflow Task `SCHEDULED_TO_START` timeout timer is created only for sticky task queue).
-If call to matching failed, `UpdateWorkflowExecution` API caller will observe short delay
-(`SpeculativeWorkflowTaskScheduleToStartTimeout` = 5s), but underneath timeout timer will fire,
-convert speculative Workflow Task to normal and create a transfer task which will eventually push it through.
+As of today, speculative Workflow Tasks are only used for Workflow Update, i.e. in the
+`UpdateWorkflowExecution` API handler.
+Since a normal transfer task can't be created (because that would require a database write), it is
+added directly to the Matching service with a call to the `AddWorkflowTask` API.
+
+It is crucial to note that the Workflow lock *must* be released before this call because in case of
+sync match, the Matching service will make a call to the history service to start the Workflow Task,
+which will attempt to get the Workflow lock and result in a deadlock.
+
+However, when the call to the Matching service fails (e.g. due to networking issues), that error
+can't be properly handled outside of the Workflow lock, or returned to the user. In that case, the
+`UpdateWorkflowExecution` API caller will observe a short delay
+(`SpeculativeWorkflowTaskScheduleToStartTimeout` is 5s) until the timeout timer task fires; then
+the speculative Workflow Task is converted to a normal one, and a transfer task is created which
+will eventually reach Matching and the worker.
+
+The timeout timer task is created for a `SCHEDULED_TO_START` timeout for every speculative
+Workflow Task - even if it is on a normal task queue. In comparison, for a normal Workflow Task, the
+`SCHEDULED_TO_START` timeout timer is only created for sticky task queues.
 
 ## Start of Speculative Workflow Task
-Speculative Workflow Task's `WorkflowTaskScheduled` and `WorkflowTaskStarted` events are shipped on
-`TransientWorkflowTask` field of `RecordWorkflowTaskStartedResponse` and merged to the history
-before shipping to worker. This code path is not different from transient Workflow Task.
+Speculative Workflow Task's `WorkflowTaskScheduled` and `WorkflowTaskStarted` events are shipped
+inside the `TransientWorkflowTask` field of `RecordWorkflowTaskStartedResponse` and are merged to
+the history before it is shipped to the worker. It's the same code path as for the transient
+Workflow Task.
 
 ## Completion of Speculative Workflow Task
-### StartTime in the Token
-Because a server can lose a speculative Workflow Task, it will not always be completed. Moreover, new
-speculative Workflow Task can be created after the first one is lost, and then worker will try to complete the first one.
-To prevent this `StartedTime` was added to Workflow Task token and if it doesn't match to start time in mutable state,
-Workflow Task can't be completed. All other checks aren't necessary anymore, but left there just in case.
+
+### `StartTime` in the Token
+Because the server can lose a speculative Workflow Task, it will not always be completed. Moreover,
+a new speculative Workflow Task can be created after the first one is lost, but the worker will
+try to complete the first one. To prevent this, `StartedTime` was added to the Workflow Task token
+and if it doesn't match the start time in mutable state, the Workflow Task can't be completed.
 
 ### Persist or Drop
-While completing speculative Workflow Task server makes a decision: write speculative events followed by
-`WorkflowTaskCompleted` event or drop speculative events and make speculative Workflow Task disappear.
-Server can drop events only if it knows that this Workflow Task didn't change the Workflow state. Currently,
-conditions are (check `skipWorkflowTaskCompletedEvent()` func):
+While completing a speculative Workflow Task, the server makes a decision to either write the
+speculative events followed by a `WorkflowTaskCompleted` event - or drop the speculative events and
+make the speculative Workflow Task disappear. The latter can only happen if the server knows that
+this Workflow Task didn't change the Workflow state. Currently, the conditions are
+(check `skipWorkflowTaskCompletedEvent()` func):
+- response doesn't have any commands,
+- response has only Update rejection messages.
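+
+A minimal Go sketch of that decision (illustrative names and shapes - not the actual
+`skipWorkflowTaskCompletedEvent()` implementation):
+
+```go
+package example
+
+import (
+	"strings"
+
+	commandpb "go.temporal.io/api/command/v1"
+	protocolpb "go.temporal.io/api/protocol/v1"
+)
+
+// canDropSpeculativeWorkflowTask reports whether the speculative Workflow Task
+// can disappear: it must have produced no commands, and every message must be
+// an Update rejection, i.e. the Workflow state did not change.
+func canDropSpeculativeWorkflowTask(
+	commands []*commandpb.Command,
+	messages []*protocolpb.Message,
+) bool {
+	if len(commands) > 0 {
+		return false
+	}
+	for _, msg := range messages {
+		// Checking the Any type URL suffix is an illustrative shortcut.
+		if !strings.HasSuffix(msg.GetBody().GetTypeUrl(), "update.v1.Rejection") {
+			return false
+		}
+	}
+	return true
+}
+```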
 
 > #### TODO
-> There is one more condition that forces speculative Workflow Task to be persisted: if there are
-> events in the history prior to speculative Workflow Task, which were shipped by this speculative Workflow Task
-> to the worker, then it can't be dropped. This is because an old version of some SDKs didn't
-> support getting same events twice which would happen when the server drops one speculative Workflow Task,
-> and then creates another with the same events. Now old SDKs support it, and with some
-> compatibility flag, this condition can be omitted.
-
-When a server decides to drop a speculative Workflow Task, it needs to communicate this decision to SDK.
-Because SDK needs to know where to roll back its history event pointer, i.e., after what event,
-all other events need to be dropped. SDK uses `ResetHistoryEventId` field on `RespondWorkflowTaskCompletedRespose`.
-Server set it to `LastCompletedWorkflowTaskStartedEventId` field value because
-SDK uses `WorkflowTaskStartedEventID` as history checkpoint.
+> There is one more special case: when the speculative Workflow Task contained other events
+> (e.g. an activity was scheduled), then it can't be dropped because those events would need to be
+> sent again in the next Workflow Task, but older SDK versions don't support receiving the same
+> events twice. A compatibility flag is needed to safely allow SDKs to opt in to this optimization.
 
-### Heartbeat
-Workflow Task can heartbeat. If worker completes Workflow Task with `ForceCreateNewWorkflowTask` is set to `true`
-then server will create new Workflow Task even there are no new events.
-Because currently speculative Workflow Task is used for Workflow Update only, it is very unlikely
-that speculative Workflow Task can be completed as a heartbeat. Update validation logic
-is supposed to be quick and worker should respond with a rejection or acceptance message.
-But if it happens, server will persist all speculative Workflow Task events and create new Workflow Task as normal.
+When the server decides to drop a speculative Workflow Task, it needs to communicate this decision to
+the worker - because the SDK needs to roll back to a previous history event and drop all events after
+that one. To do that, the server will set the `ResetHistoryEventId` field on the
+`RespondWorkflowTaskCompletedResponse` to the mutable state's `LastCompletedWorkflowTaskStartedEventId`
+(since the SDK uses `WorkflowTaskStartedEventID` as its history checkpoint).
 
-> #### TODO
-> This is just a design decision, which can be changed later. Server can drop speculative Workflow Task
-> when it heartbeats and create new one as speculative too. No new events will be added to the
-> history which will save history events but also will decrease visibility of heartbeats.
+### Heartbeat
+Workflow Tasks can heartbeat: when the worker completes a Workflow Task with `ForceCreateNewWorkflowTask`
+set to `true`, the server will create a new Workflow Task even if there are no new events. Currently,
+since speculative Workflow Tasks are only used for Workflow Update, it is very unlikely to occur here.
+The Update validation logic is supposed to be quick, and the worker is expected to respond with a
+rejection or acceptance message.
+If it does happen, the server will persist all speculative Workflow Task events and create a new
+Workflow Task as normal.
+
+> #### NOTE
+> This is a design decision, which could be changed later: instead, the server could drop the
+> speculative Workflow Task when it heartbeats and create a new speculative Workflow Task. No
+> new events would be added to the history - but heartbeats would not be visible anymore.
 
 ## Conversion to Normal Workflow Task
-Speculative Workflow Task is never written to the database. If, while speculative Workflow Task is executed,
-something triggers mutable state write (i.e., new events come in), then speculative Workflow Task is converted
-to normal, and then written to the database. `Type` field value is changed to `WORKFLOW_TASK_TYPE_NORMAL`,
-in-memory timer is replaced with normal persisted timer, and corresponding speculative Workflow Task
-`WorkflowTaskScheduled` and `WorkflowTaskStarted` events are written to the history
-(`convertSpeculativeWorkflowTaskToNormal()` func).
+Speculative Workflow Tasks are never written to the database. However, if during the execution of a
+speculative Workflow Task a mutable state write is required (i.e., a new event comes in), then the
+speculative Workflow Task is converted to a normal one, and then written to the database. This means
+the `Type` field value is changed to `WORKFLOW_TASK_TYPE_NORMAL`, the in-memory timer is replaced with
+a persisted timer, and the corresponding speculative Workflow Task `WorkflowTaskScheduled` and
+`WorkflowTaskStarted` events are written to the history (`convertSpeculativeWorkflowTaskToNormal()`
+func).
 
 ## Failure of Speculative Workflow Task
-Workflow Task failure indicates a bug in the Workflow code, SDK, or server.
-There are two major cases when a Workflow Task fails:
-1. Worker explicitly calls `RespondWorkflowTaskFailed` API,
-2. Worker calls `RespondWorkflowTaskCompleted` API, but there was error in request or while processing the request.
-
-When speculative Workflow Task is failing `WorkflowTaskFailed` event is written to the history (followed by
-`WorkflowTaskScheduled` and `WorkflowTaskStarted` events) because Workflow Task failure needs to be visible
-to Workflow author.
+A Workflow Task failure indicates a bug in the Workflow code, SDK, or server. The most common scenarios are:
+1. The worker calls the `RespondWorkflowTaskFailed` API, or
+2. The worker calls the `RespondWorkflowTaskCompleted` API, but there is an error in the request or
+   while processing the request.
 
-Speculative Workflow Task is retired the same way as normal Workflow Task, which means that it becomes a transitive Workflow Task:
-2nd failed attempt is not written to the history.
+When a speculative Workflow Task fails, a `WorkflowTaskFailed` event is written to the history
+(followed by `WorkflowTaskScheduled` and `WorkflowTaskStarted` events) because a Workflow Task
+failure must be visible to the Workflow author. Then, it is retried the same way a normal
+Workflow Task is: it becomes a transient Workflow Task.
 
 ## Speculative Workflow Task Timeout
-Speculative Workflow Task timeouts are enforced with special [in-memory timer queue](./in-memory-queue.md).
-Unlike for normal Workflow Task `SCHEDULE_TO_START` timeout timer is created if speculative Workflow Task
-is scheduled on both sticky and **normal** task queue. `START_TO_CLOSE` timer is created
- when a Workflow Task is started. There is only one timer exists for speculative Workflow Task at any given time.
-Pointer to that timer is stored inside mutable state because timer needs to be canceled
-when Workflow Task completes or fails: speculative Workflow Task timer can't be identified by `ScheduledEventID`
-because there might be another speculative Workflow Task with the same `ScheduledEventID`, and if not canceled
-can times out wrong Workflow Task.
-
-The behaviour in timeout handler is similar to when a Workflow Task fails.
-First `WorkflowTaskScheduled` and `WorkflowTaskStarted` events are written to the history
-(because they were not written for speculative Workflow Task), then `WorkflowTaskTimeout` event.
-New Workflow Task is scheduled as normal, but because attempt count is increased,
-it automatically becomes a transient Workflow Task.
+Speculative Workflow Task timeouts are enforced with a special [in-memory timer queue](./in-memory-queue.md).
+
+A `SCHEDULE_TO_START` timeout timer is always created, regardless of whether a sticky or normal
+task queue is used. A normal Workflow Task will usually only get one for a sticky task queue.
+
+A `START_TO_CLOSE` timeout timer is created when a speculative Workflow Task is started. There is
+only one active timer at any given time. The timer needs to be canceled when the Workflow Task
+completes or fails, but it cannot be identified by its `ScheduledEventID` because there might be
+another speculative Workflow Task with the same `ScheduledEventID`, and it could time out the
+wrong Workflow Task. Therefore, a pointer to that timer is stored inside the mutable state.
+
+The behaviour in the timeout handler is similar to when a Workflow Task fails: first the
+`WorkflowTaskScheduled` and `WorkflowTaskStarted` events are written to the history (because they
+were not written for the speculative Workflow Task) and then the `WorkflowTaskTimeout` event. The
+new Workflow Task is scheduled as normal, but because the attempt count is increased, it
+automatically becomes a transient Workflow Task.
 
 ## Replication of Speculative Workflow Task
-Speculative Workflow Task is not replicated. Worker can try to complete it in new cluster, and server
-will return `NotFound` error.
+Speculative Workflow Tasks are not replicated. A worker may try to complete one in a new cluster,
+and the server will return a `NotFound` error.