Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Warn on unfinished workflow handlers #294

Merged
merged 3 commits into from
Jul 2, 2024

Conversation

cretz
Copy link
Member

@cretz cretz commented Jun 28, 2024

What was changed

  • Added Workflow.AllHandlersFinished bool property
  • Added HandlerUnfinishedPolicy enum
  • Added UnfinishedPolicy option to [WorkflowSignal] and [WorkflowUpdate] attributes and defaulted to WarnAndAbandon
  • Added unfinished policy option to WorkflowSignalDefinition and WorkflowUpdateDefinition classes (used by manual handler registration)
  • Updated workflow logic to send a warning to the logger if warnable signal and/or update handlers remain on workflow exit
  • Made sure to properly instantiate the workflow class before checking for update/query handler presence so manual ones can be defined in the constructor (not needed for signals because they are buffered, and also changing that part of current code is non-deterministic)
  • Added tests for all permutations of workflow exit (success, failure, cancel), handler (update, signal), unfinished policy (default/warn, abandon), workflow wait for all-complete (true, false), and handler registration approach (on method, manual)

Checklist

  1. Closes Warn or error when update handlers dangle across CAN or workflow exit #261

@cretz cretz requested a review from a team June 28, 2024 21:03
"handler attribute: " +
"`[WorkflowSignal(UnfinishedPolicy=HandlerUnfinishedPolicy.Abandon)]`. The " +
"following signals were unfinished (and warnings were not disabled for their " +
"handler): {Handlers}");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the handlers be returned as the own key and the message be constant? That way the message can be easily filtered for and any unfinished updates could be easily extracted? Basically I am asking why we can't treat this like other SDKs logs that use structured logging

Copy link
Member Author

@cretz cretz Jul 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify what you mean by constant? This doesn't change if that's what you mean.

We did want to put the literal linked list of this information in the structured logger, but .NET forces linked list formatting and I am matching the Python PR by using JSON. I considered making a whole new data type, but then I would need to make the data type available as part of our public API and guarantee compatibility on it. We are not doing that in the Python PR (there is no extraction of handlers from the warning, though it could be easily done). .NET wasn't meant to add any features over the Python one, which is English message only w/ JSON appended. If we do want to expose data types for handlers we can, but I think that requires more thought and more API surface and more guarantees.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message here has workflow ID substituted in no Workflow {Id} finished? That makes it harder to filter for occurrences of this log in a logging system since users can't just filter on a direct match.

Not sure you need to make the data type public, as long as the json format is consistent it could be parsed. You do have an UpdateInfo type maybe you could reuse that? I don't feel strongly about requiring the list to always be formatted in json, I would expect the format to be left to the logger, but if others do that's fine.

For example when we warn on unhandled signals in Go we return the list as part of a key-value. For updates I would do something similar (although we would also need to add UpdateID as well)

https://github.com/temporalio/sdk-go/blob/992d42745e67fb863a3ae4f88d4ce7e83ca54b9c/internal/internal_workflow.go#L666

Copy link
Member Author

@cretz cretz Jul 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The message here has workflow ID substituted in no Workflow {Id} finished? That makes it harder to filter for occurrences of this log in a logging system since users can't just filter on a direct match.

This is in addition to scoped values. The thing with .NET unlike other SDKs is that their scopes are often not logged. So we also added workflow ID to the message here. Arguably I should add workflow ID to all English messages from a workflow, but regardless, all workflows have a "scope" with a dictionary with this information and the logger can be configured with a format to expose this scoped information (but many don't, and it's not in there by default like it is in Go and Python).

would expect the format to be left to the logger, but if others do that's fine.

In .NET, there is not a good stable/format for list/array items. It is more important to make the English read well than to reuse .NET collection types in the structured log. So we can either do string or new data types w/ custom formatter. I think if users get to the point where they want to extract the structure out, we can expose entirely new, stable data types that reflect this structure (a list of two-field objects), but until we want to expose those data types, we can/should stick with string. And if/when we do want structured objects, we should also add them to the Python warning.

For example when we warn on unhandled signals in Go we return the list as part of a key-value. For updates I would do something similar (although we would also need to add UpdateID as well)

This is more than a list, this is a list of objects with two fields each. The question becomes whether we need a stable data structure for that. I don't think we do. For instance, when Go SDK has the code for this, you will likely need to choose between a new object to get formatting right (that users probably can't access unless we're talking exposure), or JSON formatting. Though Go may be lucky enough to have a logger that defaults to a reasonable format for lists, but .NET and others don't necessarily.

Copy link
Member Author

@cretz cretz Jul 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am going to refactor to a new data type that I just won't expose. Users won't really be able to do anything with it except via reflection (but if we decide to expose this data type later we can).

EDIT: There is now a new data type but users can't do anything with really besides reflection. We can't make it extend/implement anything useful like collection, because .NET formatting is short-circuiting that in its formatting logic.

src/Temporalio/Worker/WorkflowInstance.cs Outdated Show resolved Hide resolved
{
if (warnableExecution is { } toRemove)
{
warnableInProgressSignals.Remove(toRemove);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is just going to remove the first entry that has the same name (since signal name is not unique among handler executions). So using AddLast above is arguably misleading, since the order loses meaning via this call. A MultiSet (i.e. collections.Counter in Python might be a good choice for in-progress signals here. In my PR I gave them a unique ID but perhaps that was unnecessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is just going to remove the first entry that has the same name (since signal name is not unique among handler executions)

Not true, note that toRemove is a LinkedListNode. It has nothing to do with the contents of the node. This collection was specifically chosen for the cheap node removal by node that .NET offers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohhh, right. Yep I was thinking warnableExecution was the string signal name.

Comment on lines +2289 to +2290
private static readonly Action<ILogger, string, WarnableUpdates, Exception?> UpdateWarning =
LoggerMessage.Define<string, WarnableUpdates>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could maybe combine these two warning functions generically to save on duping the boilerplate part of the message. Not a big deal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 In this case I am matching the contents of temporalio/sdk-python#556, but can make this change generally if needed.

@cretz cretz merged commit 72f6fb2 into temporalio:main Jul 2, 2024
7 checks passed
@cretz cretz deleted the unfinished-handler-warning branch July 2, 2024 12:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Warn or error when update handlers dangle across CAN or workflow exit
4 participants